Wednesday, April 30, 2014

Computer Writer Vs. Computer Grader

Les Perelman is a hero of mine. The former director of undergraduate writing at MIT has been one of the smartest, sanest voices in the seemingly endless debate about the use of computers to assess student writing. And now he has a new tool.

Babel (the Basic Automatic B.S. Essay Language Generator) was created by Perelman with a team of students from MIT and Harvard, and it's pretty awesome, as laid out in a recent article by Steve Kolowich for The Chronicle of Higher Education.

Given the keyword "privacy," Babel generated a full essay from scratch. More accurately, it generated "a string of bloated sentences" that were grammatically and structurally correct. Here's a sample:

Privateness has not been and undoubtedly never will be lauded, precarious, and decent. Humankind will always subjugate privateness.

Run through MY Access! (one of the many online writing instruction products out there), Babel's privacy essay scored a 5.4 out of 6, including strong marks for "focus and meaning" and "language use and style."
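The article doesn't publish Babel's code, but the general recipe (take a keyword, swap in inflated synonyms, pour them into grammatical templates) is easy to imagine. Here is my own toy sketch of that idea in Python; every word list and template below is invented for illustration, modeled loosely on the sample above, and it is not Perelman's program:

```python
import random

# A toy keyword-driven "B.S." generator: grammatical, vocabulary-heavy
# sentences with no actual content. Word lists and templates are invented
# for illustration; this is not the Babel code.

INFLATED_SYNONYMS = {
    "privacy": ["privateness", "seclusion", "confidentiality"],
}

BIG_ADJECTIVES = ["lauded", "precarious", "decent", "copious", "salient", "egregious"]

TEMPLATES = [
    "{word} has not been and undoubtedly never will be {adj1}, {adj2}, and {adj3}.",
    "humankind will always subjugate {word}.",
    "the quandary of {word} remains, by its very nature, both {adj1} and {adj2}.",
]


def generate_essay(keyword: str, sentences: int = 5) -> str:
    """Produce grammatically plausible, content-free prose about a keyword."""
    words = INFLATED_SYNONYMS.get(keyword, [keyword])
    result = []
    for _ in range(sentences):
        adj1, adj2, adj3 = random.sample(BIG_ADJECTIVES, 3)
        sentence = random.choice(TEMPLATES).format(
            word=random.choice(words), adj1=adj1, adj2=adj2, adj3=adj3)
        result.append(sentence[0].upper() + sentence[1:])
    return " ".join(result)


if __name__ == "__main__":
    print(generate_essay("privacy"))
```

A few dozen lines of filler logic, and the output already reads like the sample the graders rewarded.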

Perelman has demonstrated repeatedly over the past decade that "writing" means something completely different to designers of essay-grading software and, well, human beings. When Mark Shermis and Ben Hamner produced a study in 2012 claiming that, across 22,000 essays, there was no real difference between human grading and computer grading, Perelman dismembered the study with both academic rigor and human-style brio. The whole take-down is worth reading, but here's one pull quote that underlines how Shermis and Hamner fail to even define what they mean by "writing."

One major problem with the study is the lack of any explicit construct of writing. Without such a construct, it is, of course, impossible to judge the validity of any measurement. Writing is foremost a rhetorical act, the transfer of information, feelings, and opinions from one mind to another mind. The exact nature of the writing construct is much too complex to outline here; suffice it to say that it differs fundamentally from the Shermis and Hamner study in that the construct of writing cannot be judged like the answer to a math problem or GPS directions. The essence of writing, like all human communication, is not that it is true or false, correct or incorrect, but that it is an action, that it does something in the world.

Computer-graded writing is the ultimate exercise in deciding that the things that matter are the things that can be measured. And while measuring the quality of human communication might not be impossible, it comes pretty damn close.

There are things that neither computers nor minimum-wage human temps with rubrics in hand can measure. Does it make sense? Is the information contained in it correct? Does it show some personality? Is it any good? So computer programs measure what can be measured. Are these sentences? Are there a lot of them? Do they have different lengths? Do they include big words? Do they mimic the language of the prompt?
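Every one of those proxies can be computed in a few lines. Here's a toy sketch of my own (not any vendor's actual algorithm; the weights are arbitrary and exist only for illustration) showing how little a grader limited to countable features has to work with:

```python
import re

def surface_score(essay: str, prompt: str) -> float:
    """Toy holistic score built only from measurable surface features.

    Not any real product's algorithm; the weights are arbitrary and exist
    only to show what a machine limited to countable proxies can "see."
    """
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    if not sentences or not words:
        return 0.0

    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    avg_length = sum(lengths) / len(lengths)          # are the sentences long?
    variety = max(lengths) - min(lengths)             # do they vary in length?
    big_words = sum(1 for w in words if len(w) >= 9)  # do they use big words?
    prompt_vocab = {w.lower() for w in re.findall(r"[A-Za-z']+", prompt)}
    prompt_echo = sum(1 for w in words if w.lower() in prompt_vocab)  # mimic the prompt?

    raw = (0.02 * len(words) + 0.05 * avg_length + 0.05 * variety
           + 0.10 * big_words + 0.03 * prompt_echo)
    return round(min(6.0, raw), 1)


sample = ("Privateness has not been and undoubtedly never will be lauded, "
          "precarious, and decent. Humankind will always subjugate privateness.")
print(surface_score(sample, "privacy"))
```

Notice what's missing: nothing in that function can tell whether a single sentence is true, relevant, or worth reading. That is exactly the point.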

And as Perelman and Babel show, if it's so simple a computer can score it, it's also simple enough for a computer to do it. Babel's "writing" is what you get when you reduce writing to a simple mechanical act. Babel's "writing" is what you get when you remove everything that makes "writing" writing. It's not just that the emperor has no clothes; it's that he's not even an emperor at all.

In the comments section of the Chronicle article, you can find people still willing to stick up for the computer grader with what have become familiar refrains.

"So what if the system can be gamed. A student who could do that kind of fakery would be showing mastery of writing skills." Well, no. That student might be showing mastery of some sort of skill, but it wouldn't be writing. And no mastery of anything is really required-- at my high school, we achieved near-100% proficiency on the state writing test by teaching our students to
           1) Fill up the page
           2) Write neatly
           3) Indent clearly
           4) Repeat the prompt
           5) Use big words, even if you don't know what they mean ("plethora" was a fave of ours)

Software can be useful. I teach my students to do some fairly mechanical analyses of their work (find all the forms of "be," check to see what the first four words of each sentence are structurally, count the simple sentences), but these are just one useful tool, not the most useful one and certainly not the only one. I'm not anti-software, but there are limits. Most writing problems are really thinking problems (but that's another column).
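Those checks are simple enough to script, and that's exactly why they're safe: the code flags patterns and leaves every judgment to the writer. A rough sketch of the kind of thing I mean (my own toy version, assuming plain-text drafts):

```python
import re

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def mechanical_checks(draft: str) -> dict:
    """Flag mechanical patterns in a draft; makes no judgment about quality."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", draft) if s.strip()]
    words = re.findall(r"[A-Za-z']+", draft)
    return {
        # every form of "be": candidates for stronger verbs
        "be_forms": [w for w in words if w.lower() in BE_FORMS],
        # first four words of each sentence (a fuller check would also look
        # at their structure, e.g. parts of speech)
        "sentence_openers": [" ".join(s.split()[:4]) for s in sentences],
        # crude proxy for a simple sentence: no comma, no conjunction
        "simple_sentence_count": sum(
            1 for s in sentences
            if "," not in s
            and not re.search(r"\b(and|but|or|because|although)\b", s, re.I)
        ),
    }


print(mechanical_checks("The essay was long. It was written quickly, but it was dull."))
```

The output is a report, not a grade. What the writer does with it is where the actual writing happens.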

Babel demonstrates, once again, that computer grading of essays completely divorces the process from actual writing. HALO may be very exciting, but getting the high score with my squad does not mean I'm ready to be a Marine Lieutenant.

