Tuesday, January 15, 2013

Why Write Your Own Book When An Algorithm Can Do It For You?

Phil Parker is unlike any writer you've ever met - or read for that matter. That's because he doesn't write most of his books. Instead, the trained economist uses sophisticated algorithms that can pen a whole book from start to finish in as little as a few minutes. The secret is sophisticated programming mimicking the thought process behind formulaic writing. It can take years to create these programs, but once completed, new books can be churned out in minutes.

This method has led to Parker's company - ICON Group International Inc - auto-writing more than a million titles, mostly nonfiction books on very specific subjects. But there's poetry, too. (Parker says most poetry is governed by strict formulas.) He claims he's basically applying 19th-century Taylorism, Frederick Winslow Taylor's scientific management of factory work, to the publishing industry.

Parker's work isn't publishing's first foray into auto-writing, but up until now most efforts, such as those of Chicago-based Narrative Science and North Carolina's StatSheet, have focused on short, formulaic sports and crime writing for newspapers, not full-length books.

When ReadWrite spoke with Parker this week, he gushed about how much press he's gotten of late - and how often they'd gotten his story wrong. 

Writing By The Numbers

ReadWrite: Tell me about the algorithm you created to auto-write books. 

Phil Parker: The non-fiction algorithms and methodology are not original at all... The whole field is called econometrics. All we did with the algorithms was mimic what economists have been doing for decades.

In the 1990s I was working on reports where you had to do a lot of economic analyses and I realized that most of what an economist does is itself extremely formulaic in nature. With the advent of larger hard disks, Windows, RAM, a lot of that process could be reverse engineered and basically characterized by algorithms and be used in an automated fashion. The methodologies are extremely old, just like the methodologies of writing haiku poetry are very old. An Elizabethan sonnet is 14 lines - that is a line of code if you think of it that way. The code is constrained. So all genres, no matter what the genres are, are a form of constrained writing.

ReadWrite: What kind of constraints?

Phil Parker: There are constraints to the length of the book based on page formats and font sizes and the expectations of readers. There are natural constraints that exist in all forms of writing. In the nonfiction area, the constraints are fairly well understood by the people in that area.

Small businesses doing import-export businesses, they do it for very narrowly defined products. They don't do it for general products. That's why for Amazon and elsewhere, all these titles we created, very arcane categories, and that's because that's what people actually do business in. Nobody does business in hardware parts, they do it in 6-inch copper screws. So for those businesses, to hire a consultant firm to say 'Hey, can you give me a worldwide estimate of copper screws,' the firm would go out and spend a month or two basically doing the job an economist and a couple of researchers do. Those people then pass off the editorial analysis to a group of people who do formatting and copy editing and graphic design, who then pass it off to another group of people who do metadata, covers, spines, all that. All we did is reverse engineer that. But the methodology to do that already existed before the books existed. 

ReadWrite: So it's not a new form of writing?

Phil Parker: I have not created any new way of writing. All I'm doing is writing computer programs that mimic the way people write. Going back to the Elizabethan sonnets, Shakespeare or one of his contemporaries created the 14-line iambic pentameter poem, where the rhyming pattern was 'a-b, a-b, c-d, c-d, e-f, e-f, g-g.' G-g being a couplet at the end. By line 9 there has to be a turn in the poem, so there has to be a word like 'yet' or 'but.' The first line is typically a question, which acts as a title. All of them are 10 syllables in each line... they have to go in the rhythm of that pattern. If you do an analysis of sonnets, you'll realize that about 10% of sonnets violate those rules. But they do it only in a very particular way. Even that formulation of violation is itself constrained... Once you have all of those rules you then write algorithms that mimic those rules. It's a very different kind of philosophy from artificial intelligence.
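To make the idea of "rules as code" concrete, here is a minimal Python sketch that checks a poem against three of the constraints Parker lists: 14 lines, the a-b, a-b, c-d, c-d, e-f, e-f, g-g rhyme pattern, and a turn word on line 9. It is an illustration only, not Parker's software. The function names are invented for this example, and the rhyme test (comparing the last two letters of each line) is a deliberately crude assumption standing in for real phonetics. Syllable counting is left out.

```python
import re

SCHEME = "ababcdcdefefgg"    # the rhyme pattern described in the interview
TURN_WORDS = {"yet", "but"}  # the turn markers Parker mentions for line 9


def rhyme_key(line: str, tail: int = 2) -> str:
    """Crude stand-in for rhyme (an assumption): last `tail` letters of the final word."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1][-tail:] if words else ""


def check_sonnet(lines: list[str]) -> list[str]:
    """Return the list of violated constraints; an empty list means all checks passed."""
    problems = []
    if len(lines) != len(SCHEME):
        # Without 14 lines the other checks are meaningless, so stop here.
        return [f"expected {len(SCHEME)} lines, got {len(lines)}"]

    # Lines that share a scheme letter must share a rhyme key.
    seen: dict[str, str] = {}
    for number, (letter, line) in enumerate(zip(SCHEME, lines), start=1):
        key = rhyme_key(line)
        if seen.setdefault(letter, key) != key:
            problems.append(f"line {number} breaks rhyme '{letter}'")

    # The turn: line 9 should contain a word such as 'yet' or 'but'.
    ninth_line_words = set(re.findall(r"[a-z']+", lines[8].lower()))
    if not ninth_line_words & TURN_WORDS:
        problems.append("line 9 has no turn word")
    return problems
```

Calling check_sonnet on a list of 14 strings returns an empty list when every check passes. A generator built on the same idea would run the rules in the other direction, proposing lines until the list of problems is empty.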

All About The Algorithm

ReadWrite: Tell us about that algorithm.

Phil Parker: We created a system which we think mimics the human mind... The truth is, if you step back far enough, all of literature is highly formulaic, not just romance novels. Some of the genres are so formulaic that the publishers of those genres tell the potential writers how to write the books themselves.

ReadWrite: What do you mean by formulaic?

Phil Parker: A genre is defined by formula. What's interesting across genres, often you find the same formulas taking place in little twists, which throw them squarely in a different genre. But the twist is minute. A romance book can become a thriller by rearranging certain components of it. In essence, formulas of genres have patterns in them which overlap with each other. Think of a Venn diagram, and the intersection between them. The more genres intersect with each other, the more likely that recurring patterns can be observed. 
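The Venn diagram picture can be sketched in a few lines of Python by treating each genre as a set of formula components and measuring how much two sets intersect. Everything specific here is hypothetical: the interview names no components, so the labels below are made up for illustration, and the Jaccard ratio is just one common way to score overlap, not something Parker says he uses.

```python
# Hypothetical formula components per genre (invented for this example).
GENRES = {
    "romance": {"meeting", "obstacle", "separation", "reunion"},
    "thriller": {"threat", "obstacle", "pursuit", "separation", "escape"},
    "mystery": {"crime", "clues", "pursuit", "reveal"},
}


def shared(a: str, b: str) -> set[str]:
    """Components two genres have in common (the Venn intersection)."""
    return GENRES[a] & GENRES[b]


def overlap(a: str, b: str) -> float:
    """Jaccard overlap: size of the intersection divided by size of the union."""
    union = GENRES[a] | GENRES[b]
    return len(shared(a, b)) / len(union) if union else 0.0
```

With these made-up sets, shared("romance", "thriller") contains "obstacle" and "separation", while romance and mystery share nothing. Swapping a couple of components in and out is the kind of small twist Parker says moves a book from one genre to another.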

We started using this graph-theoretic approach to write dictionary definitions. I have this thing called Webster's Online Dictionary... I turned my algorithms on generating dictionaries... using cluster analysis and graph theory combined. It algorithmically mimicked what a lexicographer should do if they had access to such a large database. The process involved first creating the linguistic graph that defines language and all of the relationships between words and the phonetics behind the language.

ReadWrite: Does it really take 20 minutes to auto-write a book?

Phil Parker: It could take 2-3 years to set up the algorithms, but once you've got it, the software has now been fully coded. Once you decide to write one on [a particular] topic, it only takes 20-30 minutes... I think the slowest one might take an hour or two, the fastest 4 to 10 minutes. 

ReadWrite: How much does it cost to produce?

Phil Parker: The cost could be the equivalent of 2-3 man years of programmer time, and maybe an analyst or editors that might be required on that project... could be $200,000 to $500,000 to set up a genre. 

ReadWrite:  How many books have you written by hand versus with the algorithm? 

Phil Parker: The ones I wrote by hand were academic books - they were like MIT Press - it wasn't using algorithms. So 6 of them. And more than 1,000,050 titles using automation. It's a moving target by the hour... we put a lot out of print because they're dated after 2 or 3 years because they're statistical analyses... You cull the catalogue so to speak.

Humans Vs. Machines

ReadWrite: What's the big difference between human writing and machine writing?

Phil Parker: There's the classic Turing test about a conversation with a robot: Can you tell the difference between a robot and a real human who's conversing with you? Is there something different about these topics? I don't think anybody would look at our crossword puzzle books and say, 'Oh my gosh, a computer wrote this,' because most crossword puzzles are so formulaic that you would expect it to be formulaic... If people find it useful to be in a formulaic format, so much the better. The goal isn't to sound better than an author. The goal is to deliver something useful to people. That's the end of it, no more. Otherwise, why bother doing it?

ReadWrite: So are human authors replaceable?

Phil Parker: Bloggers that we talked about earlier who read 3 different articles, read a Wikipedia page... those people can be replaced with computer algorithms, because they're doing formulaic work. What you're doing right now is not formulaic. Because you're probing, you're going to the depths of it.

In the last 2 weeks there have been about 10 articles written about what I've done and none of them talked to me about it. They're all copying and pasting from each other. I think it's a very interesting observation that they're using a formulaic method to deliver content and put their name on a byline, when in fact they've done a formulaic cut-and-paste. I would call those kinds of articles low on the creativity front.

 



