In my circulation around the internet, I keep hearing about this book called the Bestseller Code. It’s not out yet, but a chapter of it is available for free on Amazon. Ever curious, I grabbed it and read it.
Here’s the summary:
This sneak peek teaser – featuring literary giants John Grisham and Danielle Steel – from Chapter 2 of The Bestseller Code, a groundbreaking book about what a computer algorithm can teach us about blockbuster books, stories, and reading, reveals the importance of topic and theme in bestselling fiction according to percentages assigned by what the authors refer to as the “bestseller-ometer.”
Although 55,000 novels are published every year, only about 200 hit the lists, a commercial success rate of less than half a percent. When the computer was asked to “blindly” select the most likely bestsellers out of 5000 books published over the past thirty years based only on theme, it discovered two possible candidates: The Accident by Danielle Steel and The Associate by John Grisham.
The computer recognized quantifiable patterns in their seemingly opposite, but undeniably successful writing careers with legal thrillers and romance. In Chapter 2, Archer and Jockers analyze this data and divulge the most and least likely to best sell topics and themes in fiction with a human discussion of the “why” behind these results.
The Bestseller Code is a big-idea book about the relationship between creativity and technology. At heart it is a celebration of books for readers and writers, a compelling investigation into how successful writing works.
Intriguing idea, right? How can a computer algorithm pick out bestsellers?
Well, when you dig into it, it’s really stuff that readers know intuitively but have never really articulated. Here are some excerpts:
If we compute an average proportion for each topic in all the books by each of these authors, it certainly seems that Steel and Grisham learned something from the old maxim “write what you know.” The author who dreamed of baseball but then became an attorney has “Lawyers and the Law” as his most prevalent theme, followed by “American Team Sports.” Steel, who has been through five marriages, raised nine children and lost one, writes mostly about “Domestic Life,” “Love,” and “Maternal Roles.”
Roughly a third of all the paragraphs Grisham has ever written deal directly with the legal system, and similarly Steel has given almost an exact mathematical third of her pages over the years to the theme of domestic life, or even more specifically “time spent inside the home.”
Grisham and Steel each have only one signature theme, not two, that takes up a whole third (on average) of each of their novels. This likely helps with their branding. All the many other topics each writer employs are used in tiny percentages. Grisham’s second-most-used topic across his canon is American sports, but it is the subject of only 4 percent of his pages, and this average is no doubt as large as it is because it gets a big bump from his non-legal thriller Calico Joe— a book that is entirely set in a world of baseball. Many of Grisham’s other secondary themes are no big surprise: money (3 percent), cops (2 percent), and government intelligence (2 percent).
The less immediately obvious topic, at almost 4 percent of all of Grisham’s pages, is a topic we call “everyday moments.” The name is deliberately vague and undramatic. The scenes in which this topic shows up prominently may involve two people chatting, or sitting on a sofa watching TV, or walking down the street. Not much is going on but day-to-day living. Its presence as number three in Grisham, after law and sports, is important if only to indicate a writer who is aware of pace. Everyday interactions between characters are there in order to vary the pace of the drama and avoid melodrama. It is the kind of topic no one would likely think they read for, but if these scenes that offer breath and reflection are totally absent, a reader is almost guaranteed to complain.
There are other minor topics in Grisham, though, ones that we would have been less likely to guess immediately. These topics, with similar proportions to cops and courts, deal with people in their domestic environments (a top topic for Steel), kids enjoying summer at home (with words like “porch” and “bike”), scenes about relationships (also very important in Steel), and family.
Steel’s top few themes appear to put her characters and those of Grisham in very different worlds. After time spent in the home— a topic whose specific nouns suggest the home of a typical nuclear family— she gives 5 percent more of her storytelling to a similar theme we called “family time.” The nouns in this word group suggest a family at home, engaged in everyday activities: dinner, conversation, rest, love, weekends. So far it is all quite low drama. Her third most used topic, though, deals with hospitals and medical care. This topic is made up of words like “nurses,” “doctors,” “ambulance,” “emergency,” and “accident.” It suggests not the long-term stay of a patient with a chronic disease, but instead the sudden and unexpected event that threatens the domestic contentment of Steel’s primary themes.
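For the curious, here is a rough sketch of what “an average proportion for each topic in all the books” by an author could look like in practice. This is only my guess at the arithmetic, not the authors’ actual method. The function name `average_topic_proportions` is my own invention, and the numbers are made up purely for illustration; none of them come from the book.

```python
from collections import defaultdict


def average_topic_proportions(books):
    """Average each topic's share across a list of books.

    `books` is a list of dicts mapping topic name -> share of that book
    (a number between 0 and 1). A topic missing from a book counts as 0
    for that book. This is an illustrative assumption about the math,
    not the method used in The Bestseller Code.
    """
    if not books:
        return {}
    totals = defaultdict(float)
    for book in books:
        for topic, share in book.items():
            totals[topic] += share
    return {topic: total / len(books) for topic, total in totals.items()}


# Hypothetical per-book topic shares, invented for this example only.
made_up_books = [
    {"law": 0.40, "sports": 0.02, "everyday moments": 0.04},
    {"law": 0.35, "money": 0.05, "everyday moments": 0.03},
    {"law": 0.25, "sports": 0.10, "everyday moments": 0.05},
]

averages = average_topic_proportions(made_up_books)

# Print topics from most to least prevalent across the whole "canon".
for topic, share in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic}: {share:.1%}")
```

Notice how a single sports-heavy book pulls the average for that topic up, which is the same kind of “big bump” the excerpt attributes to Calico Joe.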
There’s a lot more in this vein–analyzing the topics in the proportions. It boils down to “people like reading about people interacting in casual, friendly, intimate ways.” Oh, but sex doesn’t sell.
If we take a cross section of almost five thousand novels— five hundred of which are bestsellers and the rest are not— and measure the presence of five hundred different themes across all of them, then the proportion of the whole taken up by sex is just about a thousandth of a percent. If you then measure the content of bestselling novels (and we will explain how this is done in just a moment), this fraction for sex goes down to 0.0009 percent.
It’s hard to believe. Who would have thought that sex does not sell? We tell people and still they do not believe us. But the truth is this: sex, or perhaps more precisely erotica, sells, and it sells in notable quantities, but only within a niche market. Titles within that genre rarely break out enough to win the attention of the mainstream reading market that creates bestsellers.
We know what you are thinking: “What about Fifty Shades of Grey?” Well, that novel (or those novels if you count the whole series) is one quite rare example of an erotic story that hit the lists. … Contrary to what you might expect given the prominence of sex in TV, movies, and the media, the U.S. reading public of the past thirty years has demonstrated a preference for other topics.
The algorithm actually came up with a list of things that didn’t sell–at least, not on that snapshot of the New York Times Bestseller list. This is where all my spec fic friends are going to cry foul.
Two notable sets of under-performing topics are all things fantastical and otherworldly. Made-up languages, fantasy creatures, settings that don’t exist, space battles, and starships are all statistically far less likely to succeed on a mass scale than the topics of realism in today’s market.
Still, in the many topics that suggest a realistic world, there are some that are winners and others that are losers. Among the good, the popular, and (for writers) the go-for-its: marriage, death, taxes (yes, really). Also technologies— preferably modern and vaguely threatening technologies— funerals, guns, doctors, work, schools, presidents, newspapers, kids, moms, and the media.
By contrast, among the bad and unpopular, we already have sex, drugs, and rock and roll. To that add seduction, making love, the body described in any terms other than in pain or at a crime scene. (These latter two bodily experiences, readers seem to quite enjoy.) No also to cigarettes and alcohol, the gods, big emotions like passionate love and desperate grief, revolutions, wheeling and dealing, existential or philosophical sojourns, dinner parties, playing cards, very dressed up women, and dancing. (Sorry.) Firearms and the FBI beat fun and frivolity by a considerable percentage. The reading public prefers to see the stock market described more so than the human face. It likes a laboratory over a church, spirituality over religion, and college more than partying. And, when it comes to that one, big, perennially important question, the readers are clear in their preference for dogs and not cats.
This is where I start thinking about the data we’ve been presented. Of course, this is all based on one chapter of a very deep book, and I’m no statistician. But I am a reader, and I have a few theories about why these books sell.
First off, for the lack of speculative fiction in the algorithm–this was based off a snapshot of the 2014 NYT bestseller list. This was, I believe, right after the NYT changed its rules to keep indie published books off the list. (Otherwise it would have been pretty much dominated by picture books.)
The indie market has been killing it in speculative fiction. I mean, The Martian was indie–Andy Weir wrote it on his blog and dumped it to Amazon for a buck afterward. Traditional publishers have declared Urban Fantasy a dead genre. Meanwhile, on Amazon, UF is one of the big hot genres. Watch out, Jim Butcher, here comes Domino Finn and a bunch of others, out to steal your crown.
Science fiction, especially space opera, is going bonkers in the indie realm. So is epic fantasy–dragons, wizards, magic, all that jazz. Over on the kboards forum, writers of speculative fiction regularly report being able to live off their earnings in those genres.
Now comes the speculation. This study found a few big things.
An author spends about a third of the book solidly focused on genre tropes. If it’s a Grisham, people want law shenanigans. If it’s romance, they want relationships. If it’s fantasy, they want the fantastic. If it’s space opera, they want space ships and aliens. If it’s Harry Potter, they want Hogwarts.
This is all fine and dandy. But what separates the winners from the rest of the pack is that “human interactions” thing. We don’t read Harry Potter for the epic battles against the forces of Voldemort–we want to hang out with the Weasleys. “We’re not dumb. We know our names are Gred and Forge.”
In the Expanse series by James S. A. Corey, the heroes spend a LOT of time hanging out in the canteen of the ship, drinking bad space-coffee and debating what to do. There’s a ton of human interaction along the way.
After Harry Potter came out, I read a lot of the copycats that launched around the same time. They were all big on the action and weak on the heartwarming, cozy human interaction moments. They lacked staying power as a result. Out of the whole pack, I think only Percy Jackson managed to rise to popularity.
The Mitford books by Jan Karon were big on human interaction. Each book is pretty much “Father Tim wanders around a little town and talks to people”. There will always be a mystery to solve or an over-arching conflict to face, but at its heart, it’s just a cozy story. I think that’s why it sold like crazy.
Human interactions, marriage, death, taxes, moms, kids, and all the rest of the things that feature in bestsellers–those are all what we call high concept. That is, something that everybody can relate to. We all have families. We all have laundry and taxes and death in the family.
So, basically, if you want to write a book that people want to read, you have to write about people dealing with common topics. But the fun of it is setting it in different genres. (In the second Expanse book, one of the main characters is trying to find his kidnapped daughter. So he crowdfunds his search. The resulting donations and trolling he gets ring absolutely true, whether here on Earth or roaming the moons of Jupiter.)
As a reader, I know that I love the quiet moments where the characters spend time with other characters. Seems that I’m not the only one.