Watson Wasn't Perfect: IBM Explains the 'Jeopardy!' Errors
Watson's blips, obviously, were few and far between, and they didn't slow down its highly publicized victory this week over two human champions on the TV game show Jeopardy! The flesh-and-blood competitors were no slouches, either. Brad Rutter had earned $3.26 million playing Jeopardy!, making him the biggest money winner in the show's history. And Ken Jennings held the program's longest winning streak.
You Can't Blame "Human Error"
On Monday, Day 1 of the three-day contest, Watson tied Rutter at $5,000 each in winnings. Along the way, however, the supercomputer hit a few blips when it came to processing clues given by Jeopardy! host Alex Trebek. Knowing that writing off the mistakes as "human error" wouldn't cut it, the Watson team at Big Blue (IBM) has explained the supercomputer's miscalculations in a variety of posts on the Internet.
On one clue, Jennings was first to buzz in but gave the wrong answer, "What is the 1920s?" Watson buzzed in next but apparently wasn't listening: the mega-machine repeated Jennings' mistake, prompting a gentle scolding from Trebek. But Watson wasn't listening to Trebek, either. The supercomputer has no ears and no speech-recognition capability.
Chris Welty works on Watson's algorithms team. According to an Ars Technica post, Welty and his crew thought it wasn't necessary for Watson to crunch through the other contestants' wrong answers.
An Arm and a Leg
Similarly, Watson could have benefited from Jennings' wrong answer in responding to the clue: "It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904." Jennings answered that Eyser was missing an arm, and Watson then offered up, "What is a leg?"
Although Watson got the body part right, he was dinged for failing to note that the leg in question was "missing."
In a blog post, David Ferrucci, who heads up the Watson project, noted Watson likely didn't understand the word "oddity." According to Ferrucci: "The computer wouldn't know that a missing leg is odder than anything else."
But Watson's creators note that, because the supercomputer is loaded with machine-learning technology, it could gain a better grasp of such concepts over time by playing more games and being exposed to more material. Watson's rivals, in mapping out their strategy for going head-to-head with it, were well aware that he seemed to struggle with abstract concepts and short clues.
On Day 2, Watson missed one clue by a country mile. Better make that an entire country. In Final Jeopardy!, under the category "U.S. Cities," the clue was: "Its largest airport was named for a World War II hero; its second-largest, for a World War II battle."
Watson responded "What is Toronto???," while contestants Jennings and Rutter correctly answered Chicago -- for the city's O'Hare and Midway airports.
Learning the Game
In a blog post, Ferrucci pointed to several issues that may have tripped up Watson.
First, the category names on Jeopardy! are tricky: the answers often don't fit the category exactly. During its training phase, Watson learned that categories only weakly suggest the kind of answer expected, so the machine downgrades their significance. The way the clue was worded also gave the humans an advantage and Watson a disadvantage: the phrase "What U.S. city" wasn't in the clue itself. If it had been, Watson would have given U.S. cities much more weight as it searched for the answer.
Adding to Watson's confusion, there are cities named Toronto in the United States, and the Toronto in Canada has an American League baseball team. Watson probably picked up those facts from the written material it had digested. Also, the machine didn't find much evidence connecting either city's airport to World War II. (Chicago was a very close second on Watson's list of possible answers.) So this is just one of those situations that's a snap for a reasonably knowledgeable human but a true brain teaser for the machine.
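To make the category-weighting trade-off concrete, here is a toy ranker. It is not Watson's scoring pipeline, which was far more elaborate; it simply assumes each candidate answer has one made-up "evidence" score and one made-up "category fit" score, blended by a hypothetical `category_weight` knob. The numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    evidence: float      # invented: how strongly text passages support this answer
    category_fit: float  # invented: how well the answer fits "U.S. Cities" (0 to 1)


def rank(candidates: list[Candidate], category_weight: float) -> list[Candidate]:
    """Sort candidates by a linear blend of evidence and category fit.

    category_weight is a hypothetical knob: 0 ignores the category entirely,
    1 ranks on category fit alone.
    """
    def blended(c: Candidate) -> float:
        return (1 - category_weight) * c.evidence + category_weight * c.category_fit

    return sorted(candidates, key=blended, reverse=True)


# Made-up scores chosen only to show the effect; they are not Watson's.
candidates = [
    Candidate("Toronto", evidence=0.50, category_fit=0.30),
    Candidate("Chicago", evidence=0.40, category_fit=0.95),
]

for weight in (0.1, 0.5):
    print(weight, [c.name for c in rank(candidates, weight)])
```

With the category lightly weighted (0.1), the answer with more textual support comes out on top; weight the category more heavily (0.5) and the U.S. city moves to first place. That mirrors Ferrucci's explanation in spirit, though Watson's real weighting came out of training rather than a single hand-set number.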
Like any artful player, however, Watson developed a sense of when to hold, when to fold and when to play.
Watson knew the Toronto answer could be a big-time bust, so it wagered a mere $947.
According to a blog post by Gerald Tesauro, an IBM researcher, Watson's wagering style largely hinges on two questions: "How likely am I to answer the Daily Double clue correctly?" and "How much will a given bet increase or decrease my winning chances when I get the Daily Double right or wrong?"
As any Jeopardy! fan knows, the Daily Double can make or break a winning streak. A couple of these are hidden on the game board, and players who land on one can bet from $5 to their entire holdings on that single clue. A Daily Double not only gives a contestant the chance to double his money; it also can't be stolen by a rival jumping in with his own answer.
To answer the first question, Watson relies on mathematical models and algorithms, processing data from its vast database, to estimate the likelihood of a correct answer.
Answering the second question is far more involved. Watson uses a Game State Evaluator, a complex model that estimates its chances of winning based on such things as the competitors' scores, the number of remaining Daily Doubles and the value of the remaining clues.
That technology also includes an in-category Daily Double confidence level, providing Watson with a view into its odds of winning a game based on a Daily Double bet. And its risk analytics software also weighs the likelihood of winning with a particular bet.
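Tesauro's two questions can be sketched as a search over every legal bet, scoring each one by the chance of winning if the answer is right and if it is wrong, weighted by the chance of being right. This is a minimal sketch under loud assumptions: `toy_win_probability` is a hypothetical logistic stand-in for IBM's Game State Evaluator (which, unlike this toy, also considered remaining Daily Doubles and clue values), the probability of answering correctly is simply passed in, and only one opponent is modeled. The bet limits follow the standard Daily Double rule: at least $5, and at most the larger of the player's score or the round's top clue value.

```python
import math


def toy_win_probability(my_score: int, opponent_score: int, scale: float = 4000.0) -> float:
    """Hypothetical stand-in for a game-state evaluator.

    Maps the lead over a single opponent to a win probability with a logistic
    curve; `scale` (also hypothetical) sets how quickly a lead becomes decisive.
    """
    lead = my_score - opponent_score
    return 1.0 / (1.0 + math.exp(-lead / scale))


def legal_bets(my_score: int, top_clue_value: int) -> range:
    """Daily Double limits: at least $5, at most the larger of the player's
    score or the highest clue value in the current round."""
    return range(5, max(my_score, top_clue_value) + 1)


def best_daily_double_bet(p_correct, my_score, opponent_score, top_clue_value,
                          win_prob=toy_win_probability):
    """Pick the whole-dollar bet that maximizes expected win probability.

    Question 1 (how likely am I to be right?) is the input p_correct.
    Question 2 (how does each bet move my chances?) is answered by win_prob.
    """
    def expected_win(bet: int) -> float:
        if_right = win_prob(my_score + bet, opponent_score)
        if_wrong = win_prob(my_score - bet, opponent_score)
        return p_correct * if_right + (1 - p_correct) * if_wrong

    return max(legal_bets(my_score, top_clue_value), key=expected_win)


bet = best_daily_double_bet(p_correct=0.7, my_score=10_000,
                            opponent_score=6_000, top_clue_value=2_000)
print(bet)
```

Because the search tries every whole-dollar amount, the best bet lands wherever the math puts it rather than on a round number. With a 70% chance of being right, a $10,000 score and a $6,000 opponent, this toy model settles a little under $3,900, not on a multiple of $100. That illustrates, without reproducing, why an optimizer like Watson's produces odd-looking wagers.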
Because of this betting strategy, Watson often ends up making nontraditional bets that forgo round values. Hence the wonky $947 Toronto bet, rather than a $900 or $1,000 bet.
Notes Tesauro: "Such values may make the arithmetic a little more challenging for the humans when computing their bets."
As if that's the only thing humans have to worry about when it comes to Watson. . .