The Gadget
The gadget was about the same size as tourist’s snow-globe, with a small LED display capable of displaying about 10-15 characters, and it plays 20 Questions. It tries to guess the things that you are thinking of, by asking a series of questions, which you can answer, Yes, No, Sometimes and Unknown.
What a weird concept – take a dull game designed to kill time on long trips, and take away the human interaction and ingenuity, and the result is this toy.
It is called 20Q, and it is made by Radica Games. [I am going to jump ahead to point out that the technology was licensed from the developer of a free web-version of the game, which is more advanced, but it gives you a good idea of what I was up against.]
Beating the Puzzle, on the Surface
The family next to me on the plane were totally rapt by this game, explaining that despite the fact that they had been playing it frequently, it had never lost!
Perhaps that is what ruined the enjoyment of the game for me – it was over-hyped. I immediately took on the challenge to defeat it. It was apparently an American game, so what item would the American team have missed? My first item was a Marsupial Mouse.
Unsurprisingly, the game failed to pick the right animal. However, watching it fail taught me some more about the way the game worked.
[Note: In my second game, I chose a more conventional item – “ear-phones†and it guessed “head-phonesâ€, which I counted as a successful match.]
Understanding the Game Play
The game was very slow. Initially, I thought this was an attempt to build suspense – certainly the game did display random pre-set texts about how tricky/easy this one was, or how it had to think deeply. However, I am wondering now if it was to cover for the limited processing power of a small toy.
It would always make a guess after the twentieth question. If it got it wrong, it would bend the rules, and ask five more questions, before giving up. (The web-site version is even more lenient on itself.)
The questions were not always “orthogonalâ€. For example, it asked me early on if my mouse was a vegetarian. In one of the five bonus questions it asked me if it was an herbivore.
The question about whether it weighed more than a pound of butter made me laugh too. “No,†I wanted say, “It weighed less than a pound of butter, but more than a pound of feathers.â€
Despite this, the algorithm was powerful enough to give the impression of intelligence – at least amongst my traveling companions. “Oh, it’s worked it out now!†said the young man only halfway through a game, “It’s just toying with me.â€
The web-site version of the game has room for more buttons for answers, including “Probably†and “Doubtfulâ€. I tried answering “Unknown†for each question, but it did not advance the question number, and just came up with different general questions.
Beating the Hidden Puzzle
So the real reason this incident has stayed with me is the question of how does it do it? If I had a team and wanted to build a toy like this, how would I go about it? I considered this puzzle without looking up the answers online – that would be cheating!
I divided the problem into two parts:
- The artificial intelligence that takes the previous answers and determines the next question, and
- The number of database of items and facts that are required.
AI Algorithm
I figured I knew the basics of the algorithm – I played with the source code of a similar game back in the old days, before Game Boys. The game would walk through a binary tree of Yes/No questions. Each leaf represented an item. If the item was not correct, you were asked to enter what your item was, and a question to distinguish it from the guess the computer made, and the binary tree was extended for the next player. Due to hardware limitations, the game I played with only had a maximum of 256 items (i.e. 8 questions).
That is a very simple system. The web-based version (and by inference, the toy) is not using a system that simple. For example, if you repeat the same answers the second time around, you will be prompted with different questions. Similarly, it handles “Unknown†answers fairly gracefully.
So, while I don’t fully understand the algorithm, it is within my mental grasp that after getting the team to work on the problem, we could have a good-enough algorithm after a few months of thinking and researching.
The Database
To me, the hard part is coming up with zillions of objects that people tend to guess and enough arbitrary, common but distinguishing, facts to generate the tree in the first place.
The sheer number of items is, in theory, approximately 325, or 8.47 × 1011 (eight hundred and forty seven billion). In reality that figure is unfeasibly large. Fortunately, we can rely on the lack of imagination of humans to ensure that they tend to pick the same sorts of objects as each other. How many do you really need? We are aiming to get the kind of figures that the 20Q web-site claims (98% success after 25 questions).
It’s a good question, I don’t know the answer. My estimates tend to drop the more I think about it. My initial guess was about five million, but now I think one-quarter million would probably do. It depends exactly how unimaginative people are.
So where do we get one-quarter million items and distinguishing facts about them?
My first idea was “textbooks, and lots of them!†– especially for the Animal and Vegetable questions, biological taxonomy would cover this quite well – except your average member of the public may be confused by the questions about whether jellyfish are bilaterally symmetric. The idea must be to ask questions that people can easily answer.
My next idea was the Mongolian Hordes approach. Persuade a large number of people to play a prototype of the game, and to have them submit their ideas, until I had enough to cover 98% of guesses. That sounded expensive and time-consuming.
Then I remembered the CYC project – an attempt to create a giant database full of every-day facts about the world that any software will need to know in order to exhibit human-like intelligence. I looked it up and the OpenCyc project is still more about the way of storing general knowledge than the knowledge itself. There appear to be only 6000 “objects†in the basic package..
The mystery remains.
Research leads to results – and spoilers
So I looked it up on the web to find the solution.
The algorithm for selecting elements is based on a neural network; the tree that it walks is not a pre-determined one, but generated on the fly. I didn’t expect that!
The facts have been established via the Mongolian Hordes approach.
They have a different web-site with a different version of the game. This version has more options for vague answers (Unknown, Irrelevant, Sometimes, Maybe, Probably, Doubtful, Usually, Depends, Rarely, Partly). Also, if it fails to guess an object, this version will prompt you to enter you item- submissions are then vetted by teams of moderators. I assume the moderators, or similarly trusted users, also come up with additional questions to add to the set.
Sometimes, it will ask a question where it is not sure of the answer, and use your answer to that question to update its database when it finds out the real answer.
The inventor, Robin Burgener has been running the algorithm since 1988, so it has been improving for a while, and the facts have been accreting.
He’s licensed the software to the manufacturer of the 20Q game, and also to help advertise fast-food/movie tie-in.
Conclusion
I don’t particularly find the game that interesting, although there are many reports of people finding it quite addictive. You can decide for yourself.
Nonetheless, the whole idea of the game, and how it was produced, was far more interesting to me – and surprising too.
Comment by Casey on August 22, 2005
20q is a web classic. I think you sell its approach a little short – I refer to the web version, not the cutdown one that is in the electronic toy.
The answer to building a knowledge base is to let it build itself – 20q built up its answer base through submissions – people who “beat” the game then tell it the answer, and it instantly has twenty pieces of information about it.
Naively, the ideal questioning tactic is to eliminate half the answers each time (a binary search, as you well know). However, as you’ve observed, some answers are more common than others, and so the correct questioning tactic becomes trickier. The aim should (I think) be to ask the question that divides the remaining probability distribution evenly (which resolves to the same case when all answers are equally likely).
In practise, it gets even tricker still. 20q’s setup is built to maximise the information it extracts from each session. If it guessed at each step (as well as asking another question), people would stop as soon as it was correct. Instead, it guarantees that it will get 18 or so pieces of information (true or not) about each object. 20q has an ulterior motive: it wants to increase its practical knowledge, and the best way to do so is to ask questions for which it is not sure of the answer. If it always asked “is it bigger than a breadbasket” (however big that is), it will end up with more information about that question, but less about another question it could have asked, eg. “is it smaller than a suitcase”. Given that it thinks it can get the answer with some room to spare, it is better for 20q to ask a question for which the answer will be more informative. This complicates matters no end.
I have always been impressed by 20q as a system that truly, genuinely works. The real reason it works is that it has harvested a massive amount of information from people, who gave it freely and without realising. This is how data collection should be. I have run a couple of web games that also do game-based data collection, and it really works.
Comment by Alan Green on August 22, 2005
We have one of these gadgets, though I hadn’t heard of the web version. The thing that impresses me most about the toy is that it copes well with the occasional wrong answer.
My guess is that the gadget has a only a few thousand answers embeded in it. The typical, everyday speaking vocabulary is much less than 10k words. Throw away verbs, adjectives and adverbs, allow for “close matches” to be combined (e.g. I couldn’t get the gadget to distinguish between DVD and CD) and, Harry Butler fans excepted, a 98% hit rate is well within its reach.
Comment by Julian on August 24, 2005
Casey,
Did I sell the approach short? That wasn’t my intention, and I apologise if I did. I don’t apologise for impugning the irony-awareness and puzzle-solving skills of my neighbours. I also confess to questioning the enjoyability of continued playing the 20Q game.
However, I respect the approach taken Burgener – especially as I had wrongly written it off as infeasible! (In my own defence, the idea of working on the problem for 17 years seems to fit under my definition of “expensive and time-consuming”.)
Alan,
I can’t argue. I admit my ability to estimate in this area is poor.
Comment by Jeff Atwood on August 26, 2005
To keep this marginally on-topic, my mom (of all people) recently bought me one of these devices. I found it interesting but not exactly fun, per se. But opinions vary.
Do I detect anti-American sentiment? Having recently moved to California from North Carolina (eastcoast -> westcoast), I am also wondering what a Californian accent sounds like..
At any rate, I prescribe one viewings of “Team America: World Police” at your earliest possible convenience.
Comment by Julian on August 27, 2005
Jeff,
That would be my opinion in a nutshell.
Well, Jeff, this comment has had me thinking (and being introspective) for the last half-day. I think I need to respond to this more fully once I have thought it through more, and after I have taken the medication that you have prescribed.
I am not ashamed to confess that I used some poetic license here. I originally wrote “American accent” but I changed it to avoid the repetition of the word “American” which – at the time I was editing it – appeared three times in the same sentence. I knew he was Californian from the conversation, not directly from the accent.
Nonetheless, I was surprised at the idea that there was no such thing as a Californian accent, so I did a small amount of research. I may not really have the skills to pick it, but California English really is a dialect. (A “dialect” is close enough to an “accent” for us non-linguists.)
Comment by Julian on September 5, 2005
Jeff questioned my attitude about part of this article, and after thinking about it for a while, I decided he was right. (I did watch Team America: World Police too, but I don’t think it affected my decision.) Not only was it a little inappropriate, but I realised it didn’t really present my views accurately.
Sorry.
That then raised the question of what to do about it. I considered various forms of penance, including a blogging on Australia’s special attitude to English, US and New Zealand citizens, but I realised that wasn’t going to be a penance; it was going to be a lame excuse! So, I took another channel open to me – editing the blog article to remove the offending text.
I removed the opening section that introduced the characters and the situation we were in. This includes the mention of the Californian accent (debated in the comments above), and the reference to irony.
That itself took some introspection – is taking a revisionist view of blog history morally permissible? Can I simply delete anything I said just because it no longer suits me? In this case, I decided, the answer was ‘yes’!
Comment by Redsave on February 24, 2009
I’ve played this extensively at Christmas during a boring family meetup, the game worked well but i found due to being forced into only giving yes and no answers meant that some questions were almost impossible to answer correctly, i think the game is more aimed towards children but can provide light entertainment for adults too.
Comment by Chantal on June 10, 2009
20Q is not an american game. It’s based in Ottawa, Canada….