Last week, Mike Pope wrote about card-shuffling.
By his own admission, he didn’t know the best way, and he described a number of sub-optimal solutions – generally O(n2) – but some had an unbounded worst case, and one displayed biases.
Another commenter pointed to John Viega‘s solution, which assigned a random number to each card, sorted the result, and then reshuffled if no two random numbers were the same. It was on average about O(n.log n) but it had an unbounded worst case.
So along I came, and ever so helpfully added a comment to the article with an algorithm I stole off a Commodore-64 program and have been using (very occasionally) for over 20 years. It is short, bounded, takes O(n), it has a low-constant time and a small memory footprint
Pope came back to say he’s tried it, but he was not sure of the randomness. I wasn’t sure how to respond to that – it was obvious that it is completely random to me. I made a note to go back to leave another comment, but before I got a chance…
… along came Jeff Atwood at Coding Horror, and he waded in with a post. (Actually, Eric Lippert argued against my algorithm first in a comment on the article, but I saw Jeff Atwood’s post before Eric Lippert’s comment.) Atwood brought up an algorithm very similar to the one that I had proposed, and then tried to shoot it down on two grounds.
One, that it was too complicated. I didn’t feel that it was complicated at all. Then again, I have been using it for a while and it seems natural. Anyway, quicksort is more complicated than bubblesort, but you implement it once, and it’s done, right?
The other was that its random number generator wasn’t secure enough. (There’s a very slight strawman here – my original didn’t mention the seeding of the random number generator, but (a) Atwood wasn’t trying to directly attack my solution, he was trying to attack a naive solution that he himself would have tried, and (b) Atwood’s solution wasn’t much different to the one I would have chosen, had I bothered to mention it.)
My response to that was to grumble to myself about security experts who think every single project has security implications, resulting in terrible scope creep. Yes, if you are implementing an online casino, the security of randomisation is a real problem. However, for a huge majority of implementations, it simply isn’t an issue. For a single player implementation of Klondike it isn’t important. My main use of card-shuffling is for Monte Carlo simulations. The idea that someone could crack the code and start predicting what was going to come next is irrelevant in these situations.
If you have a choice between implementing securely or insecurely for the same price, obviously you should take the secure method – hey, you never know when your code will be re-used – but how much does the secure method cost?
Atwood recommends generating a GUID per card, and sorting by that! Oh dear! Not only is it O(n.log n) but it has a huge constant time. My Monte Carlo simulations are going to be dominated by the card-shuffling, especially when testing a game like Blackjack with simplistic play and 416 card decks. I’ll stick with my algorithm, thanks!
I made a note to go back to leave another comment, but before I got a chance…
… Atwood posted another follow-up in which he called the algorithm I used “naïve”. He then quietly demonstrated that it wasn’t truly random at all.
You could have knocked me down with a feather. I was completely wrong. Perhaps “naïve” is the word!
My algorithm had biases – exactly the sort of biases that you don’t want in a Monte Carlo simulation.
In summary: I’ve been owned by Jeff Atwood. Eric Lippert owns a part-share of me, too.
Comment by Alan Green on December 9, 2007
Back in the days of Windows ’95, I wrote a program that shuffled 4 million, uniquely-numbered cards. It produced a file containing the card number, a prize description (most of which were, “You can enter the 2nd chance draw!”) and a verification code. The file was sent to a printing firm, which printed up the 4 million cards.
I got the shuffling algorithm badly wrong twice before finally devising a variant of Knuth’s algorithm. At the time I was proud that it ran in O(number of cards) time and O(number of types of card) space while giving a pragmatically – if not theoretically – random distribution. Looking back, I’m just relieved that there wasn’t a lotteries board asking to see my random number generator 🙂
Comment by Aristotle Pagaltzis on December 10, 2007
I could’ve told you, had I known you were involved in the first place.
I read a bunch about the derivation of the Fisher-Yates shuffle a few years back, because it’s given as the answer to “how do I shuffle an array?†in the Perl documentation’s FAQs, and bias in shuffles came up in a number of threads on Perl Monks. One of them, that I remember well, discussed doing a quicksort where the comparison function declares the numbers equal or unequal half the time, respectively, based on the output of a random number generator. (
@shuffled = sort { rand < .5 } @orig
) – which seems to work, but suffers from bias. It took me a while to understand why. That was my first contact with the subject matter.So when I read Jeff’s article, I looked at that algorithm, scratched my head, went “err hmm… that looks biasedâ€, and then put it out of my mind and went about my ways. I was not at all surprised to read Jeff’s tail-between-legs follow-up about that algorithm being broken and how it took him a while to understand why.
Anything that needs to be analysed stochastically is notoriously insidious to get right – particularly if it involves randomness. And of course we all know that “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.†—John von Neumann
Comment by Alastair on December 10, 2007
This would be a good interview question.
The correct answer of course is
std::random_shuffle
.I had a brief play with it just now. Here it is shuffling a three card deck 600000 times:
abc: 99973
acb: 100118
bac: 99733
bca: 99955
cab: 100043
cba: 100178
Comment by Aristotle Pagaltzis on December 10, 2007
Heh. That answer would be correct, except that you started with the premise of C++… :-)
The standard libraries of most languages probably have an equivalent. The Perl FAQ entry starts out “if your Perl isn’t ancient, use
List::Util::shuffle
.†Atwood is a .NET guy; that huge framework more likely than not also has a shuffling function somewhere. Of course programmers are prone to reinvent seemingly-trivial functionality before they think of thumbing through their tool’s docs.Comment by Alastair on December 10, 2007
Guilty as charged…
Actually whilst poking around with this I found that the Visual C++ implementation of the
std::random_shuffle
algorithm correctly handles the fact thatrand()
has only a certain degree of precision, whereas the gcc implementation does not. This may introduce a subtle bias in the gcc version if the number of items to be shuffled approachesRANDOM_MAX
.Comment by Sunny Kalsi on December 11, 2007
I would submit that Jeff Atwood was told by someone else about his error, and in fact was owned himself. Jeff blogged on it, but didn’t actually own anyone. There’s some code ninja who has actually owned you by proxy…
Likely this person.
I actually agreed when he said when it was complicated, because I couldn’t imagine myself doing it. I mean physically, with a set of cards. If I was being really anal about shuffling cards, I’d do the following. In Java:
public static List shuffle(List cards)
{
final List out = new LinkedList();
for (final Card c : cards)
{
int index = (int) (Math.random() * (out.size() + 1));
out.add(index, c);
}
return out;
}
The test I wrote says it’s not “naive”. Now all you need to do is choose a nice list which does O(1) inserts + O(1) lookups (clearly, an exercise for the reader).
PS: Damn I can’t do code listing…
Comment by Andrew Certain on May 13, 2008
This limitation is not correct, and the reality is much worse than your statement. The problem happens if either the state space of the RNG is smaller than the possible states from which you want to choose (or the state space of the initializer is smaller than the output state space). If you want to shuffle 52 cards, you have 52! output states, so you need at least 226 bits of state in the RNG (and it has to be initialized somehow with at least that many bits).
As cited above, see the Wikipedia entry for Fisher-Yates shuffle.
Andrew