Too-Beautiful Data by William Poundstone

Those interested in the "hot hand" and sports streakiness should check out NBAsavant.com. It's an easy-to-use, addictive data visualization site that generates charts and heat maps of NBA players' scoring.

Data visualization, an indispensable tool of science and business, is a double-edged sword. The human eye (and brain) are good at spotting patterns and trends in noisy data. They are often better at this than algorithms are. That's the premise of CAPTCHAs; it's why people keep seeing "alien artifacts" in rover images of Mars. The problem is, the human perceptual system is too good at pattern-finding. Sometimes perceived patterns are only mirages. (Below center right, a "perfectly round sphere" recently imaged on Mars.)

This phenomenon is good for bookies. When sports fans think they can predict winners with a little data visualization (but really can't), that boosts business and profits. But in science it can lead to bad conclusions, and in business to bad decisions. Maybe data visualization tools ought to come with a warning like those on rear view mirrors: PATTERNS AND TRENDS MAY BE LESS SIGNIFICANT THAN THEY APPEAR.


The Cat That Beat the Stock Market by William Poundstone

Stock-picking cat Orlando

We've all heard the mild joke that a monkey, throwing darts at the financial pages, can pick stocks as well as a professional. In no small part it was this mental image that motivated the index fund industry. Lately a new claim is current: A cat's stock picks can beat the S&P 500 index.

There have been a couple of stock-picking cats. Last year the Observer invited three financial pros to compete against a ginger cat named Orlando and a group of British students. The cat's random picks (made by having the cat throw a toy mouse at a marked grid of stocks) outperformed the portfolios of the pros and the kids.

Bob, another stock-picking cat, watching a human friend on TV

In the U.S., the best-known stock-picking cat is the recently deceased Bob, owned by PIMCO bond guru William H. Gross. Wrote Gross: "I often asked her [Bob was female] about her recommendations for pet food stocks, and she frequently responded—one meow for 'no,' two meows for a 'you bet.'"

It will be apparent that cats are replacing monkeys as the favored metaphor for "random" stock picking. (Cat people are free to find this demeaning.) In any case, hedge fund manager David Harding, of Winton Capital, recently weighed in on the matter on CNBC. He didn't mention cats, but he described a system of picking 50 stocks "at random" and weighting them equally. "We tested the idea and [it] immediately did better than the S&P 500."

Harding says he's raised $1 billion to invest in this random-50-stocks scheme, and it's been outperforming the market.

Is it possible that random picks (a "cat") can consistently outperform a broad market index like the S&P 500? It is. But let me first point out that the Observer contest doesn't mean much. It tracked the three portfolios over a year. Obviously there's a strong element of luck in that time frame. Suppose you and I competed at the roulette table. After a night of gambling one of us will do better than the other, but it would be wrong to conclude that the winner has superior skill. Roulette is dumb luck, and in the short term, the stock market is very close to that.

Note that the Observer folks stacked the deck in favor of getting a click-friendly "man bites dog" headline. With three competitors and luck alone deciding the outcome, there was a two-thirds chance that either the cat or the school kids would beat the pros.
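The two-thirds figure can be checked by brute force. This is a minimal sketch that assumes the contest is pure luck, so that every finishing order of the three competitors is equally likely; the labels are just illustrative names.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: over one year the result is pure luck, so every
# finishing order of the three competitors is equally likely.
competitors = ["pros", "cat", "students"]
rankings = list(permutations(competitors))

# An "upset" is any finishing order in which the pros are not on top.
upsets = [order for order in rankings if order[0] != "pros"]
chance_of_upset = Fraction(len(upsets), len(rankings))

print(chance_of_upset)  # 2/3
```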

The S&P index tracks the 500 largest American stocks. Its purpose is to gauge the market performance of the companies most popular with investors and most important to the American economy. The S&P index was never intended to set a cap on what returns investors may expect in a close-to-efficient market. 

The S&P index is market weighted. It does not, in other words, track the worth of a portfolio invested equally in all 500 included stocks. Instead it assumes the portfolio is weighted according to market capitalization. Apple, for instance, has recently been over 4 percent of the index (not 1/500 or 0.2 percent).
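To make the difference between the two weighting schemes concrete, here is a small sketch. The three companies and their market caps are hypothetical, chosen only so the arithmetic is easy to follow; they are not real index data.

```python
def cap_weights(market_caps):
    """Weight each stock by its share of total market capitalization."""
    total = sum(market_caps.values())
    return {name: cap / total for name, cap in market_caps.items()}


def equal_weights(names):
    """Give every stock the same share of the portfolio."""
    names = list(names)
    return {name: 1 / len(names) for name in names}


# Hypothetical market caps in billions of dollars, for illustration only.
toy_market = {"BigCo": 600, "MidCo": 300, "SmallCo": 100}

by_cap = cap_weights(toy_market)      # BigCo gets 0.6 of the portfolio
by_count = equal_weights(toy_market)  # every stock gets one third

for name in toy_market:
    print(name, round(by_cap[name], 3), round(by_count[name], 3))
```

In the cap-weighted version the biggest company dominates the portfolio; in the equal-weighted version size plays no role at all.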

This means that every S&P index fund is strongly overweighted in Apple (and other big, popular companies like Exxon Mobil, Microsoft, and IBM). What's wrong with that? As Harding explains: 

"If you have the same expected returns from assets you should put the same weights on them to optimize the portfolio. So if you choose stocks at random and combine them, you will always beat S&P 500, or in 99.99 percent of cases."


Anyone who truly believes that all 500 S&P companies have equally good expected returns—as an efficient market theory diehard might—would want to invest equally in all. You should put 1/500 of your portfolio in each. By overweighting a few popular companies, you take on extra risk without getting anything in return.
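A rough way to see the "extra risk" is to compare portfolio variances. The sketch below rests on strong simplifying assumptions that real stocks do not satisfy: every stock has the same variance and the stocks are uncorrelated. Under those assumptions, portfolio variance is proportional to the sum of the squared weights, and equal weights make that sum as small as it can be. The "tilted" portfolio is hypothetical, with one stock at 4 percent and the rest split evenly.

```python
def portfolio_variance(weights, asset_variance=1.0):
    """Variance of a portfolio of uncorrelated assets that share one variance.

    Assumption: zero correlation and identical variance for every asset,
    so the result is asset_variance times the sum of squared weights.
    """
    return asset_variance * sum(w * w for w in weights)


n = 500
equal = [1.0 / n] * n

# Hypothetical tilted portfolio: one stock at 4 percent, the rest split evenly.
tilted = [0.04] + [0.96 / (n - 1)] * (n - 1)

print(portfolio_variance(equal))
print(portfolio_variance(tilted))
```

With identical expected returns, both portfolios have the same expected return, so the higher variance of the tilted one buys nothing.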

Now maybe you don't believe the market is all efficient, all the time. I don't, either. Unfortunately, there is little reason to believe that popular stocks merit their popularity. The evidence is that big and popular companies do worse than small and obscure ones.

The most popular stock of the moment is likely to be overbid. It won't always be most popular. Apple is such a market darling right now that it's hard to believe that posterity will look back and marvel at how undervalued it was in 2014. That means Apple's long-term return expectations are likely to be less than average for S&P 500 stocks.

For that reason, a balanced portfolio of all 500 S&P stocks, each comprising 1/500 of total assets, might be expected to have slightly better return and slightly lower volatility than the official S&P 500 index or funds tracking it.

You can do better than that. It is well established that stocks of small companies have outperformed those of large companies over long periods. A "random" basket of stocks, not restricted to the 500 largest, thus might be expected to outperform the S&P index—assuming the small cap bias persists in the future.

For that reason I find Harding's claim easy to believe. Investing in 50 random stocks is marvelously simple. The one thing I don't get: why would sophisticated investors pay hedge fund fees (assuming they are) for a system they could duplicate themselves? (Then again, Harding's fund may have a few twists he's not talking about on CNBC.)

How I Beat the Mind-Reading Machine by William Poundstone

Claude Shannon's original outguessing machine, at the MIT Museum

Legend has it that no one, not the greatest scientific minds of the age, could consistently beat Claude Shannon's outguessing (or "mind-reading") machine at Bell Labs. The machine predicted "random" human choices. But since no one chooses randomly, the machine always won its guessing game.

This is, I say, the legend. There are many anecdotes but no published statistics on how well humans fared against the machine. Few who were at Bell Labs in 1953 are around to tell the tale. (A notable exception is David Hagelbarger, who built the first outguessing machine. I interviewed him for my book.)

I saw Shannon's machine at the storage facility of the MIT Museum. I wasn't able to play it, of course. That would have been almost impious, for it recorded a final score: Player 3507. Machine 5010.

Today, those wanting to experience the outguessing machine have several good virtual options. David Wong has a free app, Mind Reader (ICE factor), available for iOS or Android (left). A browser-based alternative, only a click away, is the "Mind-Reading Program" on the website for Michael Mauboussin's book The Success Equation.

With either you choose one of two alternatives by clicking (Mauboussin's program takes keyboard input as well). If the code predicts your choice, it wins a point; if you fool the computer, you get a point. With Wong's app you play as long as you want. On Mauboussin's site the first to rack up 50 points is the winner. May the best entity win.

The goal is to choose randomly. But almost everyone falls into unconscious patterns. The code keeps track of these and uses them to predict. The basic idea is that past behavior predicts future behavior. Having played both Wong's and Mauboussin's games a while, I can assure you it works. It takes about 25 moves for the machine to learn your play well enough to begin predicting effectively. That part of the game is essentially luck (this relates to Mauboussin's book The Success Equation, which asks how to distinguish skill from luck in business and everything else). Thereafter the machine plays relentlessly and almost always reaches 50 points before the human player does, even when it has to overcome a player's early, lucky lead.
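The "past behavior predicts future behavior" idea can be sketched in a few lines. To be clear, this is not Shannon's or Hagelbarger's circuit, nor the logic of Wong's or Mauboussin's programs; it is a generic frequency-counting predictor of my own devising. The class name, the two-move context, and the scoring helper are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class PatternPredictor:
    """Guess the next binary choice from what followed the same recent moves before."""

    def __init__(self, context_len=2, seed=0):
        self.context_len = context_len
        # Maps a tuple of recent moves to [times 0 followed, times 1 followed].
        self.counts = defaultdict(lambda: [0, 0])
        self.history = []
        self.rng = random.Random(seed)

    def _context(self):
        return tuple(self.history[-self.context_len:])

    def predict(self):
        zeros, ones = self.counts[self._context()]
        if zeros == ones:
            return self.rng.randint(0, 1)  # no evidence either way: guess
        return 0 if zeros > ones else 1

    def update(self, choice):
        self.counts[self._context()][choice] += 1
        self.history.append(choice)


def play(choices, predictor):
    """Score one point to the machine per correct guess, one to the human otherwise."""
    machine = human = 0
    for choice in choices:
        if predictor.predict() == choice:
            machine += 1
        else:
            human += 1
        predictor.update(choice)
    return machine, human


# A player who strictly alternates is the easiest possible mark.
alternating = [i % 2 for i in range(100)]
machine_score, human_score = play(alternating, PatternPredictor())
print(machine_score, human_score)
```

Against a strictly alternating player, the predictor can only miss during the first four moves, before it has seen each two-move context once.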

I found a way I could beat the machine, much of the time. I'll mention it because it says something about the game's psychology that hasn't, as far as I know, been discussed before.

I reasoned like this: Given that the goal is to play randomly, the game's feedback supplies no useful information. The bars showing who just won and who's ahead are trash talk, a distraction from the goal of being as random as possible. They should be ignored.

I found I did better when I tried to ignore the feedback, and better yet when I made sure I couldn't see the bars. (I resized the window so that the bars weren't visible in Mauboussin's game, and covered the top part of my phone screen when using Wong's app.)

I'm not saying that I succeeded in being random. But, having written a book on the subject, I was at least aware of the common biases. In general we switch back and forth too much between choices and avoid long streaks of the same choice. In a truly random series of 50 binary choices, there is generally a streak of six consecutive identical choices (six "heads" or "tails" in a row). I didn't count, but I made an effort to stick on the same choice repeatedly, relative to what my instincts told me.
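How likely a streak of a given length is can be computed exactly rather than estimated. The sketch below counts the sequences of n fair binary choices that contain no run of k or more identical choices and subtracts their share from 1. The function name is mine; the only assumption is that each choice is a fair, independent coin flip.

```python
def prob_run_at_least(n, k):
    """Exact probability that n fair binary choices (n >= 1) contain a run
    of k or more identical consecutive choices."""
    if k <= 1:
        return 1.0
    # ways[r] = number of sequences so far with no run of length k
    # whose final run has length r (1 <= r <= k - 1).
    ways = [0] * k
    ways[1] = 2  # a single choice, either value
    for _ in range(n - 1):
        nxt = [0] * k
        nxt[1] = sum(ways)           # switch choices: a new run of length 1
        for r in range(1, k - 1):
            nxt[r + 1] = ways[r]     # repeat the choice: the run grows by 1
        ways = nxt
    return 1 - sum(ways) / 2 ** n


# Chance of at least one streak of six in 50 choices.
print(prob_run_at_least(50, 6))
```

Comparing the result for streaks of five, six, and seven gives a feel for how long a streak a truly random player should expect to produce.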

By this analysis, the scoring bars are not just a bell or whistle but a crucial part of the outguessing machine. The scoring bars were invented by Shannon's colleague David Hagelbarger, who built the first outguessing machine (above). Hagelbarger was motivated by gameplay considerations. He found that people thought the original machine's game boring until he added two rows of 25 lights across the top. They worked as some pinball machines did: Each time the machine won, a red light came on. Each time the human won, a green light came on. The goal was to light up an entire row of lights before the other.

In this version Hagelbarger's machine (right) became an office hit. Shannon took note and designed his own, improved version. It incorporated a version of Hagelbarger's scoring bar. This wasn't lights but a sort of "Newton's cradle" with ball bearings. Shannon's brief publication on the device speaks of "a row of up to fifty balls." I take that to mean that the goal was to get 50 wins before the machine did. The photo's scoring scale runs

. . . . 20. . . . 40. . . . 60. . . . 80. .

That suggests the goal was 100? Maybe the scale represented percentages, with each win counting as 2 percent of the way to victory.

Another part of the outguessing machine legend, which I repeat in Rock Paper Scissors, is that Shannon's machine, which was simpler than Hagelbarger's, was a superior predictor. But the definition of "success" depends greatly on where you place the goal line. It is easier for an outguesser to get to 50 wins first than to get to 25. My experience with the virtual machines is that they have very little advantage for the first 25 or so moves. I'm now wondering whether the mere fact of setting the goal at 50 wins accounted for Shannon's superior results.
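The effect of moving the goal line can be put in numbers. This sketch assumes, purely for illustration, that the machine wins each point independently with a fixed probability of 0.55. That figure is hypothetical, and a constant edge actually understates the point made above, since the real programs seem to have almost no edge in the early moves.

```python
from math import comb


def first_to_n_win_prob(p, n):
    """Probability that a side winning each point independently with
    probability p is the first to reach n points."""
    q = 1.0 - p
    # The side wins with exactly j opponent points scored along the way:
    # choose where those j points fall among the first n - 1 + j points.
    return sum(comb(n - 1 + j, j) * p ** n * q ** j for j in range(n))


edge = 0.55  # hypothetical per-move edge for the machine
print(first_to_n_win_prob(edge, 25))
print(first_to_n_win_prob(edge, 50))
```

The same per-move edge wins a race to 50 more often than a race to 25, so a longer game alone makes a predictor look better.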

Either way, Hagelbarger latched on to an important concept. We crave positive reinforcement and cringe from negative reinforcement. This is how we learn as infants, children, and adults. Dieters do better with a scale; exercisers appreciate the quantitative data of a Fitbit. The outguessing machines supplied that, though it came with a catch. It encouraged players to frame their choices around "what worked the last time" (or what didn't work). This was indeed central to the Shannon machine's super-concise algorithm.

You might say that Bell Labs' mind-reading machines played a con game on their players. If so, that only made them the more prophetic. Big data is a con game too: it sets the context for the rigged questions it poses. In my book I quote Bell Labs mathematician David Slepian, speaking of Shannon: "My characterization of his smartness is that he would have been the world's best con man."


The Hillbilly, the Singing Midget, and the Mystery of Randomness by William Poundstone

The life of William Coffren (1867-1950) was straight out of Harry Stephen Keeler's screwball fiction. Under the stage name "Si Stebbins" Coffren played a hillbilly bumpkin (The Rube) in Barnum and Bailey's circus. He was married to Dolly the Doll, who stood 32 inches high and performed in a "Singing Midgets" act. A 1944 Milwaukee Journal article painted this picture of their domestic life:

"Midgets whom you see in circuses have midget furniture; but a woman with a full-size husband who travels about and rents apartments must do with things as they are. What if she does have to stand on a chair to turn on the hot water and wash the dishes? What if she does have to drag the chair with her to get up and put the biscuits in the oven? It is all right as long as Si likes her cooking. And he does.…

"She has business notions and opinions regarding show affairs. 'And when Dolly puts her foot down,' says Si, 'I have just got to mind.'"

In real life Coffren was no fool. He anticipated, by half a century and more, some contemporary ideas about the psychology of randomness. Today he is remembered for popularizing the Si Stebbins deck, well known to magicians. Coffren realized that the Holy Grail of card magic would be a deck that looks random but isn't. It was this insight that launched Coffren into the apparently more lucrative world of vaudeville magic.

It is an awkward fact that a properly shuffled deck may not look random to the audience. There will often be suspicious clusters, such as five face cards in a row, or an unbroken string of red cards, or two adjacent aces. A statistician would expect such clustering, but average people don’t.

This isn't unique to cards. Many people are convinced that their music player's shuffle play feature isn't random. It plays too many A$AP Rocky songs in a row! The fact is that we have a wrong mental notion of randomness. We expect random sequences to be better "shuffled" than they are, to have little clustering.

With carnie shrewdness, Coffren gave the marks what they thought they wanted. The Si Stebbins deck is an arrangement of cards that looks more random than a truly random deck does. The four suits run in lockstep order (clubs-hearts-spades-diamonds: remember it as CHaSeD). Each value increases by three. For that reason no two adjacent cards share a color, suit, or value. There are no clusters of any kind.
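The arrangement is simple enough to generate and check mechanically. In this sketch the starting card (the ace of clubs) is my own arbitrary choice, since the description above fixes only the suit order and the step of three. Ranks wrap around after the king, and the function names are illustrative.

```python
SUITS = ["C", "H", "S", "D"]  # clubs-hearts-spades-diamonds: CHaSeD
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
RED_SUITS = {"H", "D"}


def si_stebbins_deck(start_rank=0):
    """Build the 52-card stack: suits cycle in CHaSeD order, ranks step by three."""
    deck = []
    for i in range(52):
        rank = RANKS[(start_rank + 3 * i) % 13]  # wraps around after the king
        suit = SUITS[i % 4]
        deck.append((rank, suit))
    return deck


def has_adjacent_match(deck):
    """True if any two neighbors (treating the deck as a loop) share rank, suit, or color."""
    for a, b in zip(deck, deck[1:] + deck[:1]):
        same_color = (a[1] in RED_SUITS) == (b[1] in RED_SUITS)
        if a[0] == b[0] or a[1] == b[1] or same_color:
            return True
    return False


deck = si_stebbins_deck()
print(deck[:6])
print(has_adjacent_match(deck))  # False
```

Because 3 and 13 share no common factor, and neither do 4 and 13, the 52 positions produce 52 different cards.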

You may think this strict pattern would stick out like a sore thumb. It doesn’t. Even the fact that it's black-red-black-red… doesn't register. To anyone except a magician, the deck just looks random.

The Stebbins arrangement is easily memorized, and that’s the point. A performer who glimpses the bottom card of a cut can instantly deduce the card below it… which becomes the top card of the restored and squared-up deck. He can, if desired, name that card and every other card in the deck.

As the arrangement is circular, it is preserved through any number of honest cuts. An honest shuffle destroys the Stebbins order, of course, but the expert practitioner may use a false shuffle if desired.
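Both claims, that one glimpsed card gives away the next and that cuts leave the order intact, can be checked in code. This is a sketch under the rules described above; the helper names and the choice of the ace of clubs as the first card are assumptions for illustration.

```python
SUITS = "CHSD"  # clubs-hearts-spades-diamonds
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def next_card(card):
    """Card that follows the given one: rank plus three (wrapping), next suit in CHaSeD order."""
    rank, suit = card
    new_rank = RANKS[(RANKS.index(rank) + 3) % 13]
    new_suit = SUITS[(SUITS.index(suit) + 1) % 4]
    return (new_rank, new_suit)


def build_deck(first=("A", "C")):
    """Lay out all 52 cards by repeatedly applying the rule."""
    deck = [first]
    while len(deck) < 52:
        deck.append(next_card(deck[-1]))
    return deck


def cut(deck, n):
    """Honest cut: lift n cards off the top and place them on the bottom."""
    return deck[n:] + deck[:n]


def follows_rule(deck):
    """True if every card, including the bottom one wrapping to the top, is followed by its successor."""
    return all(next_card(a) == b for a, b in zip(deck, deck[1:] + deck[:1]))


deck = build_deck()
after_cut = cut(deck, 17)

# Glimpsing the bottom card of the cut deck reveals the top card.
print(after_cut[-1], next_card(after_cut[-1]), after_cut[0])
print(follows_rule(after_cut))  # True
```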

Coffren made a career out of this simple gimmick, and so did a number of rivals who stole the idea. Actually it's unclear whether Coffren came up with the idea himself. Essentially the same concept is described in magic publications going back hundreds of years. In or about 1898, however, Coffren revealed his secret in a pamphlet, Card Tricks and the Way They Are Performed. The pamphlet's many endorsements of a cigar brand suggest that he had found another way to monetize his gimmick.

You can find a scan of Card Tricks and the Way They Are Performed online.