Gerd Gigerenzer’s debate with Daniel Kahneman, Amos Tversky, and collaborators in the heuristics and biases research program has been one of the more interesting sideshows in psychology. Although the debate often appears to be more about framing than substance, it largely revolves around the question of whether we should characterize “biased” human decisions as error or “irrationality.”
Kahneman and Tversky made their mark with a series of papers in which they proposed that people make decisions under uncertainty by using a small number of “heuristics,” or rules of thumb. These heuristics reduce otherwise complex calculations to simpler judgments. While often useful, these heuristics, as Kahneman and Tversky demonstrated, can lead to systematic errors. In other words, they can lead to “biased” decisions.
An example heuristic is anchoring and adjustment: start an estimate from an initial value (the anchor), then adjust toward a final answer. Although useful in some situations, this heuristic can cause problems because people often anchor on an irrelevant or only weakly relevant initial value and then adjust insufficiently from it.
The program of work triggered by Kahneman and Tversky has now grown into a massive catalogue of heuristics and associated biases. Humans are now labelled as “predictably irrational.”
Gigerenzer takes a different view of human decision making. He argues that although simple heuristics often yield “biased” decisions, they can deliver better answers than more complex approaches. This is particularly the case in uncertain or complex environments, or where there is only a small sample from which to draw conclusions. People implement these heuristics through their gut feelings, with the selection of the appropriate heuristic a function of the unconscious.
Consider the gaze heuristic: when catching a ball, run so that the ball appears to move in a straight line at constant speed in your gaze. This will lead you to the point where the ball will land.
The gaze heuristic can lead someone to run in an indirect, curved path rather than straight to the spot where they should wait for the ball’s arrival. The movement appears inefficient, but it replaces a complex calculation with something tractable for a human.
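To see why the rule works, consider the variant Gigerenzer often describes for a ball that is already descending: adjust your speed so that the angle of gaze to the ball stays constant. The sketch below is my own toy check of that geometry; the launch values are arbitrary, and air resistance, reaction time, and limits on running speed are ignored.

```python
import numpy as np

# Toy check of the geometry behind the constant-angle version of the gaze
# heuristic (my own construction; the launch values are arbitrary). A ball
# follows a parabolic arc. From the moment it starts descending, a fielder who
# moves so that the gaze angle to the ball stays constant ends up exactly where,
# and when, the ball lands. Running-speed limits and perception are ignored.

g = 9.81               # gravity, m/s^2
vx, vz = 12.0, 20.0    # ball's launch velocity components, m/s

t_apex = vz / g        # time at the top of the arc
t_land = 2 * vz / g    # time the ball returns to the ground
landing_x = vx * t_land

def ball(t):
    """Ball position (x, z) at time t."""
    return vx * t, vz * t - 0.5 * g * t ** 2

# Fielder starts deeper than the landing point; fix the gaze angle at the apex.
x_b, z_b = ball(t_apex)
fielder_x0 = 60.0
theta_ref = np.arctan2(z_b, fielder_x0 - x_b)

# Follow the constant-angle path through the descent:
# tan(theta_ref) = height / horizontal gap, so gap = height / tan(theta_ref).
for t in np.linspace(t_apex, t_land, 200):
    x_b, z_b = ball(t)
    fielder_x = x_b + z_b / np.tan(theta_ref)

print(f"ball lands at x = {landing_x:.1f} m")
print(f"fielder holding the gaze angle constant ends at x = {fielder_x:.1f} m")
```

Because the gaze angle is held fixed, the horizontal gap to the ball shrinks in proportion to the ball’s height, so the gap reaches zero exactly when the ball reaches the ground.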
Gigerenzer and colleagues have generated a substantial body of evidence that humans use these simple heuristics, often to great effect. Humans, and dogs, use the gaze heuristic. Laypeople and amateur players picking the winners of Wimbledon matches used the recognition heuristic in 90 percent of the cases in which it could be applied: if you recognize one of the two players and not the other, predict that the player you recognize will win. Similarly, people judging the relative size of two cities used the recognition heuristic in 90 percent of the cases in which they could.
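As a minimal illustration, the recognition heuristic amounts to a few lines of code. The recognition set and the example cities below are hypothetical.

```python
# A minimal sketch of the recognition heuristic for the two-alternative case
# described above. The recognition set and the example cities are hypothetical.

def recognition_heuristic(option_a, option_b, recognized):
    """Pick the recognized option; return None when the heuristic cannot apply."""
    a_known = option_a in recognized
    b_known = option_b in recognized
    if a_known and not b_known:
        return option_a
    if b_known and not a_known:
        return option_b
    return None  # both or neither recognized: fall back to something else

# Which of two cities is larger?
recognized = {"Berlin", "Munich", "Hamburg"}
print(recognition_heuristic("Munich", "Gelsenkirchen", recognized))  # -> Munich
print(recognition_heuristic("Munich", "Berlin", recognized))         # -> None
```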
Like Kahneman and Tversky’s, Gigerenzer’s work has generated a legacy of researchers exploring the power of “fast and frugal” heuristics and how humans use them. In this view, humans, through their use of simple heuristics, are quite smart.
Replacing humans with algorithms
A shallow reading of the differing positions of Kahneman, Tversky, Gigerenzer, and friends can lead to contrasting perspectives on whether you should use algorithms to replace human judgment. Should we replace the biased humans with unbiased algorithms? Or will the use of fast and frugal heuristics stand the humans in good stead?
As I catalogued in a previous Behavioral Scientist article, the evidence accumulated to date on the decision-making competition between algorithms and humans is relatively one-sided. Since Paul Meehl’s book Clinical Versus Statistical Prediction, published in 1954, a large literature has emerged comparing the two. Spanning areas such as medical and psychiatric diagnosis, job performance, university admissions, and procurement, algorithms typically come out on top.
For example, William Grove and his colleagues looked at 136 studies in medicine and psychiatry in which algorithms had been compared to expert judgment. In 63 of these studies the algorithm was superior, and in 65 there was a tie. That left 8 studies in which the human was the better option.
So how should we reconcile a view of good human decision-making using simple heuristics with the apparently straightforward picture of the superiority of algorithms?
A good starting point is to recognize that many of these algorithms are simple.
The power of simplicity
Revisiting Meehl’s Clinical Versus Statistical Prediction provides an illustration. Many of the algorithms competing against the clinicians involved a simple tally of the number of factors for or against a certain diagnosis. The most complicated methods involved regression, with the calculations likely done by hand.
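A tallying rule of this kind is simple enough to express in a few lines. The sketch below is a generic illustration, not a model taken from Meehl’s book; the checklist items and the cutoff are made up.

```python
# A generic sketch of a tallying rule (not a model taken from Meehl's book):
# count the factors pointing toward a diagnosis, subtract those pointing away,
# and apply a cutoff. The factors and the cutoff here are made up.

def tally_decision(factors_for, factors_against, cutoff=2):
    """Return True if the tally of pro minus con factors reaches the cutoff."""
    return sum(factors_for) - sum(factors_against) >= cutoff

# Hypothetical case: three supporting factors present, one contraindication.
factors_for = [True, True, True]
factors_against = [True]
print(tally_decision(factors_for, factors_against))  # -> True (3 - 1 >= 2)
```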
We see a similar pattern across most of the literature comparing human judgment to algorithms. It is not state-of-the-art machine learning and artificial intelligence being used to make the algorithmic judgments, although those examples are also becoming more common. Rather, simple actuarial and statistical techniques, and often informal techniques, generate these results.
In a 1979 paper Robyn Dawes demonstrated why these simple methods can work, showing the power of “improper linear models”: linear models whose weights are obtained through nonoptimal means, such as setting them all equal. These models can still outperform clinical judgment and, in the areas Dawes examined, performed surprisingly strongly relative to models with weights derived from the data.
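A toy simulation can illustrate the idea. The data-generating process, sample sizes, and weights below are my assumptions, not Dawes’s data; the point is only that, with a small training sample, equal weights on standardized cues can track the criterion out of sample roughly as well as regression weights estimated from the data.

```python
import numpy as np

# A toy simulation of the "improper linear model" idea (my assumptions, not
# Dawes's data): with a small training sample, equal weights on standardized
# cues can predict a criterion out of sample roughly as well as regression
# weights estimated from the data.

rng = np.random.default_rng(0)
n_train, n_test, n_cues = 30, 1000, 5
true_weights = np.array([0.5, 0.4, 0.3, 0.2, 0.1])  # assumed cue-criterion relation

def make_data(n):
    X = rng.normal(size=(n, n_cues))                      # standardized cues
    y = X @ true_weights + rng.normal(scale=1.0, size=n)  # noisy criterion
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# "Proper" model: regression weights estimated from the small training sample.
fitted_w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# "Improper" model: simply add up the cues with equal weights.
unit_w = np.ones(n_cues)

for name, w in [("fitted regression weights", fitted_w), ("unit weights", unit_w)]:
    r = np.corrcoef(X_test @ w, y_test)[0, 1]
    print(f"{name}: out-of-sample correlation with criterion = {r:.2f}")
```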
Gigerenzer’s work has also demonstrated the power of simple methods. For instance, in Simple Heuristics That Make Us Smart, Jean Czerlinski, Gigerenzer, and Daniel Goldstein describe a competition between some simple heuristics and the more complex multiple regression, each predicting outcomes across 20 environments, such as school dropout rates and fish fertility.
One competitor was “Take the Best,” which works through cues in order of their validity in predicting the outcome. For example, if you want to know which of two schools has the higher dropout rate, attendance rate has the highest validity. If one school has lower attendance than the other, infer that it has the higher dropout rate. If the attendance rates are the same, look at the next cue.
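In code, Take the Best is little more than a loop over cues ordered by validity. The sketch below is a generic illustration; the cue names, their validity ordering, and the school profiles are hypothetical.

```python
# A generic sketch of Take the Best. The cue names, their validity ordering,
# and the school profiles are hypothetical.

cue_values = {
    "school A": {"low_attendance": True,  "many_suspensions": False},
    "school B": {"low_attendance": False, "many_suspensions": True},
}
cues_by_validity = ["low_attendance", "many_suspensions"]  # highest validity first

def take_the_best(a, b):
    for cue in cues_by_validity:
        va, vb = cue_values[a][cue], cue_values[b][cue]
        if va != vb:                 # the first discriminating cue decides
            return a if va else b    # infer the object with the positive cue value
    return None                      # no cue discriminates: guess or fall back

# Which school has the higher dropout rate?
print(take_the_best("school A", "school B"))  # -> school A
```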
Depending on the precise specifications, the result of the competition was either a victory for Take the Best or at least equal performance with multiple regression. This is impressive for something that is less computationally expensive and ignores much of the data (in other words, is biased).
There are some differences between the algorithms typically involved in the comparisons with humans and the heuristics explored by Gigerenzer and friends. The primary difference is that some of Gigerenzer and friends’ heuristics are even simpler and often exclude information. Rather than tallying all cues as in Dawes’s approach, Take the Best looks at cues only until it finds one that discriminates. The recognition heuristic relies on a lack of knowledge to work. In some cases, “less is more.” But despite these differences, the simplicity of both stands in stark contrast to the advanced methods that get many people excited today.
So what is it about simple algorithms that gives them an edge over humans using simple heuristics? At least part of the answer lies in what Kahneman and friends have recently labelled “noise.”
Noise
Although humans have been labelled as predictably irrational, much human decision-making can also be characterized as inconsistent. There is noise in the decisions.
When presented with the same information on different occasions, people often draw different conclusions. For instance, nine radiologists judging the malignancy of the same cases of gastric ulcers on separate occasions had correlations of between 0.60 and 0.92 with their own judgments, contradicting themselves around 20 percent of the time. A group of experienced software professionals asked to estimate the effort required for a software task differed by as much as 71 percent when estimating the same task a second time, with a correlation of 0.7 between their own estimates. This inconsistency tends to increase when we examine decisions across different humans.
Algorithms, in contrast to humans, are typically consistent, returning the same decision each time. This difference in consistency is so marked that models of human decision makers, developed through a method called bootstrapping (not to be confused with the statistical term bootstrapping), typically outperform the decision makers they are modeled upon. For example, in one study, models developed from decisions of clinical psychologists tended to outperform most of those same psychologists in differentiating psychotic from neurotic patients.
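A small simulation can illustrate why a model of a judge can beat the judge. Everything below is simulated, and the cue weights and noise levels are my assumptions; the mechanism is simply that fitting a linear model to the judge’s own ratings keeps the judge’s underlying policy while dropping the case-to-case inconsistency.

```python
import numpy as np

# A toy simulation of judgmental bootstrapping in the sense used above (fitting
# a model of the judge, not statistical resampling). All weights and noise
# levels are my assumptions. A linear model fitted to the judge's own ratings
# keeps the judge's underlying policy but drops the case-to-case noise, so it
# tends to track the criterion better than the judge it was built from.

rng = np.random.default_rng(1)
n_cases, n_cues = 300, 4
cues = rng.normal(size=(n_cases, n_cues))

true_w = np.array([0.7, 0.5, 0.3, 0.2])   # assumed cue-criterion relation
criterion = cues @ true_w + rng.normal(scale=0.8, size=n_cases)

# Simulated expert: roughly the right policy, applied inconsistently (noise).
judge_w = np.array([0.6, 0.6, 0.2, 0.2])
judge_ratings = cues @ judge_w + rng.normal(scale=1.0, size=n_cases)

# Bootstrapped model: a linear approximation fitted to the judge's own ratings.
model_w, *_ = np.linalg.lstsq(cues, judge_ratings, rcond=None)
model_predictions = cues @ model_w

for name, pred in [("judge", judge_ratings), ("model of the judge", model_predictions)]:
    r = np.corrcoef(pred, criterion)[0, 1]
    print(f"{name}: correlation with criterion = {r:.2f}")
```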
This evidence of the superiority of the mechanical application of simple models, even models constructed from the decisions of experts themselves, suggests that humans don’t use these simple models for many decisions, or at best use them inconsistently in the environments in which these comparisons are typically made.
Simple heuristics that make algorithms smart
What might we take from the above?
Modern discussions of whether humans will be replaced by algorithms typically frame the problem as a choice between humans on one hand or complex statistical and machine learning models on the other. For problems such as image recognition, this is probably the right frame. Yet much of the past success of algorithms relative to human judgment points us to a third option: the mechanical application of simple models and heuristics.
Simple models appear more powerful when removed from the minds of humans and implemented in a consistent way. The chain of evidence that simple heuristics are powerful tools, that humans use these heuristics, and that these heuristics can make us smart does not lead to the conclusion that humans can outperform simple heuristics or models consistently applied by an algorithm.
Humans are inextricably entwined in developing these algorithms, and in many cases provide the expert knowledge of what cues should be used. But when it comes to execution, taking the outputs of the model gives us a better outcome.
There is one thread from the work on simple heuristics that suggests we might be able to improve the performance of algorithms even further: considering whether the algorithms could be even simpler. While unweighted counting of all cues can be effective, excluding cues, as simpler approaches such as Take the Best do, might perform even better in some circumstances. This is an empirical question worth examining.
I must now admit that there is one territory in which we need to further explore this question before giving all credit to the algorithms. This is one of the territories in which the simple heuristics that humans use work best: unstable environments. Most of the competitions between algorithms and humans involve stable environments with at least a modicum of data; there can be no structural change in the environment to throw the algorithm off. In an uncertain world, do simple heuristics in the minds of humans outperform more mechanical approaches?
As I suggested at the end of my previous article, that needs to be the subject of another discussion. But even with that territory underexplored, we should learn from our use of simple heuristics and their power in the environments in which we make many of our most important decisions. We can also learn from the benefits of a more mechanical application. We can use simple heuristics to make algorithms smart.