Science. 2000 Dec 22;290(5500):2261-2.
MEDICINE: Communicating Statistical Information
Ulrich Hoffrage,* Samuel Lindsey, Ralph Hertwig, Gerd Gigerenzer
Decisions based on statistical information can mean the difference
between life and death--for instance, when a cancer patient has to decide whether
to undergo a painful medical procedure based on the likelihood that it will succeed,
or when a jury has to decide whether to convict someone based on DNA evidence. Unfortunately,
most of us, experts included, have difficulty understanding and combining statistical
information effectively.
For example, faculty, staff, and students at Harvard Medical School were asked
to estimate the probability of a disease given the following information (1):
"If a test to detect a disease whose prevalence is 1/1000 has a false positive
rate of 5 per cent, what is the chance that a person found to have a positive result
actually has the disease, assuming that you know nothing about the person's symptoms
or signs?" The estimates varied wildly, ranging from the most frequent estimate,
95% (given by 27 out of 60 participants), to the correct answer, 2% (given by 11
out of 60 participants) (2). In a study requiring interpretation
of mammography outcomes (3), almost all physicians confused the
sensitivity of the test (the proportion of positive test results among people with
the disease) with its positive predictive value (the proportion of people with the
disease among those who receive a positive test result). This is a common confusion
that even crops up in scholarly articles (3) and statistical
textbooks (4) and certainly affects the ability of lay people
(5) to understand the statistical information. Recent discussions
of genetic testing have indicated that genetic counselors are experiencing the same
difficulty (6).
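As a check on the Harvard Medical School problem above, the same numbers can simply be counted out. One caveat: the problem as quoted does not state the test's sensitivity; the textbook answer of about 2% presupposes it is 100%, and so does this sketch.

```python
# Natural-frequency check of the Harvard Medical School problem.
# NOTE: the problem does not state the test's sensitivity; the classic
# answer of ~2% assumes it is 100%, and so does this sketch.
population = 100_000
prevalence = 1 / 1000        # disease prevalence: 1 in 1000
fp_rate = 0.05               # false-positive rate: 5 per cent
sensitivity = 1.0            # assumed, not given in the problem

sick = population * prevalence                # 100 people
true_pos = sick * sensitivity                 # 100 positive tests
false_pos = (population - sick) * fp_rate     # 4995 positive tests

ppv = true_pos / (true_pos + false_pos)
print(f"{ppv:.1%}")  # → 2.0%
```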
It makes little mathematical difference whether statistics are expressed as probabilities,
percentages, or absolute frequencies. It does, however, make a psychological difference.
More specifically, statistics expressed as natural frequencies improve the statistical
thinking of experts and nonexperts alike.
Natural Frequencies
To illustrate how natural frequencies differ from probabilities, we use the example
of a cancer screening test. The probability of colorectal cancer can be given as
0.3% [base rate]. If a person has colorectal cancer, the probability that the hemoccult
test is positive is 50% [sensitivity]. If a person does not have colorectal cancer,
the probability that he still tests positive is 3% [false-positive rate]. What is
the probability that a person who tests positive actually has colorectal cancer?
A restatement of the same problem in terms of natural frequencies would be that out
of every 10,000 people, 30 have colorectal cancer. Of these, 15 will have a positive
hemoccult test. Out of the remaining 9970 people without colorectal cancer, 300 will
still test positive. How many of those who test positive actually have colorectal
cancer?
Only 1 out of 24 physicians gave the correct answer when the statistical information
was expressed in probabilities (7). When it was presented in
natural frequencies, 16 out of 24 other physicians gave the correct answer: 15 out
of 315 (i.e., 5%). Whereas natural frequencies seem to help people make statistical
inferences, probabilities apparently hinder them. Unfortunately, in contexts in which
the positive predictive value of a test is at issue, statistics are typically expressed
and communicated in the form of probabilities, although they can easily be translated
into natural frequencies, as follows:
1. Select a population and use the base rate to determine how many people in the
population have the disease.
2. Take that result and use the test's sensitivity to determine how many people
have the disease and a positive test.
3. Take the remaining number of healthy people and use the test's false-positive
rate to determine how many people do not have the disease but still test positive.
4. Compare the number obtained in step 2 with the sum of those obtained in steps
2 and 3 to determine how many people with a positive test actually have the disease.
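The four steps can be written out directly. This sketch (the function name and population size are our own choices) reproduces the colorectal cancer example from the text:

```python
# Steps 1 to 4 from the text, applied to the colorectal cancer screening
# example (base rate 0.3%, sensitivity 50%, false-positive rate 3%).
def positive_predictive_value(population, base_rate, sensitivity, fp_rate):
    sick = population * base_rate                  # step 1: 30 people
    true_pos = sick * sensitivity                  # step 2: 15 positives
    false_pos = (population - sick) * fp_rate      # step 3: ~300 positives
    return true_pos / (true_pos + false_pos)       # step 4: ~15 out of 315

ppv = positive_predictive_value(10_000, 0.003, 0.5, 0.03)
print(f"{ppv:.1%}")  # → 4.8%, which the text rounds to 5%
```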
Natural frequencies facilitate inferences because they carry implicit information
about base rates and reduce the number of computations required to determine the
positive predictive value of a test (8, 9,
10). They also correspond to the way in which humans have experienced
statistical information over most of their history.
Applications in Medicine
To illustrate the effect of natural frequencies, we asked 96 advanced medical students
to solve four realistic diagnostic tasks. Each participant worked on two probability
and two frequency versions; the order of representation format and which task was
in which format was balanced (11). For each of the tasks, more
participants correctly inferred the likelihood of having the disease given a positive
test when the statistics were communicated as natural frequencies (Fig.
1).
Fig. 1. Interpreting
statistics. Medical students' percentage of correct inferences in four realistic
diagnostic tasks.
Other medical practitioners could also profit from representing statistical information
in terms of natural frequencies. Consider the statistics AIDS counselors must understand
and communicate. In Germany, the prevalence of HIV in heterosexual men who are not
in any known risk group is around 0.01%. The false-positive rate of the HIV test
(in which one blood sample is subjected to multiple tests) is around 0.01%, and its
sensitivity is around 99.9% [exact estimates vary (12)]. To
explore how counselors actually communicate these risks, we sent a male, low-risk
client to 20 German public health centers to have 20 HIV tests. During the mandatory
pretest counseling, the client asked the counselor about the prevalence, sensitivity,
false-positive rate, and the chance that he actually had the virus if the test were
positive (13). Not a single counselor communicated the risks
to the client in natural frequencies. Instead, they used probabilities and percentages,
and, in the majority of the counseling sessions, the information was either inconsistent
or wrong. For instance, one counselor estimated the base rate and the false-positive
rate to be around 0.1%, and the sensitivity to be 99.9%, and then stated that the
client's probability of infection given a positive test is also 99.9% (applying steps
1 to 4 above to his estimates yields a probability of 50%). In fact, 15 out of the
20 counselors told this low-risk client that it is 99.9% or 100% certain that he
has HIV if he tests positive (applying steps 1 to 4 to the numbers found in the literature
yields an actual probability of 50%).
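Translating the literature values above into natural frequencies makes the counselors' error easy to see; a sketch using the figures given in the text:

```python
# HIV screening for low-risk German men, in natural frequencies:
# prevalence 0.01%, false-positive rate 0.01%, sensitivity 99.9%.
population = 10_000_000
infected = population * 0.0001                 # 1,000 men
true_pos = infected * 0.999                    # ~999 positive tests
false_pos = (population - infected) * 0.0001   # ~1,000 false positives

ppv = true_pos / (true_pos + false_pos)
print(f"{ppv:.0%}")  # → 50%, not the 99.9% many counselors reported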
Percentages can mislead in other ways. For example, it may sound impressive to
learn that mammography screening can reduce the risk of breast cancer fatality in
women by 25% [for 50- to 74-year-old women (14)]. However, this
percentage does not say anything about the actual frequencies. If 4 out of 1000 women
without symptoms die of breast cancer within the next 10 years (15),
the relative risk reduction of 25% means that 1 woman in 1000 women who undergo screening
would be saved. A woman without symptoms is most likely not one of the 4 to whom
the risk reduction applies, but one of the other 996 instead--and many of these women
may suffer as a result of the screening. For instance, false positives occur and,
moreover, cancers that grow so slowly that they present little risk will be diagnosed
and unnecessarily treated. As long as health organizations inform women in terms
of probabilities and relative risk reduction about the benefits and harms of screening,
a truly informed decision is unlikely.
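The gap between relative and absolute risk reduction in the mammography example is a one-line calculation:

```python
# "25% risk reduction" applied to 4 breast cancer deaths per 1000 women.
deaths_per_1000 = 4
relative_risk_reduction = 0.25

absolute_reduction = deaths_per_1000 * relative_risk_reduction
print(absolute_reduction)  # → 1.0: one woman in 1000 is actually saved
```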
Applications in Law
Determinations of facts and verdicts in legal proceedings often depend on scientific
evidence. The communication of statistics is as important to the making of legal
decisions by judges, attorneys, forensic experts, and jurors as it is to medical
decision-makers (16, 17). In considering
the admissibility standards for scientific evidence, the U.S. Supreme Court has specifically
indicated that courts need to consider "known or potential rate of error, and
the existence and maintenance of standards controlling the technique's operation"
(18).
In a study conducted in Germany, we asked 27 professionals who would soon qualify
as judges and 127 advanced law students to evaluate two criminal-court case files
involving rape (19). In both cases, a DNA match was reported
between a DNA sample from the defendant and one recovered from the victim. Aside
from this evidence, there was little reason to suspect that the defendant was the
perpetrator. Expert testimony reported the frequency of the recovered DNA profile
as 1 in 1,000,000 and then stated that it was practically certain that the analysis
would show a match for a person who indeed had the DNA profile (in other words, sensitivity
= 100%). The expert also reported the rates of technical and human mishaps leading
to false-positive results in laboratory tests to be about 0.003 (20).
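One way to combine the reported figures is again to count. The sketch below asks only how often a reported match is a true match for a randomly compared person, setting aside the further question of whether someone who has the profile is necessarily the source of the trace.

```python
# DNA testimony figures: profile frequency 1 in 1,000,000,
# sensitivity 100%, laboratory false-positive rate ~0.003.
population = 1_000_000
with_profile = 1                                     # 1 in 1,000,000
true_matches = with_profile * 1.0                    # sensitivity = 100%
false_matches = (population - with_profile) * 0.003  # ~3,000 lab errors

p_true = true_matches / (true_matches + false_matches)
print(f"{p_true:.2%}")  # → 0.03%
```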
When these statistics were expressed as probabilities, only 13% of the professionals
and under 1% of the law students correctly inferred the probability that the defendant
was actually the source of the trace. But when the identical statistics were stated
as natural frequencies, 68% and 44% of these same participants made the correct inference
(Fig. 2, left). The different ways of expressing
the same statistical information altered the verdicts in each case. When the information
was presented as probabilities, 45% of the professionals and 55% of the students
rendered a verdict of guilty, but only 32% and 33% did so when the same statistics
were expressed as natural frequencies (Fig. 2,
right). When verdicts hinge on statistical evidence, understanding that evidence
is crucial, and pursuing this simple method of fostering statistical insight could
contribute to that goal (21, 22).
Fig. 2. Interpreting
statistics. Legal experts' percentage of correct inferences (left)
and of guilty verdicts (right) in two criminal court case files.
Implications for Teaching
The beneficial effects of natural frequencies on statistical reasoning in the studies
reported above occurred without training or instruction. Systematic training in the
use of natural frequencies can even help people to reason with probabilities. The
key is to teach representations rather than rules--that is, to teach people how to
translate probabilities into natural frequencies, as shown in steps 1 to 4. Traditionally,
however, students are instead taught how to plug probabilities into mathematical
formulas such as Bayes's rule.
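Plugging the same probabilities into Bayes's rule gives an answer identical to the natural-frequency count, only with more symbolic steps; the colorectal screening numbers serve as a check:

```python
# Bayes's rule vs. the natural-frequency count, on the colorectal
# screening example (base rate 0.3%, sensitivity 50%, FP rate 3%).
base_rate, sensitivity, fp_rate = 0.003, 0.5, 0.03

# P(disease | positive) via Bayes's rule
bayes = (sensitivity * base_rate) / (
    sensitivity * base_rate + fp_rate * (1 - base_rate)
)

# The same value, counted in a population of 10,000
sick = 10_000 * base_rate
freq = sick * sensitivity / (sick * sensitivity + (10_000 - sick) * fp_rate)

print(f"{bayes:.4f}  {freq:.4f}")  # both ≈ 0.0478
```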
Teaching representations rather than rules--and expressing statistical information
in natural frequencies where appropriate--can help to foster the statistical reasoning
needed to make sound decisions.
References and Notes
- W. Casscells, A. Schoenberger, T. Grayboys, N. Engl. J. Med. 299,
999 (1978).
- The correct solution can be obtained by applying Bayes's rule.
- D. M. Eddy, in Judgment Under Uncertainty: Heuristics and Biases, D.
Kahneman, P. Slovic, A. Tversky, Eds. (Cambridge Univ. Press, Cambridge, 1982), pp.
249-267.
- G. Gigerenzer, in A Handbook for Data Analysis in the Behavioral Sciences:
Methodological Issues, G. Keren, C. Lewis, Eds. (Erlbaum, Hillsdale, NJ, 1993),
pp. 313-339.
- J. J. Koehler, Behav. Brain Sci. 19, 1 (1996).
- R. Weiss, Washington Post, 2 December 2000, p. A10.
- U. Hoffrage, G. Gigerenzer, Acad. Med. 73, 538 (1998).
- G. Gigerenzer, U. Hoffrage, Psychol. Rev. 102, 684
(1995).
- ------, Psychol. Rev. 106, 425 (1999).
- B. Mellers, A. McGraw, Psychol. Rev. 106, 417 (1999).
- The tasks are displayed at http://www-abc.mpib-berlin.mpg.de/users/hoffrage/papers/4tasks.html.
- G. J. Stine, Acquired Immune Deficiency Syndrome: Biological, Medical, Social,
and Legal Issues (Prentice-Hall, Englewood Cliffs, NJ, 1996).
- G. Gigerenzer, U. Hoffrage, A. Ebert, AIDS Care 10,
197 (1998).
- K. Kerlikowske et al., JAMA 273, 149 (1995).
- L. Nyström et al., J. Med. Screen. 3,
85 (1997).
- See, e.g., J. J. Koehler, Jurimetrics 34, 21 (1993).
- ------, Univ. Colorado Law Rev. 67, 859 (1996).
- Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579
(1993), pp. 593-594.
- S. Lindsey, R. Hertwig, G. Gigerenzer, in preparation.
- An average of estimates based on laboratory proficiency tests. See J. J. Koehler,
A. Chia, S. Lindsey, Jurimetrics 35, 201 (1995).
- S. Breyer, Science 280, 537
(1998).
- R. Hertwig, U. Hoffrage, in Frequency Processing and Cognition, P. Sedlmeier,
T. Betsch, Eds. (Oxford Univ. Press, New York, in press).
- We thank the German Research Foundation for financial support (Ho 1847/1 and
He 2768/6-1).
U. Hoffrage, R. Hertwig, and G. Gigerenzer are at the Max Planck Institute for Human
Development, Lentzeallee 94, 14195 Berlin, Germany. S. Lindsey is at the Department
of Psychology, 102 Gilmer Hall, University of Virginia, Charlottesville, VA 22903,
USA.
*To whom correspondence should be addressed. E-mail: hoffrage@mpib-berlin.mpg.de
This Policy Forum was collaboratively written to combine work submitted to Science
independently by the first and second authors.
Copyright © 2000 by The American Association for the Advancement of Science.