From Surf Wiki (app.surf) — the open knowledge base
Simpson's paradox
Error in statistical reasoning with groups

Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics.
Simpson's paradox has been used to illustrate the kind of misleading results that the misuse of statistics can generate.
Edward H. Simpson first described this phenomenon in a technical paper in 1951.
Examples
UC Berkeley gender bias
One of the best-known examples of Simpson's paradox comes from a study of gender bias in graduate school admissions to the University of California, Berkeley. The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance.
| | All applicants | All admitted | Men applicants | Men admitted | Women applicants | Women admitted |
|---|---|---|---|---|---|---|
| Total | 12,763 | 41% | 8,442 | 44% | 4,321 | 35% |
However, when the departments applied to are taken into account, the differing rejection rates reveal how hard it was to get into each department, and they show that women tended to apply to more competitive departments with lower rates of admission, even among qualified applicants (such as the English department), whereas men tended to apply to less competitive departments with higher rates of admission (such as the engineering department). The pooled and corrected data showed a "small but statistically significant bias in favor of women".
The data from the six largest departments are listed below:
| Department | All applicants | All admitted | Men applicants | Men admitted | Women applicants | Women admitted |
|---|---|---|---|---|---|---|
| A | 933 | 64% | 825 | 62% | 108 | 82% |
| B | 585 | 63% | 560 | 63% | 25 | 68% |
| C | 918 | 35% | 325 | 37% | 593 | 34% |
| D | 792 | 34% | 417 | 33% | 375 | 35% |
| E | 584 | 25% | 191 | 28% | 393 | 24% |
| F | 714 | 6% | 373 | 6% | 341 | 7% |
| Total | 4526 | 39% | 2691 | 45% | 1835 | 30% |
Across all 85 departments, 4 were significantly biased against women and 6 were significantly biased against men (not all of them appear in the six-largest-departments table above). Notably, the conclusion was not based on these counts of biased departments but on gender admissions pooled across all departments, weighted by each department's rejection rate across all of its applicants.
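The pooling effect can be checked against the six-department table. This is a minimal sketch that assumes the table's rounded percentages are exact admission rates, so the recomputed totals differ slightly from the published 45% and 30%. It computes each gender's pooled rate as an applicant-weighted average of its department rates, then, as an illustrative alternative, applies both genders' department rates to the same combined applicant pool (direct standardization). The standardization step is a swapped-in textbook technique, not the weighting used in the study itself, and the names `DEPARTMENTS` and `weighted_rate` are hypothetical.

```python
# Six largest departments: (department, men applicants, men admitted %,
# women applicants, women admitted %), from the table above.
# Assumption: the rounded percentages are treated as exact rates.
DEPARTMENTS = [
    ("A", 825, 62, 108, 82),
    ("B", 560, 63, 25, 68),
    ("C", 325, 37, 593, 34),
    ("D", 417, 33, 375, 35),
    ("E", 191, 28, 393, 24),
    ("F", 373, 6, 341, 7),
]

def weighted_rate(weights, rates):
    """Average of per-department rates (in %), weighted by `weights`."""
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)

men_n = [d[1] for d in DEPARTMENTS]
men_rate = [d[2] for d in DEPARTMENTS]
women_n = [d[3] for d in DEPARTMENTS]
women_rate = [d[4] for d in DEPARTMENTS]

# Pooled rates: each gender's own applicants are the weights, so the
# departments that gender applies to most dominate its average.
pooled_men = weighted_rate(men_n, men_rate)
pooled_women = weighted_rate(women_n, women_rate)

# Direct standardization (illustrative): weight both genders' department
# rates by the same combined applicant counts.
all_n = [m + w for m, w in zip(men_n, women_n)]
std_men = weighted_rate(all_n, men_rate)
std_women = weighted_rate(all_n, women_rate)

women_higher = [d[0] for d in DEPARTMENTS if d[4] > d[2]]

print(f"pooled:       men {pooled_men:.1f}%  women {pooled_women:.1f}%")
print(f"standardized: men {std_men:.1f}%  women {std_women:.1f}%")
print("departments where women's rate is higher:", women_higher)
```

With these inputs the pooled rates come out near 44.5% for men and 30.3% for women, while the standardized rates put women ahead (about 42.9% versus 38.8%), and women have the higher rate in four of the six departments.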
Kidney stone treatment
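Another example comes from a real-life medical study comparing the success rates of two treatments for kidney stones. The table below gives success rates and numbers of patients for small stones, large stones, and both sizes combined.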
| Stone size | Treatment A | Treatment B |
|---|---|---|
| Small stones | Group 1: 93% (81/87) | Group 2: 87% (234/270) |
| Large stones | Group 3: 73% (192/263) | Group 4: 69% (55/80) |
| Both | 78% (273/350) | 83% (289/350) |
The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B appears to be more effective when both sizes are considered together. In this example, the "lurking" variable (or confounding variable) causing the paradox is the size of the stones, which researchers did not know to be important until its effects were included.
Which treatment is considered better is determined by which success ratio (successes/total) is larger. The reversal of the inequality between the two ratios when considering the combined data, which creates Simpson's paradox, happens because two effects occur together:
- The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give cases with large stones the better treatment A, and the cases with small stones the inferior treatment B. Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4.
- The lurking variable, stone size, has a large effect on the ratios; i.e., the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using treatment A (group 3) does worse than the group with small stones, even if the latter used the inferior treatment B (group 2).
Based on these effects, the paradoxical result arises because the effect of the size of the stones overwhelms the benefits of the better treatment (A). In short, the less effective treatment B appeared to be more effective because it was applied more frequently to small-stone cases, which were easier to treat, so that whichever treatment was selected was more likely to be successful.
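Both effects can be verified directly from the kidney stone counts. This is a minimal sketch using exact fractions; the names `GROUPS`, `stratum_rate` and `pooled` are illustrative. It shows the reversal and confirms that each pooled rate is a weighted average of the two stratum rates, with weights given by how that treatment's patients split between small and large stones.

```python
from fractions import Fraction

# (successes, patients) for each group in the kidney stone table.
GROUPS = {
    ("A", "small"): (81, 87),    # group 1
    ("B", "small"): (234, 270),  # group 2
    ("A", "large"): (192, 263),  # group 3
    ("B", "large"): (55, 80),    # group 4
}
SIZES = ("small", "large")

def stratum_rate(treatment, size):
    """Success rate of one treatment within one stone size."""
    return Fraction(*GROUPS[(treatment, size)])

def pooled(treatment):
    """Success rate of one treatment with stone size ignored."""
    successes = sum(GROUPS[(treatment, s)][0] for s in SIZES)
    patients = sum(GROUPS[(treatment, s)][1] for s in SIZES)
    return Fraction(successes, patients)

for size in SIZES:
    a, b = stratum_rate("A", size), stratum_rate("B", size)
    print(f"{size} stones: A {float(a):.0%}, B {float(b):.0%}")
print(f"both sizes: A {float(pooled('A')):.0%}, B {float(pooled('B')):.0%}")

# The pooled rate is a weighted average of the stratum rates, weighted by
# the share of that treatment's patients who had small or large stones.
for t in ("A", "B"):
    share_small = Fraction(GROUPS[(t, "small")][1], sum(GROUPS[(t, s)][1] for s in SIZES))
    mix = share_small * stratum_rate(t, "small") + (1 - share_small) * stratum_rate(t, "large")
    assert mix == pooled(t)
    print(f"treatment {t}: {float(share_small):.0%} of patients had small stones")
```

Only about a quarter of treatment A's patients (87 of 350) had small stones, versus about three quarters of treatment B's (270 of 350), so each pooled rate mostly reflects a different stratum.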
Jaynes argues that the correct conclusion is that though treatment A remains noticeably better than treatment B, the kidney stone size is more important.
Batting averages
A common example of Simpson's paradox involves the batting averages of players in professional baseball. It is possible for one player to have a higher batting average than another player each year for a number of years, but to have a lower batting average across all of those years. This phenomenon can occur when there are large differences in the number of at bats between the years. Mathematician Ken Ross demonstrated this using the batting averages of two baseball players, Derek Jeter and David Justice, during the years 1995 and 1996:
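| Player | 1995 hits/at bats | 1995 average | 1996 hits/at bats | 1996 average | Combined hits/at bats | Combined average |
|---|---|---|---|---|---|---|
| Derek Jeter | 12/48 | .250 | 183/582 | .314 | 195/630 | **.310** |
| David Justice | 104/411 | **.253** | 45/140 | **.321** | 149/551 | .270 |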
In both 1995 and 1996, Justice had a higher batting average (in bold type) than Jeter did. However, when the two baseball seasons are combined, Jeter shows a higher batting average than Justice. According to Ross, this phenomenon would be observed about once per year among the possible pairs of players.
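The reversal can be reproduced from the raw hit and at-bat counts: 12/48 (1995) and 183/582 (1996) for Jeter, and 104/411 (1995) and 45/140 (1996) for Justice. This is a short sketch with exact fractions; the dictionary layout and function names are illustrative only.

```python
from fractions import Fraction

# (hits, at bats) per season for each player.
SEASONS = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}

def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)

def combined(player):
    """Average over both seasons: total hits divided by total at bats."""
    hits = sum(h for h, _ in SEASONS[player].values())
    at_bats = sum(ab for _, ab in SEASONS[player].values())
    return Fraction(hits, at_bats)

for year in (1995, 1996):
    jeter = average(*SEASONS["Derek Jeter"][year])
    justice = average(*SEASONS["David Justice"][year])
    print(year, f"Jeter {float(jeter):.3f}", f"Justice {float(justice):.3f}")

print("combined", f"Jeter {float(combined('Derek Jeter')):.3f}",
      f"Justice {float(combined('David Justice')):.3f}")
```

Justice leads in each season, but most of Jeter's at bats fall in his strong 1996 season while most of Justice's fall in his weaker 1995 season, so Jeter's combined average comes out higher.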
Vector interpretation
Simpson's paradox can also be illustrated using a 2-dimensional vector space. A success rate of $\frac{p}{q}$ (i.e., successes/attempts) can be represented by a vector $\vec{A} = (q, p)$, with a slope of $\frac{p}{q}$. A steeper vector then represents a greater success rate. If two rates $\frac{p_1}{q_1}$ and $\frac{p_2}{q_2}$ are combined, as in the examples given above, the result can be represented by the sum of the vectors $(q_1, p_1)$ and $(q_2, p_2)$, which according to the parallelogram rule is the vector $(q_1 + q_2, p_1 + p_2)$, with slope $\frac{p_1 + p_2}{q_1 + q_2}$.
Simpson's paradox says that even if a vector $\vec{L}_1$ (in orange in the figure) has a smaller slope than another vector $\vec{B}_1$ (in blue), and $\vec{L}_2$ has a smaller slope than $\vec{B}_2$, the sum $\vec{L}_1 + \vec{L}_2$ can still have a larger slope than the sum $\vec{B}_1 + \vec{B}_2$, as shown in the example. For this to occur, one of the orange vectors must have a greater slope than one of the blue vectors (here $\vec{L}_2$ and $\vec{B}_1$), and these will generally be longer than the alternatively subscripted vectors, thereby dominating the overall comparison.
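The slope argument can be checked numerically by reusing the kidney stone counts as vectors (attempts, successes). In this sketch the assignment is an illustrative assumption, not the vectors drawn in the figure: the "orange" vectors are treatment B's large- and small-stone groups, and the "blue" vectors are treatment A's large- and small-stone groups.

```python
import math
from fractions import Fraction

def slope(v):
    """Success rate p/q of a vector (q, p) = (attempts, successes)."""
    q, p = v
    return Fraction(p, q)

def add(u, v):
    """Parallelogram rule: componentwise sum of two vectors."""
    return (u[0] + v[0], u[1] + v[1])

# Illustrative mapping onto the kidney stone groups (not the figure's vectors):
L1, L2 = (80, 55), (270, 234)   # "orange": treatment B, large then small stones
B1, B2 = (263, 192), (87, 81)   # "blue": treatment A, large then small stones

# Each orange vector is shallower than the blue vector with the same subscript...
assert slope(L1) < slope(B1) and slope(L2) < slope(B2)
# ...yet the orange sum is steeper than the blue sum.
print("orange sum:", add(L1, L2), "slope", float(slope(add(L1, L2))))
print("blue sum:  ", add(B1, B2), "slope", float(slope(add(B1, B2))))
# L2 is steeper than B1, and each is the longer vector of its pair.
print(slope(L2) > slope(B1), math.hypot(*L2) > math.hypot(*L1), math.hypot(*B1) > math.hypot(*B2))
```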
Correlation between variables
Simpson's reversal can also arise in correlations, in which two variables appear to have (say) a positive correlation towards one another, when in fact they have a negative correlation, the reversal having been brought about by a "lurking" confounder. Berman et al. give an example from economics, where a dataset suggests overall demand is positively correlated with price (that is, higher prices lead to more demand), in contradiction of expectation. Analysis reveals time to be the confounding variable: plotting both price and demand against time reveals the expected negative correlation over various periods, which then reverses to become positive if the influence of time is ignored by simply plotting demand against price.
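A toy version of the price and demand pattern is sketched below. The numbers are hypothetical and are not Berman et al.'s data: within each of three time periods demand falls as price rises, but both drift upward from one period to the next, so pooling the periods with time ignored produces a positive correlation. The `pearson` helper is written out by hand to keep the example dependency-free.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (price, demand) observations for three time periods.
# Within each period demand falls as price rises; between periods both
# rise together, so time acts as the confounder.
PERIODS = {
    1: ([1, 2, 3], [3, 2, 1]),
    2: ([11, 12, 13], [13, 12, 11]),
    3: ([21, 22, 23], [23, 22, 21]),
}

for period, (price, demand) in PERIODS.items():
    print(f"period {period}: r = {pearson(price, demand):+.2f}")

# Ignore time by pooling every observation into one scatter of demand vs price.
all_price = [p for price, _ in PERIODS.values() for p in price]
all_demand = [d for _, demand in PERIODS.values() for d in demand]
print(f"time ignored: r = {pearson(all_price, all_demand):+.2f}")
```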
Psychology
Psychological interest in Simpson's paradox seeks to explain why people deem sign reversal to be impossible at first. The question is where people get this strong intuition from, and how it is encoded in the mind.
Simpson's paradox demonstrates that this intuition cannot be derived from either classical logic or probability calculus alone, and thus led philosophers to speculate that it is supported by an innate causal logic that guides people in reasoning about actions and their consequences. Savage's sure-thing principle is an example of what such logic may entail. A qualified version of Savage's sure-thing principle can indeed be derived from Pearl's do-calculus and reads: "An action A that increases the probability of an event B in each subpopulation $C_i$ of C must also increase the probability of B in the population as a whole, provided that the action does not change the distribution of the subpopulations." This suggests that knowledge about actions and consequences is stored in a form resembling Causal Bayesian Networks.
Probability
A paper by Pavlides and Perlman presents a proof, due to Hadjicostas, that in a random 2 × 2 × 2 table with uniform distribution, Simpson's paradox will occur with a probability of exactly .
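A rough Monte Carlo check of this kind of result is sketched below. It assumes that "uniform distribution" means the eight cell probabilities of the 2 × 2 × 2 table (stratum × treatment × outcome) are drawn uniformly from the probability simplex, sampled here via independent exponential variates, and it counts a table as paradoxical when one treatment has the higher success rate in both strata but the lower rate in the pooled table. The trial count and seed are arbitrary, and the result is an estimate with sampling error, not the exact value from the proof.

```python
import random

def success_rate(cells, t, strata):
    """Success rate of treatment t, pooled over the given strata."""
    successes = sum(cells[(z, t, 1)] for z in strata)
    total = sum(cells[(z, t, 1)] + cells[(z, t, 0)] for z in strata)
    return successes / total

def is_simpson(cells):
    """True if the treatment ordering shared by both strata reverses when pooled."""
    d1 = success_rate(cells, 0, [0]) - success_rate(cells, 1, [0])
    d2 = success_rate(cells, 0, [1]) - success_rate(cells, 1, [1])
    pooled = success_rate(cells, 0, [0, 1]) - success_rate(cells, 1, [0, 1])
    return (d1 > 0 and d2 > 0 and pooled < 0) or (d1 < 0 and d2 < 0 and pooled > 0)

def estimate(trials, seed=0):
    """Fraction of random tables that show Simpson's paradox."""
    rng = random.Random(seed)
    keys = [(z, t, s) for z in (0, 1) for t in (0, 1) for s in (0, 1)]
    hits = 0
    for _ in range(trials):
        # Normalized i.i.d. Exp(1) draws are uniform on the simplex; the
        # normalization is skipped because only ratios of cells are compared.
        cells = {k: rng.expovariate(1.0) for k in keys}
        hits += is_simpson(cells)
    return hits / trials

print(estimate(50_000))
```

Increasing `trials` narrows the sampling error, after which the estimate can be compared with the exact probability proved by Hadjicostas.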
Simpson's second paradox
A second, less well-known paradox was also discussed in Simpson's 1951 paper. It can occur when the "sensible interpretation" is not necessarily found in the separated data, like in the kidney stone example, but can instead reside in the combined data. Whether the partitioned or combined form of the data should be used hinges on the process giving rise to the data, meaning the correct interpretation of the data cannot always be determined by simply observing the tables.
Judea Pearl has shown that, in order for the partitioned data to represent the correct causal relationships between any two variables, X and Y, the partitioning variables must satisfy a graphical condition called the "back-door criterion":
- They must block all spurious paths between X and Y.
- No partitioning variable can be affected by X.
This criterion provides an algorithmic solution to Simpson's second paradox, and explains why the correct interpretation cannot be determined by data alone: two different graphs, both compatible with the data, may dictate two different back-door criteria.
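For small graphs, the two conditions can be checked by brute force. The sketch below represents a causal diagram as a hypothetical `children` dictionary (each node mapped to the nodes it points to), enumerates every path from X to Y whose first edge points into X, and applies the standard blocking rules: a path is blocked by Z if it passes through a non-collider in Z, or through a collider that is not in Z and has no descendant in Z. Path enumeration is exponential, so this is an illustration only. Both example graphs are assumptions: one in which stone size influences both treatment and success, and one in which treatment acts on success through a hypothetical intermediate variable.

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def satisfies_backdoor(children, x, y, z):
    """Check the back-door criterion for adjustment set z in a DAG."""
    z = set(z)
    # Condition 2: no variable in Z may be affected by (descend from) X.
    if z & descendants(children, x):
        return False

    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(node)

    def neighbors(node):
        return set(children.get(node, ())) | parents.get(node, set())

    def blocked(path):
        for a, m, b in zip(path, path[1:], path[2:]):
            collider = a in parents.get(m, set()) and b in parents.get(m, set())
            if collider and m not in z and not descendants(children, m) & z:
                return True  # an unconditioned collider blocks the path
            if not collider and m in z:
                return True  # a conditioned non-collider blocks the path
        return False

    def paths_to_y(path):
        if path[-1] == y:
            yield path
            return
        for nxt in neighbors(path[-1]):
            if nxt not in path:
                yield from paths_to_y(path + [nxt])

    # Condition 1: every back-door path (first edge points into X) is blocked.
    for parent in parents.get(x, set()):
        for path in paths_to_y([x, parent]):
            if not blocked(path):
                return False
    return True

# Hypothetical diagrams.
kidney = {"size": ["treatment", "success"], "treatment": ["success"]}
mediated = {"treatment": ["intermediate"], "intermediate": ["success"]}

print(satisfies_backdoor(kidney, "treatment", "success", {"size"}))
print(satisfies_backdoor(kidney, "treatment", "success", set()))
print(satisfies_backdoor(mediated, "treatment", "success", {"intermediate"}))
print(satisfies_backdoor(mediated, "treatment", "success", set()))
```

In the first graph, partitioning by stone size satisfies the criterion and not partitioning fails it; in the second, the opposite holds, because the intermediate variable is affected by treatment. The choice depends on the assumed graph, not on the table.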
When the back-door criterion is satisfied by a set Z of covariates, the adjustment formula (see confounding) gives the correct causal effect of X on Y. If no such set exists, Pearl's do-calculus can be invoked to discover other ways of estimating the causal effect. The completeness of do-calculus can be viewed as offering a complete resolution of Simpson's paradox.
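For a single discrete covariate, the adjustment formula takes the form $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$. A minimal sketch applies it to the kidney stone counts, assuming (as the stone-size discussion suggests, though the data alone cannot prove it) that stone size satisfies the back-door criterion for treatment and success. $P(z)$ is estimated from all 700 patients; the names `COUNTS`, `p_size` and `adjusted` are illustrative.

```python
from fractions import Fraction

# (successes, patients) by (treatment, stone size), from the kidney stone table.
COUNTS = {
    ("A", "small"): (81, 87), ("B", "small"): (234, 270),
    ("A", "large"): (192, 263), ("B", "large"): (55, 80),
}
SIZES = ("small", "large")

def p_size(size):
    """P(z): share of all patients with this stone size."""
    total = sum(n for _, n in COUNTS.values())
    return Fraction(sum(COUNTS[(t, size)][1] for t in ("A", "B")), total)

def adjusted(treatment):
    """Adjustment formula: sum over z of P(success | treatment, z) * P(z)."""
    return sum(Fraction(*COUNTS[(treatment, z)]) * p_size(z) for z in SIZES)

for t in ("A", "B"):
    print(f"treatment {t}: adjusted success rate {float(adjusted(t)):.1%}")
```

Under that assumption treatment A comes out ahead (about 83% versus 78%), matching its advantage within each stone size, because both treatments are now averaged over the same stone-size distribution.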
Criticism
One criticism is that the paradox is not really a paradox at all, but rather a failure to properly account for confounding variables or to consider causal relationships between variables. Focus on the paradox may distract from these more important statistical issues.
Another criticism of the apparent Simpson's paradox is that it may be a result of the specific way that data are stratified or grouped. The phenomenon may disappear or even reverse if the data are stratified differently or if different confounding variables are considered. Simpson's example actually highlighted a phenomenon called noncollapsibility, which occurs when a measure computed on the combined data is not a simple average of the measures computed within the subgroups. This suggests that the paradox may not be a universal phenomenon, but rather a specific instance of a more general statistical issue.
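Noncollapsibility can appear even without confounding. The hypothetical numbers below describe two equally sized strata in which treatment is split evenly within each stratum, so treatment is independent of the stratum. The sketch compares the odds ratio (OR), a standard noncollapsible measure, with the risk difference (RD); the helper names are illustrative.

```python
from fractions import Fraction as F

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)

def odds_ratio(p_treated, p_control):
    return odds(p_treated) / odds(p_control)

# Hypothetical success probabilities (treated, control) in two equal-sized
# strata, with treatment split evenly in each stratum (no confounding).
STRATA = [(F(4, 5), F(1, 2)), (F(1, 2), F(1, 5))]

for treated, control in STRATA:
    print("stratum OR:", odds_ratio(treated, control), " RD:", treated - control)

# Equal stratum sizes and equal allocation: combined risks are plain averages.
marg_treated = sum(t for t, _ in STRATA) / len(STRATA)
marg_control = sum(c for _, c in STRATA) / len(STRATA)
print("combined OR:", odds_ratio(marg_treated, marg_control),
      " RD:", marg_treated - marg_control)
```

The odds ratio is 4 in both strata but 169/49 (about 3.45) in the combined data, so it is not an average of the stratum values, whereas the risk difference stays at 3/10 throughout.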
Despite these criticisms, the apparent Simpson's paradox remains a popular and intriguing topic in statistics and data analysis. It continues to be studied and debated by researchers and practitioners in a wide range of fields, and it serves as a valuable reminder of the importance of careful statistical analysis and the potential pitfalls of simplistic interpretations of data.
References
Bibliography
- Leila Schneps and Coralie Colmez, Math on trial. How numbers get used and abused in the courtroom, Basic Books, 2013. . (Sixth chapter: "Math error number 6: Simpson's paradox. The Berkeley sex bias case: discrimination detection").
References
- Holt, G. B. (2016). [Potential Simpson's paradox in multicenter study of intraperitoneal chemotherapy for ovarian cancer](http://jco.ascopubs.org/content/34/9/1016.1.full). Journal of Clinical Oncology, 34(9), 1016.
- (2017). "Post-transcriptional regulation across human tissues". PLOS Computational Biology.
- Judea Pearl. Causality: Models, Reasoning, and Inference, Cambridge University Press (2000, 2nd edition 2009). ISBN 0-521-77362-8.
- Kock, N., & Gaskins, L. (2016). [Simpson's paradox, moderation and the emergence of quadratic relationships in path models: An information systems illustration](http://cits.tamiu.edu/kock/pubs/journals/2016JournalIJANS_ModJCveNetCorrp/Kock_Gaskins_2016_IJANS_SimpPdox.pdf). International Journal of Applied Nonlinear Science, 2(3), 200–234.
- Rogier A. Kievit, Willem E. Frankenhuis, Lourens J. Waldorp and Denny Borsboom, Simpson's paradox in psychological science: a practical guide https://doi.org/10.3389/fpsyg.2013.00513
- Robert L. Wardrop (February 1995). "Simpson's Paradox and the Hot Hand in Basketball". The American Statistician, 49 (1): pp. 24–28.
- Alan Agresti (2002). "Categorical Data Analysis" (Second edition). John Wiley and Sons. ISBN 0-471-36093-7.
- David Freedman, Robert Pisani, and Roger Purves (2007), Statistics (4th edition), W. W. Norton. ISBN 0-393-92972-8.
- (2003). "Probability theory: the logic of science". Cambridge University Press.
- Ken Ross. "A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans (Paperback)". Pi Press, 2004. ISBN 0-13-147990-3. pp. 12–13.
- Statistics available from Baseball-Reference.com: [Data for Derek Jeter](https://www.baseball-reference.com/j/jeterde01.shtml); [Data for David Justice](https://www.baseball-reference.com/j/justida01.shtml).
- Kocik, Jerzy (2001). "Proofs without Words: Simpson's Paradox". Mathematics Magazine.
- Berman, S., DalleMule, L., Greene, M., Lucker, J. (2012), "[Simpson's Paradox: A Cautionary Tale in Advanced Analytics](http://www.statslife.org.uk/the-statistics-dictionary/2012-simpson-s-paradox-a-cautionary-tale-in-advanced-analytics)" (archived 2020-05-10), Significance.
- Kock, N. (2015). [How likely is Simpson's paradox in path models?](http://cits.tamiu.edu/kock/pubs/journals/2015JournalIJeC/Kock_2015_IJeC_SimpPdox.pdf) International Journal of e-Collaboration, 11(1), 1–7.
- (August 2015). "Simpson's paradox ... and how to avoid it". Significance.
- (2014). "Understanding Simpson's Paradox". The American Statistician.
- (1993). "Graphical Models, Causality, and Intervention". Statistical Science.
- (2018). "The Book of Why: The New Science of Cause and Effect". Basic Books.
- (2006). "Identification of Conditional Interventional Distributions". AUAI Press.
- Blyth, Colin R. (June 1972). "On Simpson's Paradox and the Sure-Thing Principle". Journal of the American Statistical Association.
- (June 2011). "The Simpson's paradox unraveled". International Journal of Epidemiology.
- Greenland, Sander. (2021-11-01). "Noncollapsibility, confounding, and sparse-data bias. Part 2: What should researchers make of persistent controversies about the odds ratio?". Journal of Clinical Epidemiology.
This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.