Simpson's paradox


 

Simpson's paradox (or the Yule-Simpson effect) is a statistical paradox, described by G. U. Yule in 1903 and E. H. Simpson in 1951, in which the relative success rates observed within each of several groups are reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics.

Explanation by example

To illustrate the paradox, suppose two people, Lisa and Bart, are let loose on Wikipedia. In the first week, Lisa improves 60 percent of the articles she edits while Bart improves 90 percent of the articles he edits. In the second week, Lisa improves just 10 percent of the articles she edits, while Bart improves 30 percent.

In both weeks, Bart improved a much higher percentage of articles than Lisa, yet when the two weeks are combined, Lisa has improved a much higher percentage than Bart!

That result comes about this way: In the first week, Lisa edits 100 articles, improving 60 of them, while Bart edits just 10 articles, improving all but one (9 of them). In the second week, Lisa edits only 10 articles, improving one, while Bart edits 100 articles, improving 30. When the two weeks' worth of work is combined, both have edited the same number of articles (110), yet Lisa improved about 55% of them (61 in total) while Bart improved only about 35% of them (39 in total)!
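
The arithmetic above can be checked directly. The following is a minimal sketch in Python; the dictionary layout and the helper names `rate`, `weekly_rate` and `combined_rate` are illustrative choices, not part of the original example. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

# (improved, edited) per person and week, taken from the example above.
edits = {
    "Lisa": {1: (60, 100), 2: (1, 10)},
    "Bart": {1: (9, 10), 2: (30, 100)},
}

def rate(improved, edited):
    """Success rate as an exact fraction."""
    return Fraction(improved, edited)

def weekly_rate(person, week):
    """Success rate of one person within a single week."""
    return rate(*edits[person][week])

def combined_rate(person):
    """Pool both weeks: total improved divided by total edited."""
    improved = sum(i for i, _ in edits[person].values())
    edited = sum(e for _, e in edits[person].values())
    return rate(improved, edited)

# Bart is ahead within each week ...
for week in (1, 2):
    assert weekly_rate("Bart", week) > weekly_rate("Lisa", week)

# ... yet Lisa is ahead once the weeks are pooled.
assert combined_rate("Lisa") > combined_rate("Bart")

for person in ("Lisa", "Bart"):
    print(person, combined_rate(person), f"{float(combined_rate(person)):.1%}")
```

Both sets of assertions pass for the same data, which is exactly the reversal described above.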

It appears that the two sets of data separately support a certain hypothesis, but, considered together, support the opposite hypothesis.

To recap, introducing some notation that will be useful later (A stands for Lisa and B for Bart; S_A(1) is Lisa's success rate in the first week, and so on):

  • In the first week:
    • S_A(1) = 60%: Lisa improved 60% of the many articles she edited.
    • S_B(1) = 90%: Bart had a 90% success rate during that time.
    Success is associated with Bart.
  • In the second week:
    • S_A(2) = 10%: Lisa managed 10% in her busy life.
    • S_B(2) = 30%: Bart achieved a 30% success rate.
    Success is associated with Bart.

On both occasions Bart's edits were more successful than Lisa's. But if we combine the two sets, we see that Lisa and Bart both edited 110 articles, and:

  • S_A = \frac{61}{110}: Lisa improved 61 articles.
  • S_B = \frac{39}{110}: Bart improved only 39.
  • S_A > S_B: success is now associated with Lisa.
  • Bart is better for each set but worse overall!

The arithmetical basis of the paradox is uncontroversial. If S_B(1) > S_A(1) and S_B(2) > S_A(2), we feel that S_B must be greater than S_A. However, if different weights are used to form the overall score for each person, this expectation can fail. Here the first week is weighted \frac{100}{110} for Lisa and \frac{10}{110} for Bart, while the weights are reversed for the second week.

  • S_A = \frac{100}{110} S_A(1) + \frac{10}{110} S_A(2)
  • S_B = \frac{10}{110} S_B(1) + \frac{100}{110} S_B(2)

With more extreme reweighting, Lisa's overall score S_A can be pushed up towards 60% and Bart's overall score S_B down towards 30%.
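
The weighted-average form can be verified numerically. This is a minimal sketch in Python; the helper `overall` and its parameter `w1` (the share of a person's edits that fall in the first week) are hypothetical names introduced here. It assumes the weekly rates stay fixed while only the split of articles between the weeks changes.

```python
from fractions import Fraction

# Weekly success rates from the example (A = Lisa, B = Bart).
S_A = {1: Fraction(60, 100), 2: Fraction(10, 100)}
S_B = {1: Fraction(90, 100), 2: Fraction(30, 100)}

def overall(rates, w1):
    """Overall rate when a share w1 of the edits falls in week 1."""
    return w1 * rates[1] + (1 - w1) * rates[2]

# Weights from the example: 100/110 of Lisa's edits, but only
# 10/110 of Bart's, fall in the first week.
assert overall(S_A, Fraction(100, 110)) == Fraction(61, 110)
assert overall(S_B, Fraction(10, 110)) == Fraction(39, 110)

# More extreme (hypothetical) splits: n edits in the heavy week and
# 10 in the light one. S_A approaches 60% and S_B approaches 30%.
for n in (100, 1000, 10000):
    w = Fraction(n, n + 10)
    print(n, float(overall(S_A, w)), float(overall(S_B, 1 - w)))
```

The overall scores never leave the range spanned by each person's own weekly rates, which is why 60% and 30% are the limits.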

Who is more accomplished? Lisa and Bart's mutual friends think Lisa is better, since her overall success rate is higher. But it is possible to retell the story so that it appears obvious that Bart is more diligent. Suppose the case were as follows:

In the first week, Lisa and Bart muddle around fixing spelling errors or accidentally Americanising the pages. In the second week, both try their hands as wordsmiths, adding clarity in some cases but making no real improvement in most. The numerical data are as before: Bart is better at either task, but his overall success rate is worse because almost all of his changes (100 out of 110) required a good deal of thought, while almost all of Lisa's (100 out of 110) were trivial. The association of success with Lisa would in that case be misleading, even spurious.
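
One way to see why the pooled comparison is misleading in this retelling is to give both editors the same mix of tasks. The sketch below, in Python, is an illustration under that assumption only: the common share `w` of trivial (first-week) edits and the function name `same_mix_rate` are hypothetical and do not appear in the original example.

```python
from fractions import Fraction

# (trivial-task rate, demanding-task rate) from the example.
lisa = (Fraction(60, 100), Fraction(10, 100))
bart = (Fraction(90, 100), Fraction(30, 100))

def same_mix_rate(rates, w):
    """Overall rate if a share w of the edits were trivial ones."""
    trivial, demanding = rates
    return w * trivial + (1 - w) * demanding

# With any common mix, Bart comes out ahead, because he is
# ahead on each task separately.
for k in range(11):
    w = Fraction(k, 10)
    assert same_mix_rate(bart, w) > same_mix_rate(lisa, w)

# Example: an even split between the two kinds of edit.
half = Fraction(1, 2)
print(same_mix_rate(lisa, half), same_mix_rate(bart, half))  # prints: 7/20 3/5
```

The reversal in the pooled figures therefore comes entirely from the two editors doing different mixes of work, not from any difference in skill at either task.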
