Multiple comparisons


At the end of the lecture and having completed the exercises students should be able to:


Altman, D.G., 1991. Practical statistics for medical research, pp. 210-212

A brief discussion of the problem of multiple comparisons with some suggested solutions.

Bland, J.M. & Altman, D.G., 1995. Multiple significance tests: the Bonferroni method. BMJ 310: 170.

A fairly straightforward outline of a method for trying to overcome some of the problems of multiple comparisons.

A problem

Currently a specific drug (A) is used to treat bad breath. Rapid advances in pharmacological research produce twenty new drugs (B to U) that also treat bad breath. All twenty drugs are more expensive to produce so we would only want to use them if they produced significantly better results.

You carry out 20 studies comparing each of the drugs (B to U) in turn with the standard treatment (A). The results of these studies are analysed using t-tests with a significance level of 5%.

Drug H showed a large and clinically important improvement over the standard treatment and the result was statistically significant at the 5% level. Should drug H be used in preference to drug A from now on?

The correct answer is: "We don't know"
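A quick way to see why "we don't know" is the right answer: when the null hypothesis is true, the P value of a test is uniformly distributed between 0 and 1, so a round of 20 comparisons of identical drugs can be simulated directly. A minimal sketch (the 0·64 figure assumes all 20 comparisons are independent):

```python
import random

# Under the null hypothesis (no drug really differs from A), each
# P value is uniformly distributed on (0, 1). Simulate many rounds of
# 20 comparisons and count how often at least one looks "significant".
random.seed(1)
n_rounds, k, alpha = 10_000, 20, 0.05

rounds_with_false_positive = sum(
    any(random.random() < alpha for _ in range(k))
    for _ in range(n_rounds)
)
print(rounds_with_false_positive / n_rounds)  # roughly 1 - 0.95**20, i.e. about 0.64
```

So even if none of the twenty new drugs were any better than drug A, we would still expect at least one "significant" result in about two rounds out of three.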

The explanation

If we take the conventional level of statistical significance (α) of 0·05 we are saying that there is a 0·05 (5%) probability that a result as extreme as the critical value could occur just by chance, i.e. the probability of a false positive is 0·05.

The explanation is simpler if we look at the other side of this statement: a 0·05 significance level implies a 0·95 (95%) probability of concluding "no statistically significant difference" when there is no difference in reality, i.e. a 0·95 probability of getting a true negative.

The table below shows the probability of getting at least one false positive with repeated independent comparisons at a 5% level of significance. With k comparisons this probability is 1 − 0·95^k.

Number of comparisons            1    2    3    4    5    6    7    8    9   10   14
Probability of false positive  ·05  ·10  ·14  ·19  ·23  ·26  ·30  ·34  ·37  ·40  ·51

It is clear that the probability of a false positive soon becomes quite large with several comparisons and is over 50% with only fourteen comparisons.
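The figures in the table can be reproduced with a short calculation, assuming the comparisons are independent:

```python
# Probability of at least one false positive in k independent
# comparisons, each tested at significance level alpha:
#   P(at least one false positive) = 1 - (1 - alpha) ** k
alpha = 0.05

for k in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14]:
    p_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons: {p_false_positive:.2f}")
```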


There are several methods for coping with the problem of multiple comparisons; most of them are beyond the scope of this course. One that is worth looking at here is the Bonferroni method.

The Bonferroni method is based on the premise that testing each individual comparison at a smaller significance level keeps the significance level for the set of tests as a whole at the desired overall level.

For example, if we are performing five comparisons and we want an overall significance level of α = 0·05, then we need to perform each individual test at another (smaller) significance level, α'.

Calculation of α' is quite simple:

α' = α÷k

where k is the number of comparisons

So, in this case α' = 0·05÷5 = 0·01. Thus if any of our five tests had a P value of 0·01 or less, that test would be considered significant at the overall 5% level.
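The adjustment itself is a one-line calculation; a minimal sketch (the function name is illustrative, not a standard library routine):

```python
# Bonferroni correction: divide the overall significance level alpha
# by the number of comparisons k to get the per-test level alpha'.
def bonferroni_alpha(alpha, k):
    """Per-comparison significance level alpha' = alpha / k."""
    return alpha / k

print(bonferroni_alpha(0.05, 5))   # 0.01, as in the worked example
```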

The principle can also be applied in reverse: do the test, multiply the P value obtained by k, and if this is still less than the significance level then the null hypothesis can be rejected.
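The reverse procedure can be sketched in the same way: multiply each raw P value by k (capping the result at 1, since a probability cannot exceed 1) and compare the adjusted value against the overall significance level. The function name here is again illustrative:

```python
# Bonferroni-adjusted P value: min(k * p, 1), compared against the
# overall significance level alpha.
def bonferroni_adjusted_p(p, k):
    """Adjusted P value for one of k comparisons."""
    return min(k * p, 1.0)

# With k = 5 comparisons and an overall alpha of 0.05:
print(bonferroni_adjusted_p(0.008, 5))  # 0.04 -> still below 0.05, significant
print(bonferroni_adjusted_p(0.03, 5))   # 0.15 -> not significant
```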

The Bonferroni method works quite well for small numbers of multiple comparisons (say, up to five) but tends to be too conservative for larger numbers of comparisons, i.e. it will tend to miss real differences. You should consult a statistician for advice on other methods of dealing with multiple comparisons.