Multiple comparisons


At the end of the lecture and having completed the exercises students should be able to:


Altman, D.G., 1991. Practical statistics for medical research, pp. 210-212

A brief discussion of the problem of multiple comparisons with some suggested solutions.

Bland, J.M. & Altman, D.G., 1995. Multiple significance tests: the Bonferroni method. BMJ 310: 170.

A fairly straightforward outline of a method for trying to overcome some of the problems of multiple comparisons.

A problem

Currently a specific drug (A) is used to treat bad breath. Rapid advances in pharmacological research produce twenty new drugs (B to U) that also treat bad breath. All twenty drugs are more expensive to produce so we would only want to use them if they produced significantly better results.

You carry out 20 studies comparing each of the drugs (B to U) in turn with the standard treatment (A). The results of these studies are analysed using t-tests with a significance level of 5%.

Drug H showed a large and clinically important improvement over the standard treatment and the result was statistically significant at the 5% level. Should drug H be used in preference to drug A from now on?

The correct answer is: "We don't know"
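A quick way to see why "we don't know" is the right answer: when the null hypothesis is true, the P value of a test is uniformly distributed between 0 and 1, so a round of 20 comparisons of identical drugs can be simulated directly. A minimal sketch (the 0·64 figure assumes all 20 comparisons are independent):

```python
import random

# Under the null hypothesis (no drug really differs from A), each
# P value is uniformly distributed on (0, 1). Simulate many rounds of
# 20 comparisons and count how often at least one looks "significant".
random.seed(1)
n_rounds, k, alpha = 10_000, 20, 0.05

rounds_with_false_positive = sum(
    any(random.random() < alpha for _ in range(k))
    for _ in range(n_rounds)
)
print(rounds_with_false_positive / n_rounds)  # roughly 1 - 0.95**20, i.e. about 0.64
```

So even if none of the twenty new drugs were any better than drug A, we would still expect at least one "significant" result in about two rounds out of three.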

The explanation

If we take the conventional level of statistical significance (α) of 0·05 we are saying that there is a 0·05 (5%) probability that a result as extreme as the critical value could occur just by chance, i.e. the probability of a false positive is 0·05.

The explanation is simpler if we look at the other side of this statement: a 0·05 significance level implies a 0·95 (95%) probability of concluding "no statistically significant difference" when there is no difference in reality, i.e. a 0·95 probability of getting a true negative.

The table below shows the probability of getting at least one false positive with repeated independent comparisons at a 5% level of significance. With k comparisons this probability is 1 − 0·95^k.

Number of comparisons            1    2    3    4    5    6    7    8    9   10   14
Probability of false positive  ·05  ·10  ·14  ·19  ·23  ·26  ·30  ·34  ·37  ·40  ·51

It is clear that the probability of a false positive soon becomes quite large with several comparisons and is over 50% with only fourteen comparisons.
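The figures in the table can be reproduced with a short calculation, assuming the comparisons are independent:

```python
# Probability of at least one false positive in k independent
# comparisons, each tested at significance level alpha:
#   P(at least one false positive) = 1 - (1 - alpha) ** k
alpha = 0.05

for k in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14]:
    p_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons: {p_false_positive:.2f}")
```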


There are several methods for coping with the problem of multiple comparisons; most of them are beyond the scope of this course. One that is worth looking at here is the Bonferroni method.

The Bonferroni method is based on the premise that testing each individual comparison at a smaller significance level keeps the significance level for the set of tests as a whole at the desired overall level.

For example, if we are performing five comparisons and we want an overall significance level of α = 0·05, then we need to perform each individual test at another (smaller) significance level, α'.

Calculation of α' is quite simple:

α' = α÷k

where k is the number of comparisons

So, in this case α' = 0·05÷5 = 0·01. Thus if any of our five tests had a P value of 0·01 or less, that test would be considered significant at the overall 5% level.
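The adjustment itself is a one-line calculation; a minimal sketch (the function name is illustrative, not a standard library routine):

```python
# Bonferroni correction: divide the overall significance level alpha
# by the number of comparisons k to get the per-test level alpha'.
def bonferroni_alpha(alpha, k):
    """Per-comparison significance level alpha' = alpha / k."""
    return alpha / k

print(bonferroni_alpha(0.05, 5))   # 0.01, as in the worked example
```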

The principle can also be applied in reverse: do the test, multiply the P value obtained by k, and if this is still less than the significance level then the null hypothesis can be rejected.
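The reverse procedure can be sketched in the same way: multiply each raw P value by k (capping the result at 1, since a probability cannot exceed 1) and compare the adjusted value against the overall significance level. The function name here is again illustrative:

```python
# Bonferroni-adjusted P value: min(k * p, 1), compared against the
# overall significance level alpha.
def bonferroni_adjusted_p(p, k):
    """Adjusted P value for one of k comparisons."""
    return min(k * p, 1.0)

# With k = 5 comparisons and an overall alpha of 0.05:
print(bonferroni_adjusted_p(0.008, 5))  # 0.04 -> still below 0.05, significant
print(bonferroni_adjusted_p(0.03, 5))   # 0.15 -> not significant
```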

The Bonferroni method works quite well for small numbers of multiple comparisons (say, up to five) but tends to be too conservative for larger numbers of comparisons, i.e. it will tend to miss real differences. You should consult a statistician for advice on other methods of dealing with multiple comparisons.