brettscaife.net

Solution to question 4.1

a) Calculating the 95% confidence intervals

The first step in calculating the 95% confidence interval is to calculate the standard error. We use the formula:

s.e. = std. dev. divided by sq. root of sample size

Applying the formula in turn to the three sets of sample size and standard deviation gives us the standard errors in the table below.

Children's place of attendance | Number of children | Square root of number of children | Standard deviation of DMFT | Standard error
General Dental Surgery (GDS) | 122 | 11·05 | 1·71 | 0·15
Community Dental Surgery (CDS) | 34 | 5·83 | 1·64 | 0·28
Non-attenders | 21 | 4·58 | 1·72 | 0·38

Note that although the standard deviation of DMFT is larger for children attending GDS than for those attending CDS, the standard error of the mean DMFT is smaller for GDS than for CDS. DMFT is more variable among GDS patients (a larger standard deviation), but the estimate of the mean is more precise (a smaller standard error) because the sample size is so much larger.

(I have reported my intermediate results above to two decimal places. In fact, I did not round them whilst I was doing my calculations; I kept them as accurately as my calculator allowed and only rounded at the final stage.)
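The standard errors in the table above can be reproduced with a short script. This is only a sketch of the arithmetic: the names `groups` and `standard_error` are my own, and the sample sizes and standard deviations are copied from the table. The values are kept at full precision and rounded only when printed, in line with the note on rounding.

```python
from math import sqrt

# (number of children, standard deviation of DMFT), copied from the table
groups = {
    "GDS": (122, 1.71),
    "CDS": (34, 1.64),
    "Non-attenders": (21, 1.72),
}


def standard_error(n, sd):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / sqrt(n)


# Store unrounded values; round only for display.
standard_errors = {
    name: standard_error(n, sd) for name, (n, sd) in groups.items()
}

for name, se in standard_errors.items():
    print(f"{name}: s.e. = {se:.2f}")
```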

Now, the 95% confidence interval is the interval:

From 'sample mean - 1·96 standard errors' to 'sample mean + 1·96 standard errors'

So we calculate 1·96 standard errors for each group of patients, then subtract this from, and add it to, the group's mean.

Children's place of attendance | Mean | 1·96 standard errors | Mean - 1·96 standard errors | Mean + 1·96 standard errors
General Dental Surgery (GDS) | 1·75 | 0·30 | 1·45 | 2·05
Community Dental Surgery (CDS) | 1·71 | 0·55 | 1·16 | 2·26
Non-attenders | 1·52 | 0·74 | 0·78 | 2·26
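The interval limits can be checked in the same way. The sketch below is illustrative only: `confidence_interval` and `samples` are names I have made up, and the non-attenders' mean of 1.52 is an assumption, taken as the value implied by the limits 0·78 and 2·26 and the margin 0·74 in the table.

```python
from math import sqrt

Z_95 = 1.96  # multiplier used in the text for a 95% interval


def confidence_interval(mean, sd, n, z=Z_95):
    """Return (lower, upper) = mean -/+ z standard errors."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# name: (mean DMFT, standard deviation, number of children)
samples = {
    "GDS": (1.75, 1.71, 122),
    "CDS": (1.71, 1.64, 34),
    # ASSUMPTION: mean inferred from the interval limits 0.78 and 2.26
    "Non-attenders": (1.52, 1.72, 21),
}

intervals = {
    name: confidence_interval(mean, sd, n)
    for name, (mean, sd, n) in samples.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.2f} to {upper:.2f}")
```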

So we would report our results as follows:

b) What values of DMFT would we expect the GDS population to have?

As we are looking for a description of the population or sample, we need to consider the standard deviation, not the standard error. If the population were normally distributed we would expect 95% of the values to lie in the interval:

'Mean - 1·96 standard deviations' to 'Mean + 1·96 standard deviations'

The mean of DMFT for the GDS sample is 1·75 and the standard deviation is 1·71. 1·96 standard deviations is 3·35. This means we would expect 95% of the population to lie in the interval:

From 1·75 - 3·35 to 1·75 + 3·35

Which works out to:

From -1·60 to 5·10

Note that I have rounded to two decimal places; this seemed appropriate given the accuracy with which the original data were reported. Note also that once I have decided to report results to two decimal places I report the second decimal place even when it is zero (I write 5·10 not 5·1).
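For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation. The function name `reference_range` is my own; the mean and standard deviation are the GDS figures quoted above, and the interval is only meaningful if the values are normally distributed.

```python
def reference_range(mean, sd, z=1.96):
    """Interval expected to hold 95% of values IF they are normally distributed."""
    spread = z * sd
    return mean - spread, mean + spread


low, high = reference_range(1.75, 1.71)

# Format with two decimal places so a trailing zero is kept (5.10, not 5.1).
print(f"From {low:.2f} to {high:.2f}")  # From -1.60 to 5.10
```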

c) What do the results mean?

My answer to part b) stated that 95% of the GDS population would have DMFT values lying between -1·60 and 5·10. As DMFT is the total number of decayed, missing and filled teeth, it is clearly impossible for this to be a negative number. This means that one of the assumptions I made in my calculations must have been wrong. The most likely culprit is the assumption of normality: if DMFT is not a normally distributed variable, we would not expect 95% of a population's values to lie in the interval 'Mean - 1·96 standard deviations' to 'Mean + 1·96 standard deviations'.

I would conclude that DMFT is not normally distributed. I would guess that it has a positively skewed distribution as its mean (1·75) is close to its minimum possible value (0). A histogram of the values would probably look something like the sketch below. The long tail to the right is a sign of a positively skewed distribution.

Sketch histogram of DMFT values

If DMFT is not normally distributed then the results from part a) are not so useful. It is mathematically legitimate to calculate the mean for a skewed distribution, and the 95% confidence intervals will also be mathematically correct. In terms of clinical interpretation, however, the median is a much more useful and accurate measure of location for a skewed distribution. We would have done better to calculate the median with its 95% confidence interval.
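As a rough illustration of what calculating the median with a 95% confidence interval could involve, the sketch below uses one common distribution-free approach based on the binomial distribution of ranks. This is my own choice of method, not necessarily the one intended here. The raw DMFT scores are not given in the question, so `hypothetical_dmft` is invented purely for illustration, and `median_ci` is a made-up helper name.

```python
from math import comb
from statistics import median


def median_ci(values, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Returns the k-th smallest and k-th largest observations, where k is the
    largest rank such that P(B <= k - 1) <= alpha / 2 for B ~ Binomial(n, 0.5).
    Coverage is at least the requested confidence.
    """
    xs = sorted(values)
    n = len(xs)
    alpha = 1 - confidence
    cumulative = 0.0
    k = 0
    for j in range(n):
        cumulative += comb(n, j) / 2**n  # P(B <= j)
        if cumulative > alpha / 2:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[k - 1], xs[n - k]


# HYPOTHETICAL scores for 21 children; NOT data from the question.
hypothetical_dmft = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
                     2, 2, 2, 3, 3, 4, 4, 5, 6, 8]

lower, upper = median_ci(hypothetical_dmft)
print(f"median = {median(hypothetical_dmft)}, 95% CI {lower} to {upper}")
```

With 21 observations this picks the 6th smallest and 6th largest values as the limits. Unlike 'mean - 1·96 standard deviations', the limits are always values that were actually observed, so they cannot be negative.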
