brettscaife.net

Solution to question 3.4

Plaque scores of 40 children entering a toothbrush study

Mode

As the mode is defined as the most common value, it is both 2·2 and 2·9 for this data set; there are five occurrences of each of them (marked in blue above). The data set is therefore bimodal.
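As a rough illustration of finding every mode when there is a tie, here is a minimal Python sketch using the standard library. The scores in it are made up for the example (they are not the plaque scores from the table); they are chosen so that 2·2 and 2·9 tie for the highest count.

```python
from collections import Counter
from statistics import multimode

# Hypothetical scores, NOT the study data: 2.2 and 2.9 each occur three times.
scores = [1.9, 2.2, 2.2, 2.2, 2.5, 2.9, 2.9, 2.9, 3.1, 4.1]

# multimode returns every value tied for the highest count,
# in the order the values first appear in the data.
modes = multimode(scores)
counts = Counter(scores)

print(modes)                         # [2.2, 2.9]
print([counts[m] for m in modes])    # [3, 3]
print("bimodal" if len(modes) == 2 else f"{len(modes)} mode(s)")
```

Two modes with equal counts is what makes a data set bimodal in the sense used here.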

Median

The median is the middle value. As there is an even number of observations, we need to interpolate between the two central values. With 40 observations, the central ones are the 20th and 21st (circled in green above). Both of these are 2·9, so the median is half of (2·9 + 2·9), or 2·9.
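The rule for an even number of observations can be sketched in Python as follows. The function names and the short list of scores are assumptions made for illustration; only the sample size of 40 comes from the question.

```python
def central_positions(n):
    """Return the 1-based position(s) of the central observation(s)."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)


def median(values):
    """Median of a data set, averaging the two central values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Zero-based indices mid - 1 and mid are the two central observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(central_positions(40))   # (20, 21)

# Hypothetical scores, NOT the study data; both central values are 2.9.
scores = [2.2, 2.5, 2.9, 2.9, 3.1, 3.4]
print(median(scores))          # 2.9
```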

Mean

We get the mean by adding all the observations together (giving 112·0) and dividing by the number of observations (40). This gives us 2·8.
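The same arithmetic as a short Python check, using only the total and sample size stated above (the variable names are illustrative):

```python
total = 112.0   # sum of all the plaque scores, as stated in the text
n = 40          # number of observations

mean = total / n
print(round(mean, 1))   # 2.8
```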

Box and whisker plot

The first thing we need to do to draw a box and whisker plot is calculate the interquartile range (IQR), which lets us draw the box. The first quartile (Q1) and third quartile (Q3) are the values of the observations at the positions given by the following formulae (where n is the number of observations):

Q1 = (n + 1)/4
Q3 = 3(n + 1)/4

Using n = 40 gives us 10·25 for Q1, so our value is a quarter (0·25) of the way from the 10th to the 11th observation (orange values in the table below). Fortunately these are identical, so our value for Q1 is 2·3. The formula above gives us 30·75 for Q3, so our value is three-quarters (0·75) of the way from the 30th to the 31st observation (pink values in the table below). Once again these are identical, so our value for Q3 is 3·1.
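The position-and-interpolate procedure can be sketched in Python as below. The helper names are hypothetical, and the eight-value data set is invented so that the interpolation is actually visible (in the plaque data the neighbouring values happen to be identical). Other textbooks and software use slightly different quartile conventions, so this follows only the (n + 1) rule described here.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower = math.floor(position)
    fraction = position - lower
    low_value = ordered[lower - 1]          # convert to a zero-based index
    if fraction == 0:
        return low_value
    high_value = ordered[lower]
    # Move the given fraction of the way from one observation to the next.
    return low_value + fraction * (high_value - low_value)


def quartiles(values):
    """Q1 and Q3 using positions (n + 1)/4 and 3(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    q1_position = (n + 1) / 4
    q3_position = 3 * (n + 1) / 4
    return (value_at_position(ordered, q1_position),
            value_at_position(ordered, q3_position))


# Positions for the sample size in the question.
n = 40
print((n + 1) / 4, 3 * (n + 1) / 4)   # 10.25 30.75

# Hypothetical data, NOT the study data: positions are 2.25 and 6.75.
example = [10, 20, 30, 40, 50, 60, 70, 80]
q1, q3 = quartiles(example)
print(q1, q3)                         # 22.5 67.5
```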

Plaque scores of 40 children entering a toothbrush study

We now need to decide where to put the whiskers. The whiskers include all the data except outliers. We define outliers to be values more than 1·5 box lengths (1·5 × IQR) from the edges of the box. The length of the box is:

3·1 - 2·3 = 0·8

So 1·5 box lengths is 1·2 (1·5 × 0·8). This means that low outliers are observations with values lower than 1·1 (Q1 - 1·2 = 2·3 - 1·2 = 1·1). Inspecting the data we see that there are no values lower than this. Consequently our lower whisker runs to the value of the lowest observation, 1·9. High outliers are observations with values higher than 4·3 (Q3 + 1·2 = 3·1 + 1·2 = 4·3). Inspecting the data we see that there are no values higher than this. Consequently our upper whisker runs to the value of the highest observation, 4·1.
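The outlier limits and whisker ends can be checked with a short Python sketch. The function name is hypothetical. Because the full table is not reproduced in the code, only the lowest and highest observations stated in the text (1·9 and 4·1) are passed in; that is enough to decide where the whiskers end when there are no outliers.

```python
def whisker_limits(q1, q3, values, box_lengths=1.5):
    """Return outlier limits, whisker ends and any outliers.

    Outliers are values more than `box_lengths` x IQR beyond the box edges;
    each whisker runs to the most extreme value that is not an outlier.
    """
    iqr = q3 - q1
    low_limit = q1 - box_lengths * iqr
    high_limit = q3 + box_lengths * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = [v for v in values if v < low_limit or v > high_limit]
    return low_limit, high_limit, min(inside), max(inside), outliers


# Q1, Q3 and the two extreme observations as given in the text.
low_limit, high_limit, lower_whisker, upper_whisker, outliers = whisker_limits(
    q1=2.3, q3=3.1, values=[1.9, 4.1]
)

# Rounded to one decimal place to hide floating-point noise.
print(round(low_limit, 1), round(high_limit, 1))   # 1.1 4.3
print(lower_whisker, upper_whisker)                # 1.9 4.1
print(outliers)                                    # []
```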

The final element of the box and whisker plot, the median, has already been calculated above so we can assemble all this information and draw the plot below.

Box and whisker plot of the plaque scores of 40 children

Note that the plot has a title, a scale and mentions the sample size.

The box and whisker plot indicates that the data set is slightly positively skewed: it has a longer tail to the right. The most appropriate measure of location for a skewed metric data set is the median (value 2·9). The most appropriate measure of spread for such a data set is the interquartile range, which we have already determined runs from 2·3 to 3·1.
