brettscaife.net

Estimation and confidence intervals

Objectives

At the end of the lecture, and having completed the exercises, students should be able to:

  1. Describe the characteristics of the normal distribution using statistical terms previously encountered
  2. Explain the concept of a confidence interval and how it relates to an estimated parameter

Bibliography

Gardner, M.J. & D.G. Altman, 1989. Statistics with confidence. The Universities Press, Belfast.

This gives a very thorough introduction to the calculation and use of confidence intervals for those who wish to read further.

Altman, D.G., 1991. Practical Statistics for Medical Research. Chapman & Hall, London. pp. 51-59; 133-143.

These two passages provide more information on the normal distribution and how to test distributions for normality.

The normal distribution

Sometimes called a Gaussian distribution

Key features

Parameters of normal distribution

The equation of the normal distribution. Do not memorise this!

Explanation of equation
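The equation itself is not reproduced in these notes, so the following Python sketch writes out the usual textbook form of the normal density: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), where mu is the mean and sigma is the standard deviation. The function name normal_density is an illustrative choice, not something taken from the lecture. The result is checked against NormalDist from Python's standard statistics module.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The curve is highest at the mean and symmetric about it.
peak = normal_density(0.0, mu=0.0, sigma=1.0)
left = normal_density(-1.0, mu=0.0, sigma=1.0)
right = normal_density(1.0, mu=0.0, sigma=1.0)

# Cross-check against the standard library's implementation.
library_peak = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(peak, left, right, library_peak)
```

Only mu and sigma appear in the formula, which is why the mean and the standard deviation are enough to specify a normal distribution completely.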

z-scores

To transform a value x from a normally distributed data set to a z-score (also called a standard normal deviate or normal score), subtract the mean and divide by the standard deviation:

z = (x - μ)/σ

This transforms the data to a normal distribution with μ = 0 and σ = 1. This particular normal distribution is known as the standard normal distribution.
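A minimal Python sketch of the transformation, using z = (value - mean) / (standard deviation). The population mean of 100 and standard deviation of 15 are made-up values chosen purely for illustration, and the function name z_score is hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical population: mean 100, standard deviation 15 (illustrative values only).
mu, sigma = 100.0, 15.0
values = [70.0, 100.0, 115.0, 130.0]
z_values = [z_score(x, mu, sigma) for x in values]
print(z_values)  # [-2.0, 0.0, 1.0, 2.0]
```

A value equal to the mean has a z-score of 0, and a value one standard deviation above the mean has a z-score of 1.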

The standard normal distribution

For a normally distributed population, the interval

'population mean - 1·96 standard deviations' to 'population mean + 1·96 standard deviations'

contains 95% of the population.

Similarly, the interval

'population mean - 2·58 standard deviations' to 'population mean + 2·58 standard deviations'

contains 99% of the population.
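The two multipliers can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the helper name central_proportion is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def central_proportion(z):
    """Proportion of a normal population within z standard deviations of its mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

print(round(central_proportion(1.96), 3))  # 0.95
print(round(central_proportion(2.58), 3))  # 0.99

# Going the other way: the multiplier that leaves 2.5% in each tail.
multiplier_95 = standard_normal.inv_cdf(0.975)
print(round(multiplier_95, 2))  # 1.96
```

Because any normal distribution can be converted to the standard normal distribution with z-scores, the same multipliers apply whatever the mean and standard deviation are.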

Why do we use the normal distribution?

Estimation

What factors will make our estimate of a mean better?

When we have taken a sample, we want to use it to get information about the population from which it is drawn. For example, if we want to know the population mean, we take a sample and use the sample mean as our best guess at the population mean. The same applies to other parameters; a parameter here means any feature of a population that we are interested in.

We also want to know: how good is the guess?

NB: use Latin letters for sample estimates and Greek letters for population parameters.

Standard error

Standard error is a measure of how good our best guess is.

We have a simple formula to give us the standard error:

standard error (s.e.) = (standard deviation)/(square root of the number of observations)

i.e. the bigger the sample, the smaller the standard error
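A short Python sketch of the formula, assuming a made-up standard deviation of 10 so that the effect of sample size is easy to see. The function name standard_error is an illustrative choice.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a mean: standard deviation divided by the square root of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return std_dev / math.sqrt(n)

# Same standard deviation (illustrative value of 10), increasing sample sizes.
for n in (4, 16, 64, 256):
    print(n, standard_error(10.0, n))
# 4 5.0
# 16 2.5
# 64 1.25
# 256 0.625
```

Because the sample size enters through its square root, the sample must be four times as large to halve the standard error.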

If we repeat the sampling and estimation process many times we get a distribution of estimates.

For a normal population the distribution of estimates of the mean is itself a normal distribution with a standard deviation equal to the standard error.

For any sample of more than one observation, the standard error is smaller than the standard deviation.

The distribution of sample means will be nearly normal whatever the underlying distribution (the central limit theorem) if the sample sizes are large.
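The repeated-sampling idea can be tried out by simulation. The sketch below is an illustration under stated assumptions, not part of the original lecture: it draws samples of 50 from an exponential population with mean 1 and standard deviation 1 (a strongly skewed distribution chosen only as an example), then compares the spread of the sample means with the standard error predicted by the formula.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(draw, sample_size, repeats):
    """Means of `repeats` independent samples, each of `sample_size` values from draw()."""
    return [
        statistics.fmean([draw() for _ in range(sample_size)])
        for _ in range(repeats)
    ]

def draw_skewed():
    """One value from an exponential population with mean 1 and standard deviation 1."""
    return random.expovariate(1.0)

sample_size = 50
means = sample_means(draw_skewed, sample_size, repeats=4000)

# Standard deviation of the sample means versus the formula's prediction.
observed_spread = statistics.stdev(means)
predicted_standard_error = 1.0 / math.sqrt(sample_size)
print(round(observed_spread, 3), round(predicted_standard_error, 3))

# Proportion of sample means within 1.96 standard errors of the population mean.
within = sum(
    abs(m - 1.0) <= 1.96 * predicted_standard_error for m in means
) / len(means)
print(round(within, 3))
```

If the central limit theorem is doing its job, the two printed spreads should be close to each other, and the final proportion should be close to 0·95 even though the underlying population is far from normal.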

Confidence intervals

The interval:

'sample mean - 1·96 standard errors' to 'sample mean + 1·96 standard errors'

is the 95% confidence interval.

(Using 2·58 instead of 1·96 will give us the 99% confidence interval)
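The calculation can be sketched in Python as follows. The function name confidence_interval and the summary statistics (mean 50, standard deviation 10, 100 observations) are hypothetical, chosen only for illustration. The multiplier is obtained from the standard normal distribution rather than typed in, so the 99% interval uses 2·5758 rather than the rounded 2·58.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, std_dev, n, level=0.95):
    """Normal-approximation interval: mean -/+ multiplier x standard error."""
    standard_error = std_dev / math.sqrt(n)
    # Two-sided multiplier: about 1.96 for level=0.95 and 2.58 for level=0.99.
    multiplier = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - multiplier * standard_error, mean + multiplier * standard_error

# Hypothetical summary statistics (illustrative values only).
low95, high95 = confidence_interval(mean=50.0, std_dev=10.0, n=100, level=0.95)
low99, high99 = confidence_interval(mean=50.0, std_dev=10.0, n=100, level=0.99)
print(round(low95, 2), round(high95, 2))  # 48.04 51.96
print(round(low99, 2), round(high99, 2))  # 47.42 52.58
```

The 99% interval is wider than the 95% interval: asking for more confidence costs precision.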

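If we take a sample from a population and calculate the mean of that sample and its associated 95% confidence interval, there is a 95% probability that the confidence interval will include the true mean of the population. Equivalently, if the sampling were repeated many times, about 95% of the intervals calculated in this way would include the true mean.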
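This interpretation can be illustrated by simulation. In the sketch below, the population (normal, mean 100, standard deviation 15), the sample size of 100 and the number of repeats are all made-up values for illustration. Each repeat draws a sample, calculates its 95% confidence interval, and records whether the interval includes the true mean.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0   # hypothetical population (illustrative values)
SAMPLE_SIZE, REPEATS = 100, 2000

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    lower = mean - 1.96 * standard_error
    upper = mean + 1.96 * standard_error
    # Count the intervals that include the true population mean.
    if lower <= TRUE_MEAN <= upper:
        covered += 1

coverage = covered / REPEATS
print(coverage)
```

The printed proportion should be close to 0·95. In a real study the true mean is unknown, so we cannot tell whether our one interval is among the 95% that include it.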

A confidence interval gives us a plausible range of values for an estimate, usually on a clinically useful scale.

Confidence intervals can be calculated for other quantities e.g. medians, proportions (see Gardner & Altman)

Example

Number of teeth of nine patients attending a clinic:

32, 32, 31, 30, 28, 28, 27, 25, 18
Mean = 27·9
Standard deviation = 4·4

What is the 95% confidence interval of the mean?

The number of patients is 9. We can use the formula to calculate the standard error:

s.e. = (standard deviation)/(square root of sample size)

This becomes:

s.e. = 4·4/√9 = 4·4/3 = 1·467

The 95% confidence interval for the mean is:

'sample mean - 1·96 standard errors' to 'sample mean + 1·96 standard errors'

or:

27·9 - 1·96 × 1·467 to 27·9 + 1·96 × 1·467

So, the 95% confidence interval of the mean is:

From 25·0 to 30·8

Note that we do not write 25·0 - 30·8, because the hyphen can be confused with a minus sign in some circumstances; always use 'From ... to ...'. Also note that the answer was rounded to the appropriate number of decimal places at the final step and not before. This is necessary to avoid rounding errors creeping in.
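The worked example can be reproduced in Python as a check on the arithmetic. This sketch assumes the quoted standard deviation is the sample standard deviation (divisor n - 1), which is what statistics.stdev computes and which agrees with the value of 4·4 given above.

```python
import math
import statistics

teeth = [32, 32, 31, 30, 28, 28, 27, 25, 18]

n = len(teeth)
mean = statistics.mean(teeth)
std_dev = statistics.stdev(teeth)        # sample standard deviation (divisor n - 1)
standard_error = std_dev / math.sqrt(n)

lower = mean - 1.96 * standard_error
upper = mean + 1.96 * standard_error

# Round only at the final step, as the note above recommends.
print(f"Mean = {mean:.1f}, standard deviation = {std_dev:.1f}")
print(f"95% confidence interval: from {lower:.1f} to {upper:.1f}")
# Mean = 27.9, standard deviation = 4.4
# 95% confidence interval: from 25.0 to 30.8
```

Keeping full precision until the print statements gives the same interval as the hand calculation.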