Estimation and confidence intervals
Objectives
At the end of the lecture and having completed the exercises students should be able to:
- Describe the characteristics of the normal distribution in statistical terms previously encountered
- Explain the concept of a confidence interval and how it relates to an estimated parameter
Bibliography
Gardner, M.J. & D.G. Altman, 1989. Statistics with confidence. The Universities Press, Belfast.
This gives a very thorough introduction to the calculation and use of confidence intervals for those who wish to read further.
Altman, D.G., 1991. Practical Statistics for Medical Research. Chapman & Hall, London. pp. 51-59; 133-143.
These two passages provide more information on the normal distribution and how to test distributions for normality.
The normal distribution
Sometimes called a Gaussian distribution
Key features
- Symmetric
- Not skewed
- Unimodal
- Described by two parameters
Parameters of normal distribution
The equation of the normal distribution. Do not memorise this!
Explanation of equation
- μ and σ are parameters
- μ is the mean
- σ is the standard deviation
- e and π are constants
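For reference, the standard form of the normal probability density can be written with the symbols listed above. The only symbol not named in the list is x, which is assumed here to stand for the value of the variable:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
```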
z-scores
To transform data from a normally distributed data set to a z-score (standard normal deviate, normal score):
- Subtract the mean
- Divide by the standard deviation
This transforms the data to a normal distribution with μ = 0 and σ = 1. This particular normal distribution is known as the standard normal distribution.
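The two steps can be sketched in Python as follows. The function name z_scores and the heights data are hypothetical, chosen only for illustration, and the mean and standard deviation are computed from the data themselves as if they were the whole population:

```python
from statistics import mean, pstdev

def z_scores(values, mu, sigma):
    """Subtract the mean, then divide by the standard deviation."""
    return [(x - mu) / sigma for x in values]

# Hypothetical measurements (illustration only)
heights = [150.0, 160.0, 170.0, 180.0, 190.0]

mu = mean(heights)       # mean of the data
sigma = pstdev(heights)  # standard deviation, treating the data as a population

z = z_scores(heights, mu, sigma)

# After the transformation the scores have mean 0 and standard deviation 1
z_mean = mean(z)
z_sd = pstdev(z)
```

Whatever units the original data were in, the z-scores are unitless and centred on zero.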
The standard normal distribution
The interval
'population mean - 1·96 standard deviations' to 'population mean + 1·96 standard deviations'
Contains 95% of the population
The interval
'population mean - 2·58 standard deviations' to 'population mean + 2·58 standard deviations'
Contains 99% of the population
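The 95% and 99% figures can be checked numerically. This is a minimal sketch using NormalDist from Python's standard statistics module (Python 3.8 or later), and the helper name coverage is an assumption made for illustration:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

def coverage(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_196 = coverage(1.96)  # close to 0.95
within_258 = coverage(2.58)  # close to 0.99

# Going the other way: the multiplier that leaves 2.5% in each tail
multiplier_95 = std_normal.inv_cdf(0.975)  # close to 1.96
```

The multipliers 1·96 and 2·58 are rounded values, so the proportions come out very close to, but not exactly, 95% and 99%.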
Why do we use the normal distribution?
- Many biological variables follow, at least approximately, a normal distribution
- The normal distribution is well-understood, mathematically
Estimation
What factors will make our estimate of a mean better?
- A large sample size
- A homogeneous population (small standard deviation)
When we have taken a sample, we want to use it to get information about the population from which it was drawn. For example, if we want to know the population mean, we take a sample and use the sample mean as our best guess at the population mean. The same applies to other parameters; a parameter here means any feature of a population that we are interested in.
We also want to know: how good is the guess?
NB: use Latin letters for sample estimates and Greek letters for population parameters.
Standard error
Standard error is a measure of how good our best guess is.
We have a simple formula to give us the standard error:
standard error = (standard deviation)/(square root of number of observations)
i.e. the bigger the sample, the smaller the standard error
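The formula is simple enough to express as a one-line function. In this sketch the function name standard_error and the numbers passed to it are hypothetical, chosen to show the effect of sample size:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard deviation divided by the square root of the number of observations."""
    return sd / sqrt(n)

# Same standard deviation, two different sample sizes (illustrative values)
se_small = standard_error(10.0, 25)   # 10 / 5  = 2.0
se_large = standard_error(10.0, 100)  # 10 / 10 = 1.0
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.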
If we repeat the sampling and estimation process many times we get a distribution of estimates.
For a normal population the distribution of estimates of the mean is itself a normal distribution with a standard deviation equal to the standard error.
The standard error is always smaller than the standard deviation whenever the sample contains more than one observation.
The distribution of sample means will be nearly normal whatever the underlying distribution (the central limit theorem) if the sample sizes are large.
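The idea of repeating the sampling many times can be tried by simulation. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size of 50, the 2000 replicates and the seed are all assumptions. The population is deliberately skewed, not normal:

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, SD 1)."""
    return mean(rng.expovariate(1.0) for _ in range(n))

n = 50             # observations per sample
replicates = 2000  # number of repeated samples

# One estimate of the mean per repeated sample
means = [sample_mean(n) for _ in range(replicates)]

sd_of_means = stdev(means)    # observed spread of the estimates
predicted_se = 1.0 / sqrt(n)  # standard deviation / sqrt(n), about 0.141
```

The observed standard deviation of the sample means should come out close to the predicted standard error. A histogram of means would look roughly symmetric even though the underlying population is strongly skewed.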
Confidence intervals
The interval:
'sample mean - 1·96 standard errors' to 'sample mean + 1·96 standard errors'
is the 95% confidence interval.
(Using 2·58 instead of 1·96 will give us the 99% confidence interval)
If we take a sample from a population and calculate its mean and the associated 95% confidence interval, there is a 95% probability that the interval will include the true mean of the population. Equivalently, if the sampling were repeated many times, about 95% of the intervals calculated in this way would include the true mean.
A confidence interval gives us a plausible range of values for the population parameter being estimated, usually on a clinically useful scale.
Confidence intervals can be calculated for other quantities e.g. medians, proportions (see Gardner & Altman)
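The interval for a mean can be written as a small function. This sketch assumes the normal-based method described above. The name normal_ci, the level argument and the example figures are hypothetical:

```python
from statistics import NormalDist

def normal_ci(sample_mean, std_error, level=0.95):
    """Normal-based confidence interval: mean -/+ multiplier x standard error."""
    tail = (1.0 - level) / 2.0                     # e.g. 2.5% in each tail for 95%
    multiplier = NormalDist().inv_cdf(1.0 - tail)  # about 1.96 for 95%, 2.58 for 99%
    return (sample_mean - multiplier * std_error,
            sample_mean + multiplier * std_error)

# Illustrative values: sample mean 100, standard error 2
low95, high95 = normal_ci(100.0, 2.0)              # roughly 96.1 to 103.9
low99, high99 = normal_ci(100.0, 2.0, level=0.99)  # roughly 94.8 to 105.2
```

The multiplier is computed from the standard normal distribution instead of being typed in as 1·96 or 2·58, so the results may differ from hand calculations in the later decimal places. The 99% interval is wider than the 95% interval, as expected.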
Example
Number of teeth of nine patients attending a clinic:
32, 32, 31, 30, 28, 28, 27, 25, 18
Mean = 27·9
Standard deviation = 4·4
What is the confidence interval of the mean?
The number of patients is 9. We can use the formula to calculate the standard error:
standard error = (standard deviation)/(square root of number of observations)
This becomes:
s.e. = 4·4/3 = 1·467
The 95% confidence interval for the mean is:
'sample mean - 1·96 standard errors' to 'sample mean + 1·96 standard errors'
or:
27·9 - 1·96 × 1·467 to 27·9 + 1·96 × 1·467
So, the 95% confidence interval of the mean is:
From 25·0 to 30·8
Note that we do not write 25·0 - 30·8, because the hyphen can be confused with a minus sign in some circumstances; always use 'From ... to ...'. Also note that the answer was rounded to the appropriate number of decimal places at the final step and not before. This is necessary to avoid rounding errors creeping in.
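The whole example can be reproduced from the raw data. This is a sketch using Python's standard statistics module; the variable names are arbitrary, and stdev is the sample standard deviation (divisor n - 1), which is the one that gives 4·4 here:

```python
from math import sqrt
from statistics import mean, stdev

teeth = [32, 32, 31, 30, 28, 28, 27, 25, 18]

n = len(teeth)             # 9 patients
sample_mean = mean(teeth)  # 27.9 to one decimal place
sample_sd = stdev(teeth)   # 4.4 to one decimal place
se = sample_sd / sqrt(n)   # standard error of the mean

# 95% confidence interval, keeping full precision until the final step
lower = sample_mean - 1.96 * se
upper = sample_mean + 1.96 * se

print(f"From {lower:.1f} to {upper:.1f}")  # From 25.0 to 30.8
```

Rounding happens only in the print statement, in line with the advice to round at the final step.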