- From: Martin McLaughlin
- Date: 23 April 1999
*Subject: Which statistical test should I use?*
Can you advise me how to choose which statistical test is the right one to use?

There are dozens if not hundreds of statistical tests (sometimes known as hypothesis tests or significance tests), some of which are very specialist and quite complicated. We will concentrate here on the ones which are most common at A Level and similar college courses.

The test to use depends on the **kind of data** you have, and the **hypotheses** you have formulated
as part of your experimental design. For each of the tests below, we say what the data should be
like, and give a typical experimental hypothesis.

We are assuming that you are basically happy with how to carry out the tests in the first place.

The Chi-squared test is used when the data is **categorical** (descriptive, non-numerical). Examples of categorical data are:

- Gender (the data values are "male", "female")
- Value judgement ("very good", "quite good", "adequate", "quite poor", "very poor")
- Risk level ("high", "medium", "low")
- etc

The Chi-squared test establishes whether there is any **association** between two
categorical variables, for example whether males are more likely to give higher value
judgements than females. The association is significant if the test statistic exceeds the
critical value (eg if p<0.05). Note that significance of association does not imply cause-and-effect.

Note that your sample size must be large enough to ensure that the expected frequency of each cell in the table is at least 5.
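As a rough illustration of the calculation, here is a minimal Python sketch that computes the Chi-squared statistic for a contingency table. The function name `chi_squared` and the counts are made up for the example; 3.841 is the usual tabulated 5% critical value for 1 degree of freedom.

```python
def chi_squared(observed):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

# Hypothetical counts: rows = male, female; columns = "good", "poor"
observed = [[30, 20], [20, 30]]
statistic, dof, expected = chi_squared(observed)

# Every expected frequency should be at least 5 for the test to be valid
smallest_expected = min(min(row) for row in expected)

CRITICAL_5_PERCENT_1_DOF = 3.841  # from published tables
print(statistic, dof, smallest_expected)      # 4.0 1 25.0
print(statistic > CRITICAL_5_PERCENT_1_DOF)   # True
```

With these invented counts the statistic (4.0) exceeds the critical value, so the association would be judged significant at the 5% level.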

Correlation is often confused with the concept of "association". If the data is categorical (descriptive, non-numerical) then the chi-squared test as described above can be carried out to establish association. However, if the data is numerical, and a pair of numerical variables is collected from each respondent, then the correlation between these variables can be established. Examples of paired numerical data:

- x = height, y = weight
- x = distance travelled, y = time taken
- x = test score in maths, y = test score in physics

The value of the **correlation coefficient** can be calculated. If its magnitude exceeds the
stated critical value in published statistical tables, then the correlation can be considered
significant. Note that significance of correlation does not imply cause-and-effect.

**Pearson's Product-Moment Correlation Coefficient** is the one to use if the x data and
the y data can be assumed to come from an underlying Normal distribution (this is often the
case for naturally occurring continuous numerical data). If this assumption cannot be made, or
if the data is already naturally ranked (known as ordinal data, for example your finishing
position in a race) then **Spearman's Rank Correlation Coefficient** should be used.
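To make the difference between the two coefficients concrete, here is a minimal Python sketch. The helper names (`pearson_r`, `ranks`, `spearman_rho`) and the scores are invented for illustration. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, which also copes with tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical paired scores: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Because y always increases with x, the rank correlation is perfect, while Pearson's coefficient is slightly below 1 since the relationship is not linear. Either value would then be compared with the critical value from your tables for the sample size.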

There are various types of Student's t Tests, as outlined below. Basically, t tests are
appropriate if you wish to test the significance of the **mean** (average) of a set (or sets)
of numerical data. It is assumed that the data comes from an underlying Normal distribution.
Introductory statistics courses sometimes start with a **Z Test** when testing the
significance of the mean. However, Z tests require "large" sample sizes (typically n>30),
whereas t Tests can be used for smaller sample sizes. For large sample sizes, t Tests give
practically the same results as Z Tests, so we just consider t Tests here.

__Single sample t test__

You have a sample set of numerical data. The experimental hypothesis is of the form "the mean of
this variable is different from (greater than, less than) a specified numerical value". If the
test statistic exceeds the critical value (eg if p<0.05), the conclusion is that the mean of the
variable is significantly different from the specified value.

An example of this could be to establish whether the claimed mean life of an electrical
component is justified. You would take a sample of components, and carry out a one-tailed t test to
see whether the mean life of the sample provides evidence for the mean of the whole population
being significantly **less than** the stated mean. If it is, the claim must be rejected.
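The component example can be sketched in Python as follows. The lifetimes and the claimed mean of 1000 hours are invented, and `one_sample_t` is just an illustrative name. The critical value 1.833 is the usual tabulated one-tailed 5% value for 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, claimed_mean):
    """Return (t statistic, degrees of freedom) for a single sample t test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    t = (mean - claimed_mean) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical lifetimes (hours) of ten components; the claimed mean is 1000
lifetimes = [980, 995, 1010, 970, 985, 990, 975, 1000, 965, 980]
t, dof = one_sample_t(lifetimes, 1000)

CRITICAL_ONE_TAILED_5_PERCENT_9_DOF = 1.833  # from published tables
# One-tailed "less than" test: reject the claim if t is below minus the critical value
reject_claim = t < -CRITICAL_ONE_TAILED_5_PERCENT_9_DOF
print(round(t, 2), dof, reject_claim)
```

Here t comes out at about -3.40, well below -1.833, so with this made-up sample the claimed mean life would be rejected.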

__Two sample t test__

You have a sample of numerical data from population A, and a sample of similar data from population B. The
two sample t test can establish whether one set has a significantly different (higher, lower) mean
from the other set. If the test statistic exceeds the critical value (eg if p<0.05), the
conclusion is that the mean of the variable from A is significantly different from the mean of the
variable from B.

An example of this could be to establish whether the mean weight of bags of sugar packed by machine A is significantly different from the mean weight packed by machine B.

In the case of the two sample t test, you need to be clear whether you **know the variances**
of populations A and B, or whether you have to **estimate the pooled variance** from both samples
together (which assumes the two populations share a common variance).
This will influence the form of the actual test statistic you use.
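For the pooled variance case, the test statistic can be sketched like this in Python. The bag weights are invented and `pooled_two_sample_t` is an illustrative name. The sketch assumes both populations have the same variance; 2.306 is the usual tabulated two-tailed 5% value for 8 degrees of freedom.

```python
import math
import statistics

def pooled_two_sample_t(a, b):
    """Return (t statistic, degrees of freedom) using a pooled variance estimate."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variances (divisor n - 1)
    var_b = statistics.variance(b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, na + nb - 2

# Hypothetical weights (grams) of sugar bags from two machines
machine_a = [1002, 998, 1005, 1001, 999]
machine_b = [996, 1000, 997, 995, 997]
t, dof = pooled_two_sample_t(machine_a, machine_b)

CRITICAL_TWO_TAILED_5_PERCENT_8_DOF = 2.306  # from published tables
significant = abs(t) > CRITICAL_TWO_TAILED_5_PERCENT_8_DOF
print(round(t, 2), dof, significant)
```

With these figures t is about 2.70 on 8 degrees of freedom, so the two machines would be judged to differ at the 5% level.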

__Paired t test__

As with the two sample t test, this addresses the question of whether the mean of one set is
significantly different from the mean of another set. However, in this case the data values in
sample A are taken from the same respondents as in sample B. The data values are thus
naturally paired. This makes the calculations easier, since they are based on the
**differences** of each pair of data, which reduces the two data sets to one.

As an example, you could measure the response time of a group of adults **before** they
drank a fixed amount of alcohol, and the response time of the same adults **after** they
drank the alcohol. The before (A) and after (B) data are obviously paired. The
paired t test can establish whether one data set has a significantly different (higher, lower)
mean from the other set. If the test statistic exceeds the critical value (eg if p<0.05), the
conclusion is that the mean of the variable from A is significantly different from the mean of
the variable from B.
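Since the paired t test reduces to a single sample t test on the differences (testing whether their mean is zero), a minimal Python sketch is short. The response times are invented and `paired_t` is an illustrative name; 2.776 is the usual tabulated two-tailed 5% value for 4 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    # Work with one data set: the difference for each respondent
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical response times (milliseconds) for the same five adults
before = [250, 260, 245, 270, 255]
after = [265, 270, 250, 290, 265]
t, dof = paired_t(before, after)

CRITICAL_TWO_TAILED_5_PERCENT_4_DOF = 2.776  # from published tables
significant = abs(t) > CRITICAL_TWO_TAILED_5_PERCENT_4_DOF
print(round(t, 2), dof, significant)
```

For these made-up figures t is about 4.71 on 4 degrees of freedom, which exceeds the critical value.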

The Analysis of Variance (ANOVA) test extends the two sample t test to the case where there
are **more than two samples**. It is used to establish whether three or more samples have the same
mean, or whether some of the samples have a significantly different (higher, lower) mean than
others.

There are **one factor** and **two factor** versions of the ANOVA test. The details
and calculations involved are rather complicated when doing ANOVA by hand. A computer software
package is recommended.
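To give a feel for what such a package does in the one factor case, here is a minimal Python sketch of the F statistic. The data and the name `one_way_anova_f` are invented; 5.14 is the usual tabulated 5% critical value of F for (2, 6) degrees of freedom.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, between-groups dof, within-groups dof)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means about the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the data values about their own group mean
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical weights (grams) of bags packed by three machines
groups = [[1001, 1002, 1003], [1002, 1003, 1004], [1005, 1006, 1007]]
f, df_between, df_within = one_way_anova_f(groups)

CRITICAL_F_5_PERCENT_2_6 = 5.14  # from published tables
significant = f > CRITICAL_F_5_PERCENT_2_6
print(f"{f:.2f}", df_between, df_within, significant)  # 13.00 2 6 True
```

An F statistic this far above the critical value suggests that at least one machine has a different mean, although the test on its own does not say which one.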

The Mann-Whitney U Test is used in a similar situation to the two sample t test, namely to
establish whether set A is significantly different on average from set B. It is used when we
cannot assume that the data come from an underlying Normal distribution. Examples of this include
when the data sets are sets of **counts** or **rank orderings** rather than measured continuous variables.

The Mann-Whitney U Test is a **non-parametric** test.
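The U statistic can be found by direct counting, as in this minimal Python sketch. The data and the name `mann_whitney_u` are invented. Unlike the tests above, the result is significant when U is **less than or equal to** the tabulated critical value. The value 2 used here is the two-tailed 5% figure commonly tabulated for two samples of size 5, so do check it against your own tables.

```python
def mann_whitney_u(a, b):
    """Return (U, U_a, U_b), where U is the smaller of U_a and U_b."""
    # U_a counts the pairs in which the value from a beats the value from b;
    # a tie counts as half a pair
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b), u_a, u_b

# Hypothetical counts from two independent groups
set_a = [12, 15, 18, 20, 22]
set_b = [3, 5, 8, 9, 14]
u, u_a, u_b = mann_whitney_u(set_a, set_b)

CRITICAL_U_5_PERCENT_5_5 = 2  # assumed value from published tables
significant = u <= CRITICAL_U_5_PERCENT_5_5
print(u, u_a, u_b, significant)  # 1.0 24.0 1.0 True
```

Only one of the 25 possible pairs has the value from set B larger than the value from set A, which is why U is so small for these made-up figures.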
