 # Hypothesis Testing

 From: Martin McLaughlin Date: 23 April 1999 Subject: Which statistical test should I use? Can you advise me how to choose which statistical test is the right one to use?

### Maths Help suggests:

There are dozens if not hundreds of statistical tests (sometimes known as hypothesis tests or significance tests), some of which are very specialist and quite complicated. We will concentrate here on the ones which are most common at A Level and similar college courses.

The test to use depends on the kind of data you have, and the hypotheses you have formulated as part of your experimental design. For each of the tests below, we say what the data should be like, and give a typical experimental hypothesis.

We are assuming that you are basically happy with how to carry out the tests in the first place.

### Chi-squared test based on a contingency table

The data is categorical (descriptive, non-numerical). Examples of categorical data are:

• Gender (the data values are "male", "female")
• Value judgement ("very good", "quite good", "adequate", "quite poor", "very poor")
• Risk level ("high", "medium", "low")
• etc

You will know the frequency (the number of respondents) for each category. If you collect two categorical responses from each respondent (eg their gender and their value judgement) you can cross-reference the responses in a contingency table. Such a table would have two rows ("male", "female") and five columns ("very good", "quite good", "adequate", "quite poor", "very poor"). Each of the ten cells of the table shows the frequency for one combination of categories (eg the number of females who answered "quite good").

The Chi-squared test establishes whether there is any association between the two categorical variables, for example whether males are more likely than females to give higher value judgements. The association is significant if the test statistic exceeds the critical value (equivalently, if p<0.05 when testing at the 5% level). Note that significance of association does not imply cause-and-effect.

Note that your sample size must be large enough to ensure that the expected frequency of each cell in the table is at least 5.
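As a rough illustration of the calculation, the sketch below works out the expected frequencies and the chi-squared statistic for a 2 by 5 table like the one described above. The frequencies are made up for the example, and the critical value of 9.488 (5% level, 4 degrees of freedom) is the figure given in standard chi-squared tables; check it against your own tables.

```python
def chi_squared_statistic(observed):
    """Return (statistic, degrees of freedom, expected table) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof, expected


# Hypothetical frequencies: rows = male, female;
# columns = very good, quite good, adequate, quite poor, very poor
observed = [
    [20, 25, 30, 15, 10],
    [10, 15, 30, 25, 20],
]

statistic, dof, expected = chi_squared_statistic(observed)
smallest_expected = min(min(row) for row in expected)

CRITICAL_5_PERCENT_DF4 = 9.488  # from published tables (assumed 5% level)

print("smallest expected frequency:", smallest_expected)  # should be at least 5
print("chi-squared = %.2f with %d degrees of freedom" % (statistic, dof))  # 11.67, 4
print("significant association:", statistic > CRITICAL_5_PERCENT_DF4)
```

The degrees of freedom are (rows - 1) x (columns - 1), which is why a 2 by 5 table uses the critical value for 4 degrees of freedom.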

### Correlation

Correlation is often confused with the concept of "association". If the data is categorical (descriptive, non-numerical) then the chi-squared test as described above can be carried out to establish association. However, if the data is numerical, and a pair of numerical variables is collected from each respondent, then the correlation between these variables can be established. Examples of paired numerical data:

• x = height, y = weight
• x = distance travelled, y = time taken
• x = test score in maths, y = test score in physics

Such data can be plotted as (x,y) points on a scatter graph.

The value of the correlation coefficient can be calculated. It lies between -1 and +1. If its magnitude (ignoring the sign) exceeds the critical value given in published statistical tables for your sample size, then the correlation can be considered significant. Note that significance of correlation does not imply cause-and-effect.

Pearson's Product-Moment Correlation Coefficient is the one to use if the x data and the y data can be assumed to come from an underlying Normal distribution (this is often the case for naturally occurring continuous numerical data). If this assumption cannot be made, or if the data is already naturally ranked (known as ordinal data, for example your finishing position in a race) then Spearman's Rank Correlation Coefficient should be used.
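To make the difference between the two coefficients concrete, here is a minimal sketch that calculates both for the same data. The test scores are invented for illustration. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, which also copes with tied values. The critical values quoted (0.707 for Pearson and 0.738 for Spearman, n = 8, two-tailed 5% level) are those found in standard tables; check them against your own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired scores for eight students
maths = [45, 52, 60, 65, 70, 78, 85, 90]
physics = [50, 48, 62, 70, 66, 80, 82, 95]

r = pearson_r(maths, physics)
rs = spearman_rs(maths, physics)

print("Pearson r = %.3f, significant: %s" % (r, abs(r) > 0.707))
print("Spearman rs = %.3f, significant: %s" % (rs, abs(rs) > 0.738))
```

Note that the comparison uses the magnitude of the coefficient, so a strong negative correlation is detected as well as a strong positive one.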

### t Tests

There are various types of Student's t test, as outlined below. Basically, t tests are appropriate if you wish to test the significance of the mean (average) of a set (or sets) of numerical data. It is assumed that the data comes from an underlying Normal distribution. Introductory statistics courses sometimes start with a Z test when testing the significance of the mean. However, Z tests require "large" sample sizes (typically n>30), whereas t tests can be used for smaller sample sizes. For large sample sizes, t tests give practically the same results as Z tests, so we just consider t tests here.

Single sample t test
You have a sample set of numerical data. The experimental hypothesis is of the form "the mean of this variable is different from (greater than, less than) a specified numerical value". If the test statistic exceeds the critical value (eg if p<0.05), the conclusion is that the mean of the variable is significantly different from the specified value.

An example of this could be to establish whether the claimed mean life of an electrical component is justified. You would take a sample of components, and carry out a one-tailed t test to see whether the mean life of the sample provides evidence for the mean of the whole population being significantly less than the stated mean. If it is, the claim must be rejected.
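The component example can be sketched as follows. The lifetimes and the claimed mean of 1000 hours are invented for illustration, and the critical value of 1.833 (one-tailed, 5% level, 9 degrees of freedom) is taken from standard t tables; check it against your own.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, claimed_mean):
    """Return (t statistic, degrees of freedom) for a single sample t test."""
    n = len(sample)
    # stdev() is the sample standard deviation (divisor n - 1)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - claimed_mean) / standard_error
    return t, n - 1


# Hypothetical lifetimes (hours) of ten components; the claimed mean is 1000
lifetimes = [980, 1010, 950, 990, 970, 1005, 960, 985, 975, 995]

t, dof = one_sample_t(lifetimes, claimed_mean=1000)

CRITICAL_ONE_TAILED_DF9 = 1.833  # from published tables (assumed 5% level)

print("t = %.3f with %d degrees of freedom" % (t, dof))
# One-tailed test for "less than": reject the claim if t falls below -1.833
print("mean life significantly less than claimed:", t < -CRITICAL_ONE_TAILED_DF9)
```

Because the hypothesis is "less than", only a large negative value of t counts as evidence against the claim.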

Two sample t test
You have a sample of numerical data from population A, and a sample of similar data from population B. The two sample t test can establish whether one set has a significantly different (higher, lower) mean from the other set. If the test statistic exceeds the critical value (eg if p<0.05), the conclusion is that the mean of the variable from A is significantly different from the mean of the variable from B.

An example of this could be to establish whether the mean weight of bags of sugar packed by machine A is significantly different from the mean weight packed by machine B.

In the case of the two sample t test, you need to be clear whether you know the variances of set A and set B, or whether you estimate the pooled variance of both sets together. This will influence the form of the actual test statistic you use.
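As an illustration of the pooled variance form, the sketch below compares two small samples of bag weights. The weights are made up, the code assumes that both populations have the same variance, and the critical value of 2.228 (two-tailed, 5% level, 10 degrees of freedom) comes from standard t tables; check it against your own.

```python
from math import sqrt
from statistics import mean, variance


def pooled_two_sample_t(a, b):
    """Return (t statistic, degrees of freedom) using a pooled variance estimate."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_variance = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    standard_error = sqrt(pooled_variance * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, dof


# Hypothetical weights (grams) of bags packed by each machine
machine_a = [1002, 998, 1005, 1001, 999, 1003]
machine_b = [996, 1000, 994, 999, 997, 995]

t, dof = pooled_two_sample_t(machine_a, machine_b)

CRITICAL_TWO_TAILED_DF10 = 2.228  # from published tables (assumed 5% level)

print("t = %.3f with %d degrees of freedom" % (t, dof))
# Two-tailed test: a large value of t in either direction is significant
print("means significantly different:", abs(t) > CRITICAL_TWO_TAILED_DF10)
```

If the two variances cannot be assumed equal, a different form of the test statistic is needed, and this sketch does not cover it.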

Paired t test
As with the two sample t test, this addresses the question of whether the mean of one set is significantly different from the mean of another set. However, in this case the data values in sample A are taken from the same respondents as in sample B. The data values are thus naturally paired. This makes the calculations easier, since they are based on the differences of each pair of data, which reduces the two data sets to one.

As an example, you could measure the response time of a group of adults before they drank a fixed amount of alcohol, and the response time of the same adults after they drank the alcohol. The before (A) and after (B) data are obviously paired. The paired t test can establish whether one data set has a significantly different (higher, lower) mean from the other set. If the test statistic exceeds the critical value (eg if p<0.05), the conclusion is that the mean of the variable from A is significantly different from the mean of the variable from B.
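The reduction from two data sets to one can be seen in the sketch below, which works on the differences only. The response times are invented for illustration, and the critical value of 1.895 (one-tailed, 5% level, 7 degrees of freedom) is taken from standard t tables; check it against your own.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test is a single sample t test on the differences, against a mean of 0
    differences = [b - a for a, b in zip(before, after)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical response times (milliseconds) for the same eight adults
before = [320, 350, 300, 380, 330, 360, 310, 340]
after = [360, 370, 350, 410, 340, 400, 330, 390]

t, dof = paired_t(before, after)

CRITICAL_ONE_TAILED_DF7 = 1.895  # from published tables (assumed 5% level)

print("t = %.3f with %d degrees of freedom" % (t, dof))
# One-tailed test for "response time is longer after alcohol"
print("significantly slower after:", t > CRITICAL_ONE_TAILED_DF7)
```

The order of the values matters: the first entry in each list must belong to the same person, and so on down the lists.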

### ANOVA

The Analysis of Variance (ANOVA) test extends the two sample t test to the case where there are more than two samples. It is used to establish whether three or more samples come from populations with the same mean, or whether at least one of the means is significantly different from the others.

There are one factor and two factor versions of the ANOVA test. The details and calculations involved are rather complicated when doing ANOVA by hand. A computer software package is recommended.
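For small data sets the one factor version can still be followed in a few lines. The sketch below computes the F statistic for three invented samples of five bag weights each. The critical value of 3.89 (5% level, 2 and 12 degrees of freedom) is the figure given in standard F tables; check it against your own.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values about their own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical weights (grams) of bags packed by three machines
machine_a = [1002, 998, 1005, 1001, 999]
machine_b = [996, 1000, 994, 999, 996]
machine_c = [1003, 1006, 1001, 1004, 1006]

f, df_between, df_within = one_way_anova_f([machine_a, machine_b, machine_c])

CRITICAL_F_2_12 = 3.89  # from published tables (assumed 5% level)

print("F = %.2f with (%d, %d) degrees of freedom" % (f, df_between, df_within))  # 10.28, (2, 12)
print("at least one mean differs:", f > CRITICAL_F_2_12)
```

A significant result says only that the means are not all equal. It does not say which machine is the odd one out.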

### Mann-Whitney U Test

This is used in a similar situation to the two sample t test, namely to establish whether set A is significantly different on average from set B. It is used when we cannot assume that the data come from an underlying Normal distribution. Examples of this include when the data sets are sets of counts or rank orderings rather than measured continuous variables.

The Mann-Whitney U Test is a non-parametric test.
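A minimal sketch of the U statistic is given below. It counts, over every possible pair, how often a value from A beats a value from B, with a tie counting as one half. The data are invented, and the critical value of 2 (n = 5 in each sample, two-tailed 5% level) is the figure given in standard Mann-Whitney tables; check it against your own. Unlike the tests above, the result is significant when U is less than or equal to the critical value.

```python
def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for samples a and b."""
    # U for sample a: number of (x, y) pairs with x > y, ties counting one half
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    # The two U values always add up to len(a) * len(b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical counts for two independent groups of five
set_a = [3, 5, 6, 7, 9]
set_b = [8, 10, 12, 14, 15]

u = mann_whitney_u(set_a, set_b)

CRITICAL_U_5_5 = 2  # from published tables (assumed two-tailed 5% level)

print("U =", u)  # 1.0
# For this test a SMALL value of U is evidence of a difference
print("sets significantly different:", u <= CRITICAL_U_5_5)
```

Only the ordering of the values is used, not their sizes, which is why the test does not need the assumption of a Normal distribution.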