CHI SQUARED TEST

Chi-squared test for categories of data

Background: The Student's t-test and Analysis of Variance are used to analyse measurement data which, in theory, are continuously variable. Between a measurement of, say, 1 mm and 2 mm there is a continuous range from 1.0001 to 1.9999 mm.

But in some types of experiment we wish to record how many individuals fall into a particular category, such as blue eyes or brown eyes, motile or non-motile cells, etc. These counts, or enumeration data, are discontinuous (1, 2, 3 etc.) and must be treated differently from continuous data. Often the appropriate test is chi-squared (χ²), which we use to test whether the numbers of individuals in different categories fit a null hypothesis (an expectation of some sort).

Chi squared analysis is simple, and valuable for all sorts of things - not just Mendelian crosses! On this page we build from the simplest examples to more complex ones. When you have gone through the examples you should consult the checklist of procedures and potential pitfalls.

A simple example

Suppose that the ratio of male to female students in the Science Faculty is exactly 1:1, but in the Pharmacology Honours class over the past ten years there have been 80 females and 40 males. Is this a significant departure from expectation? We proceed as follows (but note that we are going to overlook a very important point that we shall deal with later).

Set out a table as shown below, with the "observed" numbers and the "expected" numbers (i.e. our null hypothesis).

Then subtract each "expected" value from the corresponding "observed" value (O-E)

Square the "O-E" values, and divide each by the relevant "expected" value to give (O-E)²/E

Add all the (O-E)²/E values and call the total "X²"

	Female	Male	Total
Observed numbers (O)	80	40	120
Expected numbers (E)	60*³	60*³	120 *¹
O - E	20	-20	0 *²
(O-E)²	400	400
(O-E)² / E	6.67	6.67	13.34 = X²

Notes:
*¹ This total must always be the same as the observed total
*² This total must always be zero
*³ The null hypothesis was obvious here: we are told that there are equal numbers of males and females in the Science Faculty, so we might expect that there will be equal numbers of males and females in Pharmacology. So we divide our total number of Pharmacology students (120) in a 1:1 ratio to get our ‘expected’ values.

Now we must compare our X² value with a χ² (chi squared) value in a table of χ² with n-1 degrees of freedom (where n is the number of categories, i.e. 2 in our case - males and females). We have only one degree of freedom (n-1). From the χ² table, we find a "critical value" of 3.84 for p = 0.05.

If our calculated value of X² exceeds the critical value of χ² then we have a significant difference from the expectation. In fact, our calculated X² (13.34) exceeds even the tabulated χ² value (10.83) for p = 0.001. This shows an extreme departure from expectation. It is still possible that we could have got this result by chance - a probability of less than 1 in 1000. But we could be 99.9% confident that some factor leads to a "bias" towards females entering Pharmacology Honours. [Of course, the data don't tell us why this is so - it could be self-selection or any other reason.]
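
The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a definitive implementation: the function name chi_squared_statistic is an assumption introduced here, and the critical values 3.84 and 10.83 are simply copied from the text.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    # The expected total must always equal the observed total.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must match")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [80, 40]                    # females, males
total = sum(observed)
expected = [total / 2, total / 2]      # null hypothesis: 1:1 ratio

x2 = chi_squared_statistic(observed, expected)
print(round(x2, 2))                    # unrounded working gives 13.33
print(x2 > 3.84, x2 > 10.83)           # compare with the tabulated values
```

The unrounded result is 13.33; the 13.34 in the table comes from adding the two terms after each has been rounded to 6.67. The conclusion is the same either way.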

Now repeat this analysis, but knowing that 33.5% of all students in the Science Faculty are males

	Female	Male	Total
Observed numbers (O)	80	40	120
Expected numbers (E)	79.8*³	40.2	120*¹
O - E	0.2	-0.2	0*²
(O-E)²	0.04	0.04
(O-E)² / E	0.0005	0.001	0.0015 = X²

Note *¹: We know that the expected total must be 120 (the same as the observed total), so we can calculate the expected numbers as 66.5% and 33.5% of this total.

Note *²: This total must always be zero.

Note *³: Although the observed values must be whole numbers, the expected values can be (and often need to be) decimals.

Now, from a χ² table we see that our data do not depart from expectation (the null hypothesis). They agree remarkably well with it and might lead us to suspect that there was some design behind this! In most cases, though, we might get intermediate X² values, which neither agree strongly nor disagree with expectation. Then we conclude that there is no reason to reject the null hypothesis.
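
When the null hypothesis is given as proportions, and not as a simple ratio, the expected numbers are just those proportions of the observed total. A minimal sketch, using the 66.5% and 33.5% figures from the text (the variable names are illustrative):

```python
observed = [80, 40]                # females, males
proportions = [0.665, 0.335]       # faculty-wide proportions from the text

total = sum(observed)
# Expected numbers may be decimals even though observed counts are whole.
expected = [p * total for p in proportions]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected])   # expected females, males
print(round(x2, 4))                      # a very small X² value
```

The X² value is tiny compared with the critical value of 3.84, so there is no reason to reject the null hypothesis.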

Some important points about chi-squared

Chi squared is a mathematical distribution with properties that enable us to equate our calculated X² values to χ² values. The details need not concern us, but we must take account of some limitations so that χ² can be used validly for statistical tests.

(i) Yates correction for two categories of data (one degree of freedom)

When there are only two categories (e.g. male/female) or, more correctly, when there is only one degree of freedom, the χ² test should not, strictly, be used. There have been various attempts to correct this deficiency, but the simplest is to apply Yates correction to our data. To do this, we simply subtract 0.5 from each calculated value of "O-E", ignoring the sign (plus or minus). In other words, an "O-E" value of +5 becomes +4.5, and an "O-E" value of -5 becomes -4.5. To signify that we are reducing the absolute value, ignoring the sign, we use vertical lines: |O-E|-0.5. Then we continue as usual but with these new (corrected) O-E values: we calculate (with the corrected values) (O-E)², (O-E)²/E and then sum the (O-E)²/E values to get X². Yates correction only applies when we have two categories (one degree of freedom).

We ignored this point in our first analysis of student numbers (above). So here is the table again, using Yates correction:

	Female	Male	Total
Observed numbers (O)	80	40	120
Expected numbers (E)	60*³	60*³	120 *¹
O - E	20	-20	0 *²
|O-E|-0.5	19.5	-19.5	0
(|O-E|-0.5)²	380.25	380.25
(|O-E|-0.5)² / E	6.338	6.338	12.676 = X²

In this case, the observed numbers were so different from the expected 1:1 ratio that Yates correction made little difference - it only reduced the X² value from 13.34 to 12.68. But there would be other cases where Yates correction would make the difference between acceptance or rejection of the null hypothesis.
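
Yates correction changes only one step of the calculation: 0.5 is taken off the absolute value of each O-E before squaring. A minimal sketch follows; the function name yates_chi_squared is an assumption made for this example, and the function is meant only for the one-degree-of-freedom case described above.

```python
def yates_chi_squared(observed, expected):
    """X² with Yates correction: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))


observed = [80, 40]      # females, males
expected = [60, 60]      # 1:1 null hypothesis

x2_plain = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
x2_yates = yates_chi_squared(observed, expected)
print(round(x2_plain, 3), round(x2_yates, 3))
```

The corrected value is always a little smaller than the uncorrected one, which makes the test slightly more conservative. Here the unrounded corrected value is 12.675; the table's 12.676 is the sum of two terms that were each rounded to 6.338.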

(ii) Limitations on numbers in "expected" categories

Again to satisfy the mathematical assumptions underlying χ², the expected values should be relatively large. The following simple rules are applied:

no expected category should be less than 1 (it does not matter what the observed values are)
AND no more than one-fifth of expected categories should be less than 5.

What can we do if our data do not meet these criteria? We can either collect larger samples so that we satisfy the criteria, or we can combine the data for the smaller "expected" categories until their combined expected value is 5 or more, then do a χ² test on the combined data. We will see an example below.
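
The two rules are easy to check before running a test. The sketch below is one possible way to write the check; the function name expected_counts_ok is hypothetical, and the expected counts used are those of the fruit-fly examples later on this page.

```python
def expected_counts_ok(expected):
    """Apply the two rules from the text to a list of expected counts."""
    # Rule 1: no expected category may be less than 1.
    if any(e < 1 for e in expected):
        return False
    # Rule 2: no more than one-fifth of expected categories may be below 5.
    small = sum(1 for e in expected if e < 5)
    return small <= len(expected) / 5


print(expected_counts_ok([45, 15, 15, 5]))                    # 80 flies, 9:3:3:1
print(expected_counts_ok([39.375, 13.125, 13.125, 4.375]))    # 70 flies, 9:3:3:1
print(expected_counts_ok([39.375, 13.125, 17.5]))             # 70 flies, 9:3:4
```

Only the expected counts are passed in, because the rules do not apply to the observed values.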

Chi squared with three or more categories

Suppose that we want to test the results of a Mendelian genetic cross. We start with 2 parents of genotype AABB and aabb (where A and a represent the dominant and recessive alleles of one gene, and B and b represent the dominant and recessive alleles of another gene).

We know that all the F₁ generation (first generation progeny of these parents) will have genotype AaBb and that their phenotype will display both dominant alleles (e.g. in fruit flies all the F₁ generation will have red eyes rather than white eyes, and normal wings rather than stubby wings).

This F₁ generation will produce 4 types of gamete (AB, Ab, aB and ab), and when we self-cross the F₁ generation we will end up with a variety of F₂ genotypes (see the table below).

Gametes	Gametes
	AB	Ab	aB	ab
AB	AABB	AABb	AaBB	AaBb
Ab	AABb	AAbb	AaBb	Aabb
aB	AaBB	AaBb	aaBB	aaBb
ab	AaBb	Aabb	aaBb	aabb

All these genotypes fall into 4 phenotypes: double dominant, single dominant A, single dominant B and double recessive. We know that in classical Mendelian genetics the expected ratio of these phenotypes is 9:3:3:1.

Suppose we got observed counts as follows

	Phenotype
	AB	Ab	aB	ab	Total
Observed numbers (O)	40	20	16	4	80
Expected numbers (E)	45	15	15	5	80*¹
O - E	-5	5	1	-1	0
(O-E)²	25	25	1	1
(O-E)² / E	0.56	1.67	0.07	0.20	2.50 = X²

[Note: *¹. From our expected total 80 we can calculate our expected values for categories on the ratio 9:3:3:1.]

From a χ² table with 3 df (we have four categories, so 3 df) at p = 0.05, we find that a χ² value of 7.82 is necessary to reject the null hypothesis (expectation of ratio 9:3:3:1). So our data are consistent with the expected ratio.
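
The same calculation works for any number of categories once the expected ratio has been turned into expected numbers. A short sketch for the 9:3:3:1 cross, with the critical value 7.82 taken from the text (variable names are illustrative):

```python
observed = [40, 20, 16, 4]     # phenotypes AB, Ab, aB, ab
ratio = [9, 3, 3, 1]           # expected Mendelian ratio

total = sum(observed)
# Divide the observed total in the expected ratio.
expected = [total * r / sum(ratio) for r in ratio]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1         # four categories, so 3 degrees of freedom
critical_5pct = 7.82           # tabulated value for 3 df, from the text

print(expected)
print(round(x2, 2), df, x2 > critical_5pct)
```

Without intermediate rounding the statistic is 2.49; the table's 2.50 comes from adding terms already rounded to two decimal places. Either value is well below 7.82.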

Combining categories

Look at the table above. We only just collected enough data to be able to test a 9:3:3:1 expected ratio. If we had only counted 70 (or 79) fruit flies then our lowest expected category would have been less than 5, and we could not have done the test as shown. We would break one of the "rules" for χ² - that no more than one-fifth of expected categories should be less than 5. We could still do the analysis, but only after combining the smaller categories and testing against a different expectation.

Here is an illustration of this, assuming that we had used 70 fruit flies and obtained the following observed numbers of phenotypes.

	Phenotype
	AB	Ab	aB	ab	Combined aB + ab	Total
Observed numbers (O)	34	18	15	3	18	70
Expected numbers (E)	39.375	13.125	13.125	4.375	17.5	70*1
O - E	-5.375	4.875			0.5	0
(O-E)²	28.891	23.766			0.25
(O-E)² / E	0.734	1.811			0.014	2.559 = X²

One of our expected categories (ab, with an expected value of 4.375) is less than 5. So we have combined this category with one of the others (aB) and then must analyse the results against an expected ratio of 9:3:4. The numbers in the expected categories were entered by dividing the total (70) in this ratio.

Now, with 3 categories we have only 2 degrees of freedom. The rest of the analysis is done as usual, and we still have no reason to reject the null hypothesis. But it is a different null hypothesis: the expected ratio is 9:3:4 (double dominant: single dominant Ab: single dominant aB plus double recessive ab).
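
Combining categories means adding both the observed and the expected counts of the categories concerned, then testing with one fewer degree of freedom. A minimal sketch, assuming a helper named combine (introduced here for illustration); the 5.99 critical value for 2 df at p = 0.05 is the standard tabulated figure and is not quoted in the text.

```python
def combine(values, i, j):
    """Return a copy of values with category j merged into category i (i < j)."""
    merged = list(values)
    extra = merged.pop(j)
    merged[i] += extra
    return merged


observed = [34, 18, 15, 3]                          # AB, Ab, aB, ab
expected = [70 * r / 16 for r in (9, 3, 3, 1)]      # 9:3:3:1 of 70 flies

# Merge ab (index 3) into aB (index 2), giving a 9:3:4 expectation.
obs_c = combine(observed, 2, 3)
exp_c = combine(expected, 2, 3)

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 1
print(obs_c, exp_c)
print(round(x2, 3), df, x2 > 5.99)   # 5.99: standard table value, 2 df, p = 0.05
```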

Chi-squared: double classifications

Suppose that we have a population of fungal spores which clearly fall into two size categories, large and small. We incubate these spores on agar and count the number of spores that germinate by producing a single outgrowth or multiple outgrowths.

Spores counted:

120 large spores, of which 80 form multiple outgrowths and 40 produce single outgrowths
60 small spores, of which 18 form multiple outgrowths and 42 produce single outgrowths

Is there a significant difference in the way that large and small spores germinate?

Procedure:

1. Set out a table as follows

	Large spores	Small spores	Total
Multiple outgrowth	80	18	98
Single outgrowth	40	42	82
Total	120	60	180

2. Decide on the null hypothesis.

In this case there is no "theory" that gives us an obvious null hypothesis. For example, we have no reason to suppose that 55% or 75% or any other percentage of large spores will produce multiple outgrowths. So the most sensible null hypothesis is that both the large and the small spores will behave similarly: the same proportion of each type will produce multiple outgrowths, whatever that proportion is. We estimate it from the data as a whole (98 of the 180 spores, about 54%, produced multiple outgrowths), which is why the expected values are built from the row and column totals and not from a fixed 1:1:1:1 ratio. Then, if our data do not agree with this expectation we will have evidence that spore size affects the type of germination.

3. Calculate the expected frequencies, based on the null hypothesis.

This step is complicated by the fact that we have different numbers of large and small spores, and different numbers of multiple versus single outgrowths. But we can find the expected frequencies (a, b, c and d) by using the grand total (180) and the column and row totals (see table below).

		Large spores	Small spores	Row totals
Multiple outgrowth	Observed (O)	80	18	98
	Expected (E)	a	b	(expected 98)
Single outgrowth	Observed (O)	40	42	82
	Expected (E)	c	d	(expected 82)
	Column totals	120	60	180

To find the expected value "a" we know that a total of 98 spores had multiple outgrowths and that 120 of the total 180 spores were large. So a is 98(120/180) = 65.33.

Similarly, to find b we know that 98 spores had multiple outgrowths and that 60 of the total 180 spores were small. So, b is 98(60/180) = 32.67. [Actually, we could have done this simply by subtracting a from the expected 98 row total - the expected total must always be the same as the observed total]

To find c we know that 82 spores had single outgrowths and that 120 of the total 180 spores were large. So c is 82(120/180) = 54.67.

To find d we know that 82 spores had single outgrowths and that 60 of the total 180 spores were small. So d is 82(60/180) = 27.33. [This value also could have been obtained by subtraction]
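
The four calculations above all follow one pattern: expected value = row total × column total / grand total. A short sketch that applies this to every cell of the table (the variable names are illustrative):

```python
table = [[80, 18],    # multiple outgrowth: large spores, small spores
         [40, 42]]    # single outgrowth:   large spores, small spores

row_totals = [sum(row) for row in table]          # 98 and 82
col_totals = [sum(col) for col in zip(*table)]    # 120 and 60
grand_total = sum(row_totals)                     # 180

# Each expected value is row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

The first row printed corresponds to a and b, the second to c and d. The expected row and column totals are the same as the observed ones, as they must be.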

4. Decide the number of degrees of freedom

You might think that there are 3 degrees of freedom (because there are 4 categories). But there is actually one degree of freedom! The reason is that we lose one degree of freedom because we have 4 categories, and we lose a further 2 degrees of freedom because we used two pieces of information to construct our null hypothesis - we used a column total and a row total. Once we had used these we would have needed only one data entry in order to fill in the rest of the values (therefore we have one degree of freedom).

Of course, with one degree of freedom we must use Yates correction (subtract 0.5 from each O-E value).

5. Run the analysis as usual, calculating O-E, (O-E)² and (O-E)²/E for each category (with Yates correction applied to each O-E value), then sum the (O-E)²/E values to obtain X² and test this against χ².

The following table shows some of the working. The sum of the four (O-E_corrected)²/E values gives X² of 20.23.

		Large spores	Small spores	Row totals
Multiple outgrowth	Observed (O)	80	18	98
	Expected (E)	65.33	32.67	98
	O-E	+14.67	-14.67
Yates correction	|O-E|-0.5	+14.17	-14.17	0
	(O-E_corrected)²/E	3.07	6.14
Single outgrowth	Observed (O)	40	42	82
	Expected (E)	54.67	27.33	82
	O-E	-14.67	+14.67
Yates correction	|O-E|-0.5	-14.17	+14.17	0
	(O-E_corrected)²/E	3.67	7.35	X² = 20.23
	Column totals	120	60	180

We compare the X² value with a tabulated χ² with one degree of freedom. Our calculated X² exceeds the tabulated χ² value (10.83) for p = 0.001. We conclude that there is a highly significant departure from the null hypothesis - we have very strong evidence that large spores and small spores show different germination behaviour.
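
The whole procedure for a 2 x 2 table can be put together as below. This is a sketch under stated assumptions, not a general-purpose routine: the function name chi_squared_2x2_yates is hypothetical, and the probability line relies on the fact that, for one degree of freedom only, the chance of a χ² value larger than x equals erfc(√(x/2)).

```python
import math


def chi_squared_2x2_yates(table):
    """X² for a 2 x 2 table of counts, with Yates correction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n              # expected from the margins
            x2 += (abs(o - e) - 0.5) ** 2 / e      # Yates-corrected term
    return x2


table = [[80, 18],    # multiple outgrowth: large, small
         [40, 42]]    # single outgrowth:   large, small

x2 = chi_squared_2x2_yates(table)
df = (len(table) - 1) * (len(table[0]) - 1)        # (rows - 1) * (columns - 1)
p = math.erfc(math.sqrt(x2 / 2))                   # valid for 1 df only

print(round(x2, 2), df, p < 0.001)
```

The degrees of freedom are written as (rows - 1) × (columns - 1), which gives the single degree of freedom argued for in step 4.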

Checklist: procedures and potential pitfalls

Chi squared is a very simple test to use. The only potentially difficult things about it are:

calculating the expected frequencies when we have double classifications - use the marginal subtotals and totals to work out these frequencies
determining the number of degrees of freedom, especially when we have to use some of the data to construct the null hypothesis.

If you follow the examples given on this page you should not have too many difficulties.

Some points to watch:

Always work with "real numbers" in the observed categories, not with proportions. To illustrate this, consider a simple chi squared test on tossing of coins. Suppose that in 100 throws you get 70 "heads" and 30 "tails". Using Yates correction (for one degree of freedom) you would find an X² value of 15.21, equating to a χ² probability less than 0.001. But if you got 7 "heads" and 3 "tails" in a test of 10 throws it would be entirely consistent with random chance. The ratio is the same (7:3), but the actual numbers determine the level of significance in a chi squared test.
Observed categories must have whole numbers, but expected categories can have decimals.
Follow the rules about the minimum numbers in expected categories. These rules do not apply to the observed categories.
Remember Yates correction for one degree of freedom.
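
The coin-tossing point can be checked directly. The sketch below uses a hypothetical helper, coin_x2, which tests heads and tails against a 1:1 expectation with Yates correction; 3.84 is the p = 0.05 critical value for one degree of freedom quoted earlier on this page.

```python
def coin_x2(heads, tails):
    """Yates-corrected X² for heads/tails counts against a 1:1 expectation."""
    expected = (heads + tails) / 2
    return sum((abs(o - expected) - 0.5) ** 2 / expected
               for o in (heads, tails))


x2_large = coin_x2(70, 30)    # 100 throws
x2_small = coin_x2(7, 3)      # 10 throws, same 7:3 ratio

print(round(x2_large, 2), x2_large > 3.84)
print(round(x2_small, 2), x2_small > 3.84)
```

The same 7:3 ratio gives 15.21 from 100 throws but only 0.9 from 10 throws, so only the larger sample departs significantly from expectation.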

This site is no longer maintained and has been left for archival purposes

Text and links may be out of date
