This site is no longer maintained and has been left for archival purposes
Text and links may be out of date
Chi-squared test for categories of data

Background: The Student's t-test and Analysis of Variance are used to analyse measurement data which, in theory, are continuously variable. Between a measurement of, say, 1 mm and 2 mm there is a continuous range from 1.0001 to 1.9999 mm. But in some types of experiment we wish to record how many individuals fall into a particular category, such as blue eyes or brown eyes, motile or non-motile cells, etc. These counts, or enumeration data, are discontinuous (1, 2, 3 etc.) and must be treated differently from continuous data. Often the appropriate test is chi-squared (χ²), which we use to test whether the numbers of individuals in different categories fit a null hypothesis (an expectation of some sort).

Chi squared analysis is simple, and valuable for all sorts of things - not just Mendelian crosses! On this page we build from the simplest examples to more complex ones. When you have gone through the examples you should consult the checklist of procedures and potential pitfalls.

A simple example

Suppose that the ratio of male to female students in the Science Faculty is exactly 1:1, but in the Pharmacology Honours class over the past ten years there have been 80 females and 40 males. Is this a significant departure from expectation? We proceed as follows (but note that we are going to overlook a very important point that we shall deal with later).

1. Set out a table as shown below, with the "observed" numbers and the "expected" numbers (i.e. our null hypothesis).
2. Subtract each "expected" value from the corresponding "observed" value (O-E).
3. Square the "O-E" values, and divide each by the relevant "expected" value to give (O-E)²/E.
4. Add all the (O-E)²/E values and call the total "X²".
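The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic for the 80:40 example; the function name `chi_squared_statistic` is our own label and does not come from any library. The unrounded result is 13.33; the value of 13.34 quoted on this page arises from adding two terms that were each rounded to 6.67.

```python
def chi_squared_statistic(observed, expected):
    """Return X2 = sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [80, 40]                # females, males in the Honours class
total = sum(observed)              # 120 students
expected = [total / 2, total / 2]  # null hypothesis: 1:1 ratio, so 60 and 60

x2 = chi_squared_statistic(observed, expected)
print(f"X2 = {x2:.2f}")
```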
Notes: Now we must compare our X² value with a χ² (chi squared) value in a table of χ² with n-1 degrees of freedom (where n is the number of categories, i.e. 2 in our case - males and females). We have only one degree of freedom (n-1). From the χ² table, we find a "critical value" of 3.84 for p = 0.05.

If our calculated value of X² exceeds the critical value of χ² then we have a significant difference from the expectation. In fact, our calculated X² (13.34) exceeds even the tabulated χ² value (10.83) for p = 0.001. This shows an extreme departure from expectation. It is still possible that we could have got this result by chance, but the probability of that is less than 1 in 1000. So we could be 99.9% confident that some factor leads to a "bias" towards females entering Pharmacology Honours. [Of course, the data don't tell us why this is so - it could be self-selection or any other reason.]

Now repeat this analysis, but knowing that 33.5% of all students in the Science Faculty are males.
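The repeated analysis can be sketched as follows. It assumes that the remaining 66.5% of students are female, and it takes the critical value of 3.84 (one degree of freedom, p = 0.05) from the text above; the variable names are ours.

```python
observed = {"female": 80, "male": 40}
proportion = {"female": 0.665, "male": 0.335}  # null hypothesis: Faculty-wide proportions

total = sum(observed.values())                 # expected total equals observed total (120)
expected = {k: total * p for k, p in proportion.items()}  # expected values may be decimals
deviation = {k: observed[k] - expected[k] for k in observed}

# X2 = sum of (O - E)^2 / E over the two categories
x2 = sum(deviation[k] ** 2 / expected[k] for k in observed)

CRITICAL_1DF_P05 = 3.84  # tabulated chi-squared value quoted on this page

print("expected:", {k: round(v, 1) for k, v in expected.items()})
print("sum of O-E:", round(sum(deviation.values()), 6))  # should be zero
print(f"X2 = {x2:.4f}; exceeds critical value: {x2 > CRITICAL_1DF_P05}")
```

The calculated X² is far below the critical value, so there is no reason to reject the null hypothesis.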
Note *1: We know that the expected total must be 120 (the same as the observed total), so we can calculate the expected numbers as 66.5% and 33.5% of this total.
Note *2: This total must always be zero.
Note *3: Although the observed values must be whole numbers, the expected values can be (and often need to be) decimals.

Now, from a χ² table we see that our data do not depart from expectation (the null hypothesis). They agree remarkably well with it and might lead us to suspect that there was some design behind this! In most cases, though, we might get intermediate X² values, which neither agree strongly nor disagree with expectation. Then we conclude that there is no reason to reject the null hypothesis.

Some important points about chi-squared

Chi squared is a mathematical distribution with properties that enable us to equate our calculated X² values to χ² values. The details need not concern us, but we must take account of some limitations so that χ² can be used validly for statistical tests.

(i) Yates correction for two categories of data (one degree of freedom)

When there are only two categories (e.g. male/female) or, more correctly, when there is only one degree of freedom, the χ² test should not, strictly, be used. There have been various attempts to correct this deficiency, but the simplest is to apply Yates correction to our data. To do this, we simply subtract 0.5 from each calculated value of "O-E", ignoring the sign (plus or minus). In other words, an "O-E" value of +5 becomes +4.5, and an "O-E" value of -5 becomes -4.5. To signify that we are reducing the absolute value, ignoring the sign, we use vertical lines: |O-E|-0.5. Then we continue as usual but with these new (corrected) O-E values: we calculate (with the corrected values) (O-E)², (O-E)²/E and then sum the (O-E)²/E values to get X². Yates correction only applies when we have two categories (one degree of freedom).

We ignored this point in our first analysis of student numbers (above).
So here is the table again, using Yates correction:
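As a rough check on the arithmetic, the corrected calculation can be sketched in Python. The function name `yates_chi_squared` is ours; it simply applies |O-E|-0.5 before squaring, as described above, and is meant only for the one-degree-of-freedom case.

```python
def yates_chi_squared(observed, expected):
    """X2 with Yates correction: sum of (|O - E| - 0.5)^2 / E.

    Intended only for two categories (one degree of freedom).
    """
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


observed = [80, 40]   # females, males
expected = [60, 60]   # 1:1 null hypothesis

x2_corrected = yates_chi_squared(observed, expected)
print(f"Corrected X2 = {x2_corrected:.3f}")
```

Each |O-E| of 20 becomes 19.5, so each category contributes 19.5²/60 and the corrected X² comes to about 12.67.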
In this case, the observed numbers were so different from the expected 1:1 ratio that Yates correction made little difference - it only reduced the X² value from 13.34 to 12.67. But there would be other cases where Yates correction would make the difference between acceptance or rejection of the null hypothesis.

(ii) Limitations on numbers in "expected" categories

Again to satisfy the mathematical assumptions underlying χ², the expected values should be relatively large. The following simple rules are applied:
What can we do if our data do not meet these criteria? We can either collect larger samples so that we satisfy the criteria, or we can combine the data for the smaller "expected" categories until their combined expected value is 5 or more, then do a χ² test on the combined data. We will see an example below.

Chi squared with three or more categories

Suppose that we want to test the results of a Mendelian genetic cross. We start with 2 parents of genotype AABB and aabb (where A and a represent the dominant and recessive alleles of one gene, and B and b represent the dominant and recessive alleles of another gene).

We know that all the F1 generation (first generation progeny of these parents) will have genotype AaBb and that their phenotype will display both dominant alleles (e.g. in fruit flies all the F1 generation will have red eyes rather than white eyes, and normal wings rather than stubby wings).

This F1 generation will produce 4 types of gamete (AB, Ab, aB and ab), and when we self-cross the F1 generation we will end up with a variety of F2 genotypes (see the table below).
All these genotypes fall into 4 phenotypes, shown by colours in the table: double dominant, single dominant A, single dominant B and double recessive. We know that in classical Mendelian genetics the expected ratio of these phenotypes is 9:3:3:1.

Suppose we got observed counts as follows:
[Note: *1. From our expected total 80 we can calculate our expected values for categories on the ratio 9:3:3:1.]

From a χ² table with 3 df (we have four categories, so 3 df) at p = 0.05, we find that a χ² value of 7.82 is necessary to reject the null hypothesis (expectation of ratio 9:3:3:1). So our data are consistent with the expected ratio.

Combining categories

Look at the table above. We only just collected enough data to be able to test a 9:3:3:1 expected ratio. If we had only counted 70 (or 79) fruit flies then our lowest expected category would have been less than 5, and we could not have done the test as shown. We would break one of the "rules" for χ² - that no more than one-fifth of expected categories should be less than 5. We could still do the analysis, but only after combining the smaller categories and testing against a different expectation.

Here is an illustration of this, assuming that we had used 70 fruit flies and obtained the following observed numbers of phenotypes.
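The expected values behind this rule can be checked with a short sketch. It uses only the totals and ratios mentioned on this page, so no observed counts are assumed. The helper names `expected_from_ratio` and `meets_one_fifth_rule` are ours, and the second implements only the rule quoted above (no more than one-fifth of expected values below 5).

```python
def expected_from_ratio(total, ratio):
    """Split a total count across categories in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def meets_one_fifth_rule(expected, threshold=5):
    """True if no more than one-fifth of the expected values are below threshold."""
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected) <= 0.2


cases = {
    "80 flies, 9:3:3:1": expected_from_ratio(80, (9, 3, 3, 1)),
    "70 flies, 9:3:3:1": expected_from_ratio(70, (9, 3, 3, 1)),
    "70 flies, 9:3:4": expected_from_ratio(70, (9, 3, 4)),  # last two categories combined
}

for label, expected in cases.items():
    print(label, expected, "rule satisfied:", meets_one_fifth_rule(expected))
```

With 80 flies the smallest expected value is exactly 5; with 70 flies it falls to 4.375, so one of four categories (more than one-fifth) is below 5. Combining the two smallest categories into a 9:3:4 ratio brings every expected value back above 5.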
One of our expected categories (ab) is less than 5 (shown in bold italics in the table). So we have combined this category with one of the others and then must analyse the results against an expected ratio of 9:3:4. The numbers in the expected categories were entered by dividing the total (70) in this ratio.

Now, with 3 categories we have only 2 degrees of freedom. The rest of the analysis is done as usual, and we still have no reason to reject the null hypothesis. But it is a different null hypothesis: the expected ratio is 9:3:4 (double dominant: single dominant Ab: single dominant aB plus double recessive ab).

Chi-squared: double classifications

Suppose that we have a population of fungal spores which clearly fall into two size categories, large and small. We incubate these spores on agar and count the number of spores that germinate by producing a single outgrowth or multiple outgrowths.

Spores counted:
Is there a significant difference in the way that large and small spores germinate?

Procedure:
1. Set out a table as follows
2. Decide on the null hypothesis.
3. Calculate the expected frequencies, based on the null hypothesis.
4. Decide the number of degrees of freedom
5. Run the analysis as usual: calculate O-E, (O-E)² and (O-E)²/E for each category, then sum the (O-E)²/E values to obtain X² and test this against χ². The following table shows some of the working. The sum of the values shown in red gives an X² of 20.23.
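The procedure for a double classification can be sketched as below. Several things here are assumptions rather than content from this page: the null hypothesis is taken to be the usual one of independence (germination type does not depend on spore size), expected frequencies are computed as row total x column total / grand total, and the degrees of freedom as (rows-1) x (columns-1). The counts are invented placeholders, not the spore counts analysed here, so the result will not match the X² of 20.23. No Yates correction is applied in this sketch.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_table(table):
    """X2 = sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# HYPOTHETICAL counts for illustration only (not the data from this page).
# Rows: large spores, small spores. Columns: single outgrowth, multiple outgrowths.
hypothetical_counts = [
    [60, 40],
    [30, 70],
]

expected = expected_frequencies(hypothetical_counts)
x2 = chi_squared_table(hypothetical_counts)
df = (len(hypothetical_counts) - 1) * (len(hypothetical_counts[0]) - 1)

print("expected:", expected)
print(f"X2 = {x2:.2f} with {df} degree(s) of freedom")
```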
We compare the X² value with a tabulated χ² with one degree of freedom. Our calculated X² exceeds the tabulated χ² value (10.83) for p = 0.001. We conclude that there is a highly significant departure from the null hypothesis - we have very strong evidence that large spores and small spores show different germination behaviour.

Checklist: procedures and potential pitfalls

Chi squared is a very simple test to use. The only potentially difficult things about it are:
If you follow the examples given on this page you should not have too many difficulties. Some points to watch: