The Genetics of Colour in the Budgerigar and other Parrots
This page created October 1998


The Chi-squared Test

What is the Chi-squared Test?

In genetic studies the chi-squared test is used to evaluate a genetic theory or hypothesis by comparing actual breeding results to theoretically expected results. The test is designed to convert the differences (or deviations) between the two into the probability of their occurring by chance, taking into account both the size of the sample and the number of variables (degrees of freedom).

The test is not suitable where the expected frequency within any phenotypic class is less than five (5). And in those cases where each expected frequency is between 5 and 10, or where there are only two (2) classes (one degree of freedom), Yates’ correction (see later) should be applied.

There are two steps to performing the test:

  1. Calculating the chi-squared value from the test result figures using a standard formula.

  2. Comparing the chi-squared value with a scale of values given by a standard probability table to produce a probability value.

Let’s suppose that in our breeding program we paired a Blue bird to a Normal Green/blue. Our genetic knowledge, such as it might be, led us to expect roughly equal numbers of Greens and Blues amongst the young birds produced. In the event, the happy pair produced a total of 17 young over quite a productive season, of which 12 are Green and 5 are Blue. That’s not very close to what we expected, and who can blame us for being a bit doubtful about genetic theorising?

That’s where the chi-squared test comes to the rescue. We find that, using our breeding results as sample data, it is unexpectedly easy to perform this test and settle any doubts we might have about inheritance of the blue gene.


Calculating the Chi-squared Value

Depending upon the particular breeding experiment there may be two, three, four, or even more classes of progeny. Normally these classes will be distinct visual types (phenotypes), since to test for genetic types (genotypes) would involve identification by a possibly complex system of testmatings.

For each breeding experiment both the numbers observed (o) in each class and the total number of progeny produced, the sum of the separate classes, should be recorded. Additionally it is necessary to calculate, according to the theory or hypothesis being tested, and assuming the same total of progeny, the numbers expected (e) in each class. These latter values are likely to be fractional.
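As a small illustration of the last point, the expected numbers can be worked out from the predicted ratio and the total number of progeny. This is only a sketch: the function name `expected_numbers` is made up for this example, and the 1:1 ratio is the one assumed for the Blue x Green/blue pairing described earlier.

```python
def expected_numbers(total, ratio):
    """Share `total` progeny between classes in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# Blue x Green/blue: theory predicts Green and Blue in a 1:1 ratio.
expected = expected_numbers(17, [1, 1])
print(expected)  # [8.5, 8.5]
```

Note that the expected numbers are fractional here even though the observed numbers never can be.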

The difference between the observed and expected numbers in each class (o - e) is known as the deviation (d) and may be a positive or a negative value.

The chi-squared formula is:

$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \cdots + \frac{(o_n - e_n)^2}{e_n}$$

or, writing $d_i = o_i - e_i$ for the deviation in each class,

$$\chi^2 = \frac{d_1^2}{e_1} + \frac{d_2^2}{e_2} + \cdots + \frac{d_n^2}{e_n}$$

where   $o_1$ to $o_n$   =   numbers observed in (n) classes
        $e_1$ to $e_n$   =   numbers expected in (n) classes
        $d_1$ to $d_n$   =   deviations calculated for (n) classes


At first sight this looks pretty daunting stuff, but if a table such as that below is used and primed with the actual results of our breeding program, it is much easier to understand what is happening.

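Chi-squared calculation

Class or     Number         Number         Deviation      d^2             d^2 / e
phenotype    observed (o)   expected (e)   d = (o - e)    = (o - e)^2
----------   ------------   ------------   -----------    -----------     -------
1 (green)    12             8.5             3.5           12.25           1.44
2 (blue)      5             8.5            -3.5           12.25           1.44
----------   ------------   ------------   -----------    -----------     -------
Totals       17             17              0             -               2.88

Chi-squared value is 2.88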


Running along the line representing the Green phenotype it can be seen that: deducting the number expected (8.5) in column 3 from the number observed (12) in column 2 produces a deviation of (3.5) in column 4; in column 5 this is squared (multiplied by itself) to give a figure of 12.25; which, in column 6, is finally divided by the number expected (12.25/8.5) to produce 1.44.

All other lines follow the same pattern although the actual figures will be more varied where there are more than two phenotypes.

Even so, there are a couple of mathematical check points when using the table which will reassure us that we are working correctly:

  1. The total in the observed column should be the same as the total in the expected column and will equal the total number of progeny produced.

  2. In the deviation column, positive figures will cancel out negative figures and produce a total of zero.

and there are two further observations:

  1. Squaring negative deviations converts them to a positive value.

  2. The number of degrees of freedom is one less than the number of phenotypic classes.

The final chi-squared value (2.88 using the sample data) is checked against the appropriate section of a standard probability table as described in the following paragraphs.
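The table calculation can also be sketched in a few lines of Python. This is an illustration only, not part of the original method: the function name `chi_squared` is invented here, and the first check point from the list above is built in as a guard (the second follows automatically from it).

```python
def chi_squared(observed, expected):
    """Sum d^2 / e over all classes, where d = o - e."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    # Check point 1: both columns must add up to the total progeny.
    # (If they do, the deviations necessarily sum to zero: check point 2.)
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals differ")
    deviations = [o - e for o, e in zip(observed, expected)]
    return sum(d * d / e for d, e in zip(deviations, expected))

observed = [12, 5]        # Greens, Blues
expected = [8.5, 8.5]     # 1:1 expectation for 17 young
value = chi_squared(observed, expected)
dof = len(observed) - 1   # degrees of freedom: classes minus one
print(round(value, 2), dof)  # 2.88 1
```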


Estimating the Probability Value

The probability table usually used with the chi-squared test is that prepared by Fisher and Yates. A version of this table, shortened to enable it to fit this page, is shown below. It is confined to four classes (three degrees of freedom) and has only sufficient probability columns to give an indication of how it is used. However, it does show the critical columns which are used to determine whether the deviation from predicted results is sufficiently great as to throw grave doubt on the theory used for the prediction.

You will find a more complete version of the table in most genetic textbooks or manuals.


Chi-squared distribution

Classes or     Chi-squared values for probability as a percentage
phenotypes     99        95       50      10      5       1       0.1
-----------    ------    -----    ----    ----    ----    -----   -----
2 (1)          0.0002    0.004    0.46    2.71    3.84    6.64    10.83
3 (2)          0.0201    0.100    1.39    4.60    5.99    9.21    13.82
4 (3)          0.1150    0.350    2.37    6.25    7.82    11.34   16.27

Values smaller than the 5% figure: deviation not significant (accept hypothesis).
Values larger than the 5% figure: deviation significant (reject hypothesis).
[Figures in brackets (...) denote degrees of freedom.]


An estimation of the probability value is obtained by looking along the line representing the number of phenotypic classes being considered and comparing the calculated chi-squared value to those in the table. The calculated value will lie between two values corresponding to percentages at the head of the table.
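The lookup just described can be sketched as follows. The figures are copied from the first six probability columns of the shortened table on this page; the names `PERCENTAGES`, `TABLE` and `bracket` are invented for this example.

```python
# Probability columns (as percentages) and the table rows,
# keyed by degrees of freedom (classes minus one).
PERCENTAGES = [99, 95, 50, 10, 5, 1]
TABLE = {
    1: [0.0002, 0.004, 0.46, 2.71, 3.84, 6.64],
    2: [0.0201, 0.100, 1.39, 4.60, 5.99, 9.21],
    3: [0.1150, 0.350, 2.37, 6.25, 7.82, 11.34],
}

def bracket(chi2, dof):
    """Return (higher %, lower %) between which the probability lies.

    None in either position means the value falls off that end of
    this shortened table.
    """
    higher = None
    for pct, table_value in zip(PERCENTAGES, TABLE[dof]):
        if chi2 < table_value:
            return (higher, pct)
        higher = pct
    return (higher, None)

print(bracket(2.88, 1))  # (10, 5)
```

For the sample data the result means a probability between 10% and 5%.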

Obtaining a precise value is not the purpose of this test. What is important is to determine whether the deviations from theoretically expected results are of such significance as to invalidate the theory being examined. The significance level is customarily set at 5% so that, for instance, where there are two phenotypic classes, the figure of 3.84 marks the division between deviations which are significant and those which are not significant.

Using the sample data, a chi-squared value of 2.88 is obtained which lies between values indicating probabilities of 5% and 10%; let's say 9%. This indicates that in a large number of similar tests deviations as great as, or greater than that observed, would occur in about 9% of those tests by chance alone. This is greater than the significant 5% level and we can conclude that this test does not invalidate theoretical expectations.
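For readers who would rather compute the probability than read it from a table, the sketch below uses the standard closed-form expressions for the chi-squared distribution with one, two or three degrees of freedom, which covers the rows of the table on this page. It is an illustration under those assumptions: the function name `chi2_probability` is made up here, and only Python's standard `math` module is used.

```python
import math

def chi2_probability(chi2, dof):
    """Chance of a chi-squared value at least this large (dof 1 to 3 only)."""
    t = chi2 / 2.0
    if dof == 1:
        return math.erfc(math.sqrt(t))
    if dof == 2:
        return math.exp(-t)
    if dof == 3:
        return math.erfc(math.sqrt(t)) + 2.0 * math.sqrt(t / math.pi) * math.exp(-t)
    raise ValueError("this sketch only handles 1, 2 or 3 degrees of freedom")

p = chi2_probability(2.88, 1)
print(f"{p:.3f}")   # roughly 0.09, i.e. about 9%
print(p > 0.05)     # True: above the 5% level, deviation not significant
```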

However the nature of this test is such that, in considering breeding behaviour, a corrective procedure should be applied.


Yates’ Correction for Continuity

The distribution table we have been working with is based upon a continuous distribution conforming to the normal curve. Our genetics problems on the other hand involve separate or discrete phenotypic classes having a step-like pattern. Particularly where there are only two phenotypic classes (one degree of freedom), or where the sample is small, this can lead to an under-estimation of the probability value.

To counter this, Yates’ Correction for Continuity is recommended where there are only two phenotypic classes or in small samples where the number of individuals expected in each class is between 5 and 10. The correction reduces the size of the deviation in each class by 0.5, whatever its sign, before it is squared: a deviation of 3.5 becomes 3, and a deviation of -3.5 becomes -3.

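Chi-squared calculation with Yates’ correction

Class or     Number     Number     Deviation      Yates’ correction   New d^2   d^2 / e
phenotype    obs (o)    exp (e)    d = (o - e)    (|d| - 0.5)
----------   -------    -------    -----------    -----------------   -------   -------
1 (green)    12         8.5         3.5            3                  9         1.06
2 (blue)      5         8.5        -3.5           -3                  9         1.06
----------   -------    -------    -----------    -----------------   -------   -------
Totals       17         17          0             -                   -         2.12

Chi-squared value is 2.12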


This process is shown in the table immediately above where it yields a corrected chi-squared value of 2.12 corresponding to a probability between 10% and 20%; let’s say 15%. This is higher than the uncorrected result of about 9% and well above the critical level around 5%.
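The corrected calculation can be sketched in the same way. The function names are invented for this example, `yates_recommended` simply restates the rule of thumb given in this article, and the guard that stops a very small deviation from going below zero is an added assumption, not something the article describes.

```python
def yates_recommended(expected):
    """Rule of thumb from this article: two classes, or every expected number 5 to 10."""
    return len(expected) == 2 or all(5 <= e <= 10 for e in expected)

def chi_squared_yates(observed, expected):
    """Chi-squared with 0.5 taken off the size of each deviation before squaring."""
    total = 0.0
    for o, e in zip(observed, expected):
        # Assumption: a deviation smaller than 0.5 is treated as zero.
        corrected = max(abs(o - e) - 0.5, 0.0)
        total += corrected ** 2 / e
    return total

observed = [12, 5]
expected = [8.5, 8.5]
print(yates_recommended(expected))                       # True
print(round(chi_squared_yates(observed, expected), 2))   # 2.12
```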

From all this we can conclude quite positively that the breeding results we experienced do not invalidate the generally held theories regarding inheritance of the blue gene.


Summary

  • The methods used in applying the chi-squared test to breeding result figures in order to determine a probability value, and in certain circumstances to produce a value corrected for lack of continuity, have been described.

  • Using sample data from actual breeding results, where 12 Greens and 5 Blues were produced compared to a theoretical expectation of 50% of each phenotype, a probability of about 9% was calculated. As only two phenotypic classes were involved it was prudent to apply a correction for lack of continuity. This produced a modified probability of about 15%.

  • There is a marked difference between the original and corrected figures of 9% and 15% respectively, demonstrating that, around the critical level of 5%, application of the correction might make the difference between accepting or rejecting a hypothesis.

  • It is stressed that the chi-squared test is not in itself definitive and is only one of a number of approaches to the task of interpreting the results of breeding experiments. The cut-off probability value of 5%, though an informed choice, is also quite arbitrary and cannot give a certain indication of the validity, or otherwise, of a theory.


Copyright: Clive Hesford, February 1994 and October 1998

http://ourworld.compuserve.com/homepages/clivehesford/
e-mail: CliveHesford@compuserve.com
