| STATISTICAL TESTS FOR SIGNIFICANCE

        What test do I need?

        Other parts of this site explain how to do the common
        statistical tests. Here is a guide to choosing the right
        test for your purposes. When you have found it, click on
        "more information?" to confirm
        that the test is suitable. If you know it is suitable,
        click on "go for it!"

        Important: Your
        data might not be in a suitable form (e.g. percentages,
        proportions) for the test you need. You can overcome this
        by using a simple transformation. Always check
        this - see "Transformation of data" below.

        1. Student's t-test
         
            Use this test for comparing the means of
            two samples (but see test 2 below), even
            if they have different numbers of replicates.
            For example, you might want to compare the growth
            (biomass, etc.) of two populations of bacteria or
            plants, the yield of a crop with or without
            fertiliser treatment, the optical density of samples
            taken from each of two types of solution, etc. This
            test is used for "measurement data" that
            are continuously variable (with no fixed limits), not
            for counts of 1, 2, 3 etc. You would need to transform percentages
            and proportions because these have fixed limits
            (0-100, or 0-1). More information? Go for it!

        2. Paired-samples test
            Use this test like the t-test
            but in special circumstances - when you can arrange
            the two sets of replicate data in pairs. For
            example: (1) in a crop trial, use the
            "plus" and "minus" nitrogen crops
            on one farm as a pair, the "plus" and
            "minus" nitrogen crops on a second farm as
            a pair, and so on; (2) in a drug trial where a drug
            treatment is compared with a placebo (no treatment),
            one pair might be 20-year-old Caucasian males,
            another pair might be 30-year-old Asian females, and
            so on. More information? Go for it!

        3. Analysis of variance for comparing the means of three or
        more samples
            Use this test if you want to compare
            several treatments. For example, the growth
            of one bacterium at different temperatures, the
            effects of several drugs or antibiotics, the sizes of
            several types of plant (or animals' teeth, etc.). You
            can also compare two things simultaneously - for
            example, the growth of 3 bacteria at different
            temperatures, and so on. Like the t-test,
            this test is used for "measurement data"
            that are continuously variable (with no fixed
            limits), not for counts of 1, 2, 3 etc. You would
            need to transform
            percentages and proportions because these have
            fixed limits (0-100, or 0-1). More information?
            You need this, because there are different
            forms of this test.

        4. Chi-squared test for categories of data
            Use this test to compare counts (numbers)
            of things that fall into different categories.
            For example, the numbers of blue-eyed and brown-eyed
            people in a class, or the numbers of progeny (AA, Aa,
            aa) from a genetic crossing experiment. You can also use
            the test for combinations of factors (e.g.
            the incidence of blue/brown eyes in people with
            light/dark hair, or the numbers of oak and birch
            trees with or without a particular type of toadstool
            beneath them on different soil types, etc.). More information? Go for it!

        5. Poisson distribution for count data
            Use this test for putting confidence
            limits on the mean of counts of random events,
            so that different count means can be compared
            for statistical difference. For example, numbers of
            bacteria counted in the different squares of a
            counting chamber (haemocytometer) should follow a
            random distribution, unless the bacteria attract one
            another (in which case the numbers in some squares
            should be abnormally high, and abnormally low in
            other squares) or repel one another (in which case
            the counts should be abnormally similar in all
            squares). Very few things in nature are randomly
            distributed, but testing the recorded data against
            the expectation of the Poisson distribution would
            show this. By using the Poisson distribution you have
            a powerful test for analysing whether objects/ events
            are randomly distributed in space and time (or,
            conversely, whether the objects/ events are
            clustered). More information? Go for it!

        6. Correlation coefficient and regression analysis for curve
        fitting
            These procedures are used for looking at
            the relationship between different factors,
            and (if appropriate) for graphing the results
            in statistically meaningful ways. For
            example, as the temperature (or pH, etc.) increases,
            does growth rate increase or decrease? As the dose
            rate of a drug is increased does the response rate of
            patients rise? As altitude is increased does the
            number of butterflies (or oak trees) increase or
            decrease? Sometimes the relationship is
            linear, sometimes logarithmic, sometimes sigmoidal,
            etc. You can test all these possibilities and, in
            drug or toxicity trials (for example) calculate the
            LD50 or ED50 (lethal dose, or
            effective dose, for a 50% response rate). More
            information? Go for it!

        ==========================================

        More information

        Student's t-test
         
            Use this test for comparing the means of
            two populations that you have sampled (but see test 2
            below). For example, you might want to
            compare the growth (biomass, etc.) of two bacteria or
            plants, the yield of a crop with or without added
            nitrogen, the optical density of samples taken from
            each of two types of solution, etc.

            What you will need for this test:
            a minimum of 2 or 3 replicates of each sample or
            treatment, but ideally at least 5 replicates. For
            example, the yield measured for 5 fields of a crop
            fertilised with nitrogen and for 5 unfertilised
            fields, the optical density of 5 tubes of each
            solution, the measurement of 5 plants of each type,
            etc. Large sample sizes (10 or more) are always
            better than small sample sizes, but it is easier to
            measure the height of 10 or 20 (or 50) plants than it
            is to set up 10 or 20 large-scale fermenters!

            You don't need the same number of
            replicates of each treatment - for example, you can
            compare 3 tubes of one solution with 4 tubes of
            another. You could also use this test to compare
            several replicates of one treatment with a single
            value for another treatment, but it would not be very
            sensitive. Go for it!
            Back to "What test do I need?"
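
As an illustration only (the site itself works in Excel), the calculation behind the t-test can be sketched in a few lines of Python. The sketch assumes the classic pooled-variance form of Student's t-test, and the optical density values are hypothetical. It shows that unequal numbers of replicates (3 tubes against 4) are handled without any special treatment.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, df

# Hypothetical optical densities: 3 tubes of one solution, 4 tubes of another.
solution_1 = [0.52, 0.48, 0.50]
solution_2 = [0.61, 0.64, 0.58, 0.65]
t, df = student_t(solution_1, solution_2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The size of t (ignoring its sign) is then compared with the tabulated value for the same degrees of freedom; for 5 degrees of freedom the two-tailed value at p = 0.05 is 2.571.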
 Paired-samples
        test  
            Use this test like the t-test
            but in special circumstances - when you can arrange
            the two sets of replicate data in pairs. For
            example: (1) in a crop trial, use the
            "plus" and "minus" nitrogen crops
            on one farm as a pair, the "plus" and
            "minus" nitrogen crops on a second farm as
            a pair, and so on; (2) in a drug trial where a drug
            treatment is compared with a placebo (no treatment),
            one pair might be 20-year-old males, another pair
            might be 30-year-old females, and so on.

            Why do we use the paired samples test?
            Because farms or people or many other things are
            inherently variable, but by pairing the treatments we
            can remove much of this random variability from the
            test of "nitrogen versus no nitrogen" or
            "drug treatment versus no treatment", etc.

            What are the requirements for this test?
            The main requirement is that the experiment is
            PLANNED ahead of time. Then you can use the paired
            samples test for many purposes - for example, two
            treatments compared on one day, then the same two
            treatments compared on the next day, and so on.

            In general, you will need more replicates
            than for a t-test
            (say, a minimum of 5 for each treatment),
            and you will need the same number of
            replicates for each treatment.

            But you must have a good reason
            to pair treatments - you should not do it
            arbitrarily. Go for it!
            Back to "What test do I need?"
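
The pairing idea can be sketched in Python as follows. This is a minimal illustration, not the site's own method: the yields are hypothetical, and the function name is an assumption. The point it shows is that the test works on the difference within each pair, so variation between farms drops out.

```python
import math
from statistics import mean, stdev

def paired_t(treated, control):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(treated) != len(control):
        raise ValueError("paired samples need the same number of replicates")
    # One difference per pair (e.g. per farm): plus-nitrogen minus minus-nitrogen.
    diffs = [a - b for a, b in zip(treated, control)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical crop yields on five farms, each farm supplying one pair.
plus_nitrogen = [5.1, 6.3, 4.8, 7.0, 5.9]
minus_nitrogen = [4.6, 5.9, 4.1, 6.4, 5.5]
t, df = paired_t(plus_nitrogen, minus_nitrogen)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Note that the degrees of freedom are the number of pairs minus one, which is one reason why more replicates are needed than for the ordinary t-test.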
 Analysis of
        variance for comparing the means of
        three or more samples.  
            Use this test if you want to compare
            several treatments. For example, the growth
            of one bacterium at different temperatures, the
            effects of several drugs or antibiotics, the sizes of
            several plants (or animals' teeth, etc.). You can
            also compare two things simultaneously - for example,
            the growth of 3 or 4 strains of bacteria at different
            temperatures, and so on.

            The simplest form of this test is one-way
            ANOVA (ANalysis Of VAriance). Use
            this to compare several separate treatments
            (e.g. effects of 3 or more temperatures, antibiotic
            levels, crop treatments, etc.). You will need
            at least 2 replicates of each treatment.

            One-way ANOVA tells you if there are differences
            between the treatments as a whole. But it can
            also be used, with caution, like a multiple t-test,
            to tell you which of the treatments differ from each
            other. Go for one-way ANOVA?
            Back to "What test do I need?"
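
To make the one-way calculation concrete, here is a short Python sketch (an illustration with hypothetical growth figures, not the worked example from this site). It computes the F ratio: the variation between treatment means divided by the variation within treatments.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the treatment means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates about their own treatment mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical growth of one bacterium at three temperatures, 3 replicates each.
growth = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
f_ratio, df_between, df_within = one_way_anova(growth)
print(f"F = {f_ratio:.2f} with {df_between} and {df_within} degrees of freedom")
```

A large F ratio says only that the treatments differ as a whole; it does not say which treatments differ from each other.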
 Another form of this test is two-way ANOVA.
            Use this if you want to compare combinations
            of treatments. For example, to compare the
            growth of an organism on several different substrates
            at several different temperatures. Or the effects of
            two (or more) drugs singly and in combination. Or
            responses of crops to fertiliser treatment on
            different farms or soil types. You can get
            useful information even if you have one of each
            combination of treatments, but you get much
            more information if you have 2 (or more) replicates
            of each combination of treatments. Then the test can
            tell you if you have significant interaction
            - for example, if changing the temperature changes
            the way that an organism responds to a change of pH,
            etc. Go for two-way ANOVA?
            Back to "What test do I need?"
 Chi-squared test for
        categories of data  
            Use this test to compare counts (numbers)
            of things that fall into different categories.
            For example, to compare the numbers of blue-eyed and
            brown-eyed people in a class, or the numbers of
            progeny (AA, Aa, aa) from a genetic crossing
            experiment. You can also use the test for
            looking at combinations of factors (e.g. the
            incidence of blue/brown eyes in people with
            light/dark hair, or the numbers of toadstools beneath
            oak and birch trees on different soil types, etc.).

            For this test you compare the actual
            counts (in the different categories) with
            an "expected" set of counts.
            Sometimes the expectation is obvious - for example,
            that half of the progeny from a cross between parents
            Aa and aa will have the Aa genotype and half will
            have aa. You have to construct a hypothesis (termed
            the null hypothesis) by using logical arguments.

            What are the requirements for this test?
            Almost any sort of "count" data can be
            analysed by chi-squared, but you have to use
            "real" numbers, not proportions or
            percentages. Go for it!
            Back to "What test do I need?"
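
The comparison of actual with expected counts can be sketched in Python like this. The progeny counts are hypothetical, and the null hypothesis is the 1:1 ratio expected from the Aa x aa cross described above. The sketch is only an illustration of the arithmetic and applies no continuity correction.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical progeny counts (Aa, aa) from a cross between Aa and aa parents.
observed = [58, 42]
total = sum(observed)
# Null hypothesis: half of the progeny are Aa and half are aa.
expected = [total / 2, total / 2]

chi2 = chi_squared(observed, expected)
df = len(observed) - 1
print(f"chi-squared = {chi2:.2f} with {df} degree of freedom")
```

Note that the function is given the real counts (58 and 42), not the percentages or proportions they represent.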
 Poisson
        distribution for count data 
            The main requirement for this test is that the
            mean count (of bacterial colonies, buttercups, etc.)
            needs to be relatively high (say 30 or more) before
            the counts can be expected to conform to a Poisson
            distribution. If you have such a high count, then you
            can test whether or not your results actually do
            conform to the Poisson distribution. Go for it!
            Back to "What test do I need?"
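
As a rough illustration (not the procedure used on this site), two quantities that help when comparing counts with the Poisson expectation can be sketched in Python. The first is the Poisson probability of seeing exactly k events for a given mean. The second is the variance-to-mean ratio, a simple index introduced here as an assumption: for a Poisson distribution the variance equals the mean, so the ratio should be close to 1. The counts are hypothetical and far fewer than you would use in practice.

```python
import math
from statistics import mean, variance

def poisson_probability(k, mean_count):
    """Probability of exactly k events when the Poisson mean is mean_count."""
    return math.exp(-mean_count) * mean_count ** k / math.factorial(k)

def dispersion_index(counts):
    """Sample variance divided by sample mean (near 1 for random counts)."""
    return variance(counts) / mean(counts)

# Hypothetical counts of bacteria in six squares of a counting chamber.
counts = [4, 6, 5, 3, 7, 5]
index = dispersion_index(counts)
# Number of squares expected to hold exactly 5 cells under a Poisson distribution.
expected_fives = len(counts) * poisson_probability(5, mean(counts))
print(f"variance/mean = {index:.2f}, expected squares with 5 cells = {expected_fives:.2f}")
```

A ratio well above 1 would point towards clustering, and a ratio well below 1 towards counts that are more alike than random events would be. Deciding whether the departure is significant needs a formal test, which this sketch does not attempt.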
 Correlation
        coefficient and regression analysis
        for curve fitting  
            These procedures are used for looking at
            the relationship between different factors,
            and (if appropriate) for graphing the results
            in statistically meaningful ways. For
            example, as the temperature (or pH, etc.) increases,
            does growth rate increase or decrease? As the dose
            rate of a drug is increased does the response rate of
            patients rise? As altitude is increased does the
            number of butterflies (or oak trees) increase or
            decrease? Sometimes the relationship is
            linear, sometimes logarithmic, sometimes sigmoidal,
            etc. You can test all these possibilities and, in
            drug or toxicity trials (for example) calculate the
            LD50 or ED50 (lethal dose, or
            effective dose, for a 50% response rate).

            There is a 3-stage procedure:

                (1) Plot your results on graph paper, and ask
                    yourself: does the relationship look (or is
                    expected to be) linear, or is it logarithmic,
                    or sigmoid (S-shaped)? You might need to
                    transform the data (see transforming
                    data) if they are not linear.

                (2) Calculate the correlation coefficient, which
                    tells you whether the data fit a straight
                    line relationship (and how close the fit is,
                    in statistical terms).

                (3) If the correlation coefficient is
                    significant, and other conditions are met,
                    proceed to regression analysis, which gives
                    the equation for the line of best fit, then
                    draw this line on your graph.

            Go for it!
            Back to "What test do I need?"
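
Stages 2 and 3 of the procedure above can be sketched in Python as follows. This is a minimal illustration with hypothetical temperature and growth-rate data; it assumes the relationship is already linear (stage 1) and uses ordinary least squares for the line of best fit.

```python
import math
from statistics import mean

def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for paired x and y values."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
    slope = sxy / sxx                    # regression of y on x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical growth rates measured at five temperatures.
temperature = [10, 15, 20, 25, 30]
growth_rate = [1.1, 1.9, 3.2, 3.8, 5.0]
r, slope, intercept = correlation_and_line(temperature, growth_rate)
print(f"r = {r:.3f}; growth rate = {slope:.3f} x temperature + {intercept:.2f}")
```

The value of r is checked for significance against a table (with n - 2 degrees of freedom, where n is the number of data points) before the line is taken seriously.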
        Transformation of data

        1. Proportions and percentages:
        convert to arcsin values

        Certain mathematical assumptions underlie all the
        statistical tests on this site. The most important
        assumption is that the data are normally distributed and
        are free to vary widely about the mean - there are no
        imposed limits. Clearly this is not true of percentages,
        which cannot be less than 0 nor more than 100. If you
        have data that are close to these limits, then you need
        to transform the original data before you analyse them.

        One simple way of doing this is to convert the
        percentages to arcsin values and then analyse
        these arcsin values. The arcsin transformation moves very
        low or very high values towards the centre, giving them
        more theoretical freedom to vary.

        [You convert percentages (x) to arcsin values
        (θ), where θ is an angle for which
        sin θ = √(x/100)]

        On a calculator:

            to get the arcsin value for a percentage
            (e.g. 50%), divide this by 100 (= 0.5), take the
            square root (= 0.7071), then press "sin-1"
            to get the arcsin value (= 45). [NB: if your
            calculator gives the result as 0.785 then this is the
            angle in radians rather than degrees]

            to get the arcsin value for a proportion
            (e.g. 0.4), take the square root (= 0.6325), then
            press "sin-1" to get the arcsin value (=
            39.23).

        On an "Excel" spreadsheet:

            convert percentages to arcsin values (and back
            again) by entering a formula into the spreadsheet - Go for
            it!

        2. Logarithmic transformation

        Use this for two purposes:
            (1) When fitting a curve to logarithmic data
                (exponential growth of cells, etc). Take the
                logarithm of each "growth" value and
                plot this against time (real values). You can use
                either natural logarithms or logs to base 10. The
                data should now show a straight-line relationship
                and can be analysed using correlation coefficient
                and regression.

            (2) In Analysis of Variance, when
                comparing means that differ widely. The reason
                for this is that an analysis of variance is based
                on the assumption that the variance is the same
                across all the data. But usually this will not be
                true if some means are very small and others are
                very large - the individual data points for the
                large mean could vary widely. [For example, a
                mean of 500 could be made up from 3 values of
                100, 400 and 1000, whereas a mean of 50 could not
                possibly include such wide variation] This
                problem is overcome by converting the original
                data to logarithms, squeezing all the data points
                closer together. Contrary to expectations, this
                would show significant differences between small
                and large means that would not
                be seen otherwise.

        3. Converting Percentages to Probits

        Some types of data show a sigmoid (S-shaped)
        relationship. A classic case is in dosage-response
        curves, for testing antibiotics, pharmaceuticals, etc. To
        analyse these relationships the "percentage of
        patients/cells responding to a treatment" can be
        converted to a "probit" value, and the dosage
        is converted to a logarithm. This procedure converts an
        S-shaped curve into a straight-line relationship, which
        can be analysed by correlation coefficient and regression
        analysis in the normal way. From the straight-line
        equation, we can calculate the LD50, ED50,
        and so on.

        The method for doing this in "Excel" is
        shown below.

        Converting between percentage, arcsin and probits in Excel

        The table below shows part of a page from an
        Excel worksheet. Columns are headed A-F and
        rows are labelled 1-21, so each cell in the table can be
        identified (e.g. B2 or F11). Representative % values were
        inserted in cells B2-B21.

        You will now see how to convert these % values into
        probits or arcsin values, and back again. If you do the
        relevant conversion in your own spreadsheet, you can then
        use the probit or arcsin values instead of % values for
        the statistical tests.

        In cell C2 of the spreadsheet, a formula was entered
        to convert Percentage to Probit values. The formula (without spaces) is:

            =NORMINV(B2/100,5,1)

        The formula itself is not displayed. As soon as we move out of
        cell C2, the cell shows the probit value
        for the percentage in cell B2, as seen in the
        "printout" below. Copying and then pasting this
        formula into all the other cells of column C produces a
        corresponding probit value (e.g. cell C3 contains the
        probit of the % in cell B3).

        Next, a formula was entered in cell D2 to convert Probit
        to Percentage, and the above procedure was repeated
        for all cells in column D. The formula is:

            =NORMDIST(C2,5,1,TRUE)*100

        The formula entered in cell E2 converts Percentage
        to Arcsin. The formula is:

            =ASIN(SQRT(B2/100))*180/PI()

        The formula in cell F2 converts Arcsin to
        Percentage. The formula is:

            =SIN(E2/180*PI())^2*100
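
If you are not using Excel, the same four conversions can be sketched in Python. This is an illustrative equivalent, not part of the original worksheet: the function names are assumptions, and the probit is taken (as in the Excel formulas) to be the normal deviate for the percentage on a scale with mean 5 and standard deviation 1.

```python
import math
from statistics import NormalDist

# Probit scale: a normal distribution with mean 5 and standard deviation 1.
PROBIT_SCALE = NormalDist(mu=5, sigma=1)

def percent_to_probit(percent):
    """Probit for a percentage strictly between 0 and 100."""
    return PROBIT_SCALE.inv_cdf(percent / 100)

def probit_to_percent(probit):
    """Percentage corresponding to a probit value."""
    return PROBIT_SCALE.cdf(probit) * 100

def percent_to_arcsin(percent):
    """Arcsin value in degrees: the angle whose sine is sqrt(percent / 100)."""
    return math.degrees(math.asin(math.sqrt(percent / 100)))

def arcsin_to_percent(angle_degrees):
    """Percentage corresponding to an arcsin value given in degrees."""
    return math.sin(math.radians(angle_degrees)) ** 2 * 100

for percent in (10, 50, 96):
    probit = percent_to_probit(percent)
    angle = percent_to_arcsin(percent)
    print(percent, round(probit, 3), round(angle, 2))
```

Percentages of exactly 0 or 100 have no probit, so `percent_to_probit` raises an error for them; the arcsin conversion accepts the whole range from 0 to 100.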
            
                | A | B | C | D | E | F |  
                | 1 | Percent | % to Probit | Probit to % | % to arcsin | arcsin to % |  
                | 2 | 0.1 | 1.91 | 0.1 | 1.812 | 0.1 |  
                | 3 | 0.5 | 2.424 | 0.5 | 4.055 | 0.5 |  
                | 4 | 1 | 2.674 | 1 | 5.739 | 1 |  
                | 5 | 2 | 2.946 | 2 | 8.13 | 2 |  
                | 6 | 3 | 3.119 | 3 | 9.974 | 3 |  
                | 7 | 4 | 3.249 | 4 | 11.54 | 4 |  
                | 8 | 5 | 3.355 | 5 | 12.92 | 5 |  
                | 9 | 6 | 3.445 | 6 | 14.18 | 6 |  
                | 10 | 7 | 3.524 | 7 | 15.34 | 7 |  
                | 11 | 8 | 3.595 | 8 | 16.43 | 8 |  
                | 12 | 9 | 3.659 | 9 | 17.46 | 9 |  
                | 13 | 10 | 3.718 | 10 | 18.43 | 10 |  
                | 14 | 50 | 5 | 50 | 45 | 50 |  
                | 15 | 96 | 6.751 | 96 | 78.46 | 96 |  
                | 16 | 97 | 6.881 | 97 | 80.03 | 97 |  
                | 17 | 98 | 7.054 | 98 | 81.87 | 98 |  
                | 18 | 99.5 | 7.576 | 99.5 | 85.95 | 99.5 |  
                | 19 | 99.99 | 8.719 | 99.99 | 89.43 | 99.99 |  
                | 20 | 99.999 | 9.265 | 99.999 | 89.82 | 99.999 |  
                | 21 | 99.9999 | 9.768 | 100 | 89.94 | 99.9999 |  |