TWO-WAY ANOVA

Analysis of variance (ANOVA) for factorial combinations of treatments

Elsewhere on this site we have dealt with ANOVA for simple comparisons of treatments. We can also use ANOVA for combinations of treatments, where two factors (e.g. pH and temperature) are applied in every possible combination. These are called factorial designs, and we can analyse them even if we do not have replicates.

This type of analysis is called TWO-WAY ANOVA.

Suppose that we have grown one bacterium in broth culture at 3 different pH levels at each of 4 different temperatures. We have 12 flasks in all, but no replicates. Growth was measured by optical density (O.D.).

Construct a table as follows (O.D. is given in fictitious whole numbers here for convenience).

Temp °C	pH 5.5	pH 6.5	pH 7.5
25	10	19	40
30	15	25	45
35	20	30	55
40	15	22	40

Then calculate the following (see the worked example and the output from Microsoft "Excel").

(a) Σx, Σx², (Σx)² / n, and the mean for each column in the table.

(b) Σx, Σx², (Σx)² / n, and the mean for each row.

(c) Find the grand total by adding all Σx for columns (it should be the same for rows). Square this grand total and then divide by uv, where u is the number of data entries in each row, and v is the number of data entries in each column. Call this value D; in our example it is (336)² / 12 = 9408.

(d) Find the sum of Σx² values for columns; call this A. It will be the same for Σx² of rows. In our example it is 11570.

(e) Find the sum of (Σx)² / n values for columns; call this B. In our example it is 11304.

(f) Find the sum of (Σx)² / n values for rows; call this C. In our example it is 9646.

(g) Set out a table of analysis of variance as follows:

Source of variance	Sum of squares	Degrees of freedom*	Mean square (= sum of squares / df)
Between columns	B - D (1896)	u - 1 (=2)	948
Between rows	C - D (238)	v - 1 (= 3)	79.3
Residual	*** (28)	(u-1)(v-1) (=6)	4.67
Total	A - D (2162)	(uv)-1 (=11)	196.5

[* Where u is the number of data entries in each row, and v is the number of data entries in each column; note that the total df is always one fewer than the total number of entries in the table of data.]

*** Obtained by subtracting the between-columns and between-rows sums of squares from total sum of squares.
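Steps (c) to (g) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function name `two_way_anova_no_replication` and the dictionary layout are assumptions made for this example, not part of the original procedure. Because nothing is rounded along the way, the rows and residual sums of squares come out as 238.67 and 27.33 rather than the 238 and 28 obtained by hand.

```python
# O.D. readings: rows are temperatures (25, 30, 35, 40 degrees C),
# columns are pH 5.5, 6.5 and 7.5.
data = [
    [10, 19, 40],
    [15, 25, 45],
    [20, 30, 55],
    [15, 22, 40],
]


def two_way_anova_no_replication(table):
    """Return (ss, df, ms) dictionaries for an unreplicated two-way table."""
    v = len(table)        # number of data entries in each column
    u = len(table[0])     # number of data entries in each row
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    D = grand_total ** 2 / (u * v)                  # step (c)
    A = sum(x ** 2 for row in table for x in row)   # step (d)
    B = sum(t ** 2 / v for t in col_totals)         # step (e)
    C = sum(t ** 2 / u for t in row_totals)         # step (f)

    ss = {"columns": B - D, "rows": C - D, "total": A - D}
    # Residual is what remains after removing column and row effects.
    ss["residual"] = ss["total"] - ss["columns"] - ss["rows"]
    df = {
        "columns": u - 1,
        "rows": v - 1,
        "residual": (u - 1) * (v - 1),
        "total": u * v - 1,
    }
    ms = {source: ss[source] / df[source] for source in ss}
    return ss, df, ms


ss, df, ms = two_way_anova_no_replication(data)
for source in ("columns", "rows", "residual", "total"):
    print(f"{source:9s} SS={ss[source]:10.4f} df={df[source]:2d} MS={ms[source]:9.4f}")
```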

Now do a variance ratio test to obtain F values:

(1) For between columns (pH): F = Between columns mean square / Residual mean square

= 948 / 4.67 = 203

(2) For between rows (temperature) F = Between rows mean square / Residual mean square

= 79.3 / 4.67 = 17.0

In each case, consult a table of F (p = 0.05 or p = 0.01 or p = 0.001) where u is the between-treatments df (columns or rows, as appropriate) and v is the residual df. (Note that u and v here denote degrees of freedom, not the numbers of entries per row and column as in the ANOVA table above.) If the calculated F value exceeds the tabulated value then the treatment effect (temperature or pH) is significant at that probability level. In our example, for the effect of pH (u is 2 df, v is 6 df) the critical F value at p = 0.05 is 5.14. Our calculated F (203) far exceeds this; in fact, the effect of pH is significant at p = 0.001. For the effect of temperature (u is 3 df, v is 6 df) the critical F value at p = 0.05 is 4.76. Our calculated F (17.0) exceeds this, and we find that the effect of temperature is significant at p = 0.01.
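The variance ratio test can be written out as a short Python sketch. The mean squares and the critical F values at p = 0.05 are simply copied from the text above (the critical values are not computed here), and the variable names are chosen only for this illustration.

```python
# Mean squares taken from the hand-calculated ANOVA table.
ms_columns = 948.0     # pH, 2 df
ms_rows = 79.3         # temperature, 3 df
ms_residual = 4.67     # 6 df

# Critical F values at p = 0.05, copied from the text (looked up in a table of F).
f_crit_05 = {"pH": 5.14, "temperature": 4.76}

# F is the treatment mean square divided by the residual mean square.
f_values = {
    "pH": ms_columns / ms_residual,
    "temperature": ms_rows / ms_residual,
}

for factor, f in f_values.items():
    verdict = "significant" if f > f_crit_05[factor] else "not significant"
    print(f"{factor}: F = {f:.1f}, critical F = {f_crit_05[factor]}, {verdict} at p = 0.05")
```

The same comparison would be repeated against the p = 0.01 and p = 0.001 tables to find the most stringent level at which each effect remains significant.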

Worked example:

	pH 5.5	pH 6.5	pH 7.5	Σx, Rows	n (= u)	Mean	Σx²	(Σx)² / n
25°C	10	19	40	69	3	23	2061	1587
30°C	15	25	45	85	3	28.33	2875	2408
35°C	20	30	55	105	3	35	4325	3675
40°C	15	22	40	77	3	25.67	2309	1976
Σx, Columns	60	96	180	336 (grand total)				Total C 9646
n (= v)	4	4	4
Mean	15	24	45
Σx²	950	2370	8250				Total A 11570
(Σx)² / n	900	2304	8100	Total B 11304

Below, we see a print-out of this analysis from "Excel".

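We select Anova: Two-Factor Without Replication from the analysis tools package. Note that the Anova table gives Source of Variation separately for Rows, Columns and Error (= Residual). The values from "Excel" differ slightly from the hand calculation (for example, a residual sum of squares of 27.33 rather than 28, and F = 208 rather than 203 for pH) because the worked example rounds the row values of (Σx)² / n to whole numbers.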

	pH 5.5	pH 6.5	pH 7.5
25°C	10	19	40
30°C	15	25	45
35°C	20	30	55
40°C	15	22	40

Anova: Two-Factor Without Replication
SUMMARY	Count	Sum	Average	Variance
Row 1	3	69	23	237
Row 2	3	85	28.33333	233.3333
Row 3	3	105	35	325
Row 4	3	77	25.66667	166.3333

Column 1	4	60	15	16.66667
Column 2	4	96	24	22
Column 3	4	180	45	50
ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Rows	238.6667	3	79.55556	17.46341	0.00228	4.757055
Columns	1896	2	948	208.0976	2.87E-06	5.143249
Error	27.33333	6	4.555556
Total	2162	11

Of interest, another piece of information is revealed by this analysis - the effects of temperature do not appear to interact with the effects of pH. In other words, a change of temperature does not change the response to pH, and vice-versa. We can deduce this because the residual (error) mean square (MS) is small compared with the mean squares for temperature (rows) or pH (columns). [A low residual mean square tells us that most variation in the data is accounted for by the separate effects of temperature and pH.]

But suppose that our data were as follows:

Temp °C	pH 5.5	pH 6.5	pH 7.5
25	10	19	40
30	15	25	30
35	20	30	25
40	25	22	10

Here an increase of temperature increases growth at low pH but decreases growth at high pH. If we analysed these data we would probably find no significant effect of temperature or pH, because these factors interact to influence growth. Without replicates, the variation caused by interaction cannot be separated from the residual, so the residual mean square would be very large. This type of result is not uncommon - for example, patients' age might affect their susceptibility to levels of stress. Inspection of our data strongly suggests that there is interaction. To analyse it, we would need to repeat the experiment with at least two replicates of each combination, then use a slightly more complex analysis of variance to test for (1) separate temperature effects, (2) separate pH effects, and (3) significant effects of interaction.
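The claim about a large residual mean square can be checked with a small Python sketch that applies the same unreplicated two-way calculation to this second table. The function name `f_ratios` and the returned keys are assumptions made for this illustration; the critical F values in the final comment are the p = 0.05 values quoted earlier for 2 and 6 df and for 3 and 6 df.

```python
# Second data set: rows are 25, 30, 35, 40 degrees C; columns are pH 5.5, 6.5, 7.5.
interaction_data = [
    [10, 19, 40],
    [15, 25, 30],
    [20, 30, 25],
    [25, 22, 10],
]


def f_ratios(table):
    """Unreplicated two-way ANOVA: residual mean square and the two F ratios."""
    v, u = len(table), len(table[0])   # entries per column, entries per row
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    D = sum(row_totals) ** 2 / (u * v)

    ss_total = sum(x * x for row in table for x in row) - D
    ss_cols = sum(t * t for t in col_totals) / v - D
    ss_rows = sum(t * t for t in row_totals) / u - D
    ss_resid = ss_total - ss_cols - ss_rows

    ms_resid = ss_resid / ((u - 1) * (v - 1))
    return {
        "ms_residual": ms_resid,
        "F_pH": (ss_cols / (u - 1)) / ms_resid,
        "F_temperature": (ss_rows / (v - 1)) / ms_resid,
    }


result = f_ratios(interaction_data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
# Both F ratios fall below the critical values quoted earlier (5.14 and 4.76).
```

With these data the residual mean square is far larger than either treatment mean square, so both F ratios are below 1.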

As an example, below is shown a print-out from "Excel" of the following table, where I have assumed that we did the experiment above with replication.

Temp °C	pH 5.5	pH 6.5	pH 7.5
25 rep 1	9	18	36
rep 2	11	20	44
30 rep 1	13	23	27
rep 2	17	27	33
35 rep 1	18	27	23
rep 2	22	33	27
40 rep 1	22	20	7
rep 2	28	24	13

The procedure in "Excel" is as follows.

1. Enter the replicates as separate rows.

2. From the analysis tools menu, choose Anova: Two-Factor with Replication.

3. Insert all the cells of the table in Input range (Anova assumes that column A and row 1 are used for headings).

4. Enter "2" (in our case) where asked for "Rows per sample".

In the table displayed on the screen (see below) the analysis shows the means for each temperature and each pH. It also tells us the following (see the bottom rows of the table).

(i) There is no significant difference between temperatures overall ("Excel" has called the temperature "Sample") because the calculated F value (3.148) is less than the critical F value (3.49).

(ii) There is a very highly significant (p = 0.0009) effect of pH ("Columns") overall.

(iii) There is very highly significant interaction (p = 0.0000397) between temperature and pH. In other words, the response to pH depends on the temperature, or vice-versa. This might have been the purpose of doing the experiment - to see how the organism behaves when subjected to combinations of factors.

Temp	pH 5.5	pH 6.5	pH 7.5
25°C	9	18	36
25°C	11	20	44
30°C	13	23	27
30°C	17	27	33
35°C	18	27	23
35°C	22	33	27
40°C	22	20	7
40°C	28	24	13
Anova: Two-Factor With Replication
SUMMARY	pH 5.5	pH 6.5	pH 7.5	Total
25°C Count	2	2	2	6
Sum	20	38	80	138
Average	10	19	40	23
Variance	2	2	32	196.8

30°C Count	2	2	2	6
Sum	30	50	60	140
Average	15	25	30	23.33333
Variance	8	8	18	53.46667

35°C Count	2	2	2	6
Sum	40	60	50	150
Average	20	30	25	25
Variance	8	18	8	26.8

40°C Count	2	2	2	6
Sum	50	44	20	114
Average	25	22	10	19
Variance	18	8	18	59.2

Total Count	8	8	8
Sum	140	192	210
Average	17.5	24	26.25
Variance	40.85714	24	144.7857

ANOVA
Source	SS	df	MS	F	P-value	F crit
Sample	116.5	3	38.83333	3.148649	0.064794	3.4903
Columns	330.3333	2	165.1667	13.39189	0.000877	3.88529
Interaction	1203	6	200.5	16.25676	3.97E-05	2.996117
Within	148	12	12.33333
Total	1797.833	23
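For readers who want to see where the sums of squares in this table come from, the following Python sketch reproduces them from the replicated data using the standard correction-term approach. It is an illustrative sketch only, assuming equal numbers of replicates in every cell; the function name and the nested-list layout are choices made for this example and say nothing about how "Excel" computes the table internally.

```python
# cells[i][j] holds the replicate readings for temperature i and pH j.
cells = [
    [[9, 11], [18, 20], [36, 44]],    # 25 degrees C
    [[13, 17], [23, 27], [27, 33]],   # 30 degrees C
    [[18, 22], [27, 33], [23, 27]],   # 35 degrees C
    [[22, 28], [20, 24], [7, 13]],    # 40 degrees C
]


def two_way_anova_with_replication(cells):
    """Return {source: (SS, df, MS, F)}; F is None for the Within row."""
    a = len(cells)           # levels of the row factor ("Sample")
    b = len(cells[0])        # levels of the column factor ("Columns")
    r = len(cells[0][0])     # replicates per cell (assumed equal)
    values = [x for row in cells for cell in row for x in cell]
    correction = sum(values) ** 2 / (a * b * r)

    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    cell_totals = [sum(cell) for row in cells for cell in row]

    ss_total = sum(x * x for x in values) - correction
    ss_rows = sum(t * t for t in row_totals) / (b * r) - correction
    ss_cols = sum(t * t for t in col_totals) / (a * r) - correction
    ss_cells = sum(t * t for t in cell_totals) / r - correction

    ss = {
        "Sample": ss_rows,
        "Columns": ss_cols,
        "Interaction": ss_cells - ss_rows - ss_cols,
        "Within": ss_total - ss_cells,
    }
    df = {
        "Sample": a - 1,
        "Columns": b - 1,
        "Interaction": (a - 1) * (b - 1),
        "Within": a * b * (r - 1),
    }
    ms_within = ss["Within"] / df["Within"]
    table = {}
    for source in ss:
        ms = ss[source] / df[source]
        f = ms / ms_within if source != "Within" else None
        table[source] = (ss[source], df[source], ms, f)
    return table


anova = two_way_anova_with_replication(cells)
for source, (ss, df, ms, f) in anova.items():
    f_text = f"{f:8.4f}" if f is not None else ""
    print(f"{source:12s} SS={ss:10.4f} df={df:2d} MS={ms:9.4f} F={f_text}")
```

The P-value and F crit columns are not reproduced here, since they need the F distribution, which is not in the Python standard library.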

Note: Because there is so much interaction, it is difficult to analyse the separate effects of temperature and pH. We should repeat the analysis, using separate parts of the data: for example, ANOVA for all the pH treatments at 25°C, then at 30°C, then at 35°C and 40°C. Alternatively, we could assemble all the means (there are 12) in ranked order and do a multiple range test to find significant differences.
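As a rough illustration of the first suggestion, the sketch below runs a simple one-way ANOVA on the three pH treatments separately at each temperature. The helper `one_way_anova` and the dictionary of data are assumptions made for this example. Each F value has 2 and 3 degrees of freedom and would still need to be compared with the tabulated F for those df.

```python
# Replicate readings at each temperature; the three groups are pH 5.5, 6.5, 7.5.
by_temperature = {
    25: [[9, 11], [18, 20], [36, 44]],
    30: [[13, 17], [23, 27], [27, 33]],
    35: [[18, 22], [27, 33], [23, 27]],
    40: [[22, 28], [20, 24], [7, 13]],
}


def one_way_anova(groups):
    """Return (F, between-treatments df, residual df) for a list of groups."""
    n = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / n
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


results = {temp: one_way_anova(groups) for temp, groups in by_temperature.items()}
for temp, (f, df_b, df_w) in results.items():
    print(f"{temp} degrees C: F = {f:.2f} with {df_b} and {df_w} df")
```

With only two replicates per treatment the residual df is small, so these separate tests have little power; that is one reason the multiple range test on all 12 means may be preferred.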

This site is no longer maintained and has been left for archival purposes

Text and links may be out of date
