This site is no longer maintained and has been left for archival purposes

Text and links may be out of date

POISSON DISTRIBUTION

Poisson distribution for count data

Use this test for counts of events that should be randomly distributed in space and time. For example, the number of cells in a certain number of squares in a counting chamber, or the number of colonies growing on agar plates in a dilution plating assay. With this test we can compare such counts and place confidence limits on them.

As background to the Poisson distribution, we should compare the treatment of random count data with the treatment of measurement data. Suppose that we did a survey of the height of postal vans and another survey of the height of postal workers. The mean heights might be very similar (depending on the type of van) but the variance (a measure of the spread of data) would almost certainly be different. People are much more variable than postal vans of a given type. That's obvious. But suppose that we look down on a city where for some reason (perhaps a catastrophe) the postal vans had been abandoned and the postal workers were making their way to their various homes, and we count the number of postal vans in each square kilometre, and do the same for postal workers. Then, if the means were the same, the variances would also be the same. If counts of anything are randomly distributed in space and time then they follow the Poisson rule:

· the variance is equal to the mean
· so the standard deviation = square root of the mean.
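The Poisson rule can be checked numerically. The sketch below uses only the Python standard library: it adds up the Poisson probabilities for a chosen mean and recovers the mean and variance from them. The function names and the cutoff `k_max` are illustrative assumptions, not part of the original method.

```python
import math

def poisson_pmf(k, mean):
    """Probability of counting exactly k events when the expected count is `mean`."""
    # Work in logarithms so that large k does not overflow:
    # log P(k) = -mean + k*log(mean) - log(k!)
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_mean_and_variance(mean, k_max=1000):
    """Recover mean and variance by summing over counts 0..k_max (an assumed cutoff)."""
    probs = [poisson_pmf(k, mean) for k in range(k_max + 1)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v

m, v = poisson_mean_and_variance(80)
print(round(m, 3), round(v, 3), round(math.sqrt(v), 2))
```

For a mean of 80 the variance should come out at 80 as well, with a standard deviation of about 8.94.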

The same point applies if we have a suspension of blood cells in a counting chamber. Provided that these cells do not attract or repel one another, their count will conform to a Poisson distribution. If there is a mean of 80 cells per square of the counting chamber, then there will be a variance of 80, a standard deviation of 8.94 (i.e. √80) and 95% confidence limits of 80 ± 8.94 × 1.96 (1.96 being the t value for infinite degrees of freedom). In other words, 95% of squares in the counting chamber would be expected to contain a cell count between 62.5 and 97.5.
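The arithmetic for the counting chamber can be sketched in a few lines of Python. The function name `poisson_normal_range` is an assumption made for illustration; the value 1.96 is the one used in the text.

```python
import math

def poisson_normal_range(mean, d=1.96):
    """Range expected to hold about 95% of counts, using mean +/- d * sqrt(mean)."""
    sd = math.sqrt(mean)      # Poisson rule: variance equals the mean
    half_width = d * sd
    return mean - half_width, mean + half_width

low, high = poisson_normal_range(80)
print(f"{low:.1f} to {high:.1f}")   # prints "62.5 to 97.5"
```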

Note two important points:

1. Provided that the cells are randomly distributed (no mutual attraction or repulsion) then their count conforms to Poisson distribution, and this applies to all the counts (of various types) that ever have been made or that ever will be made. So we need not bother with degrees of freedom - we use the t value for infinite degrees of freedom (actually this is termed a d value).

2. Provided that our count is reasonably high (say, above 30) then it can be treated as part of a Poisson distribution, and we do not even need replicates. So, a count of 80 in one square of a counting chamber (or a count of 80 pooled from, for example, 3 squares) is all we need. This count has:

a mean of 80,

variance (σ²) of 80,

standard deviation (σ) of √80,

standard error (σₙ) of √80, because σₙ = σ/√n and we counted one square (or pooled 3 squares) so n = 1.

An improved estimate of confidence limits of the mean can be obtained by introducing a correction factor. The confidence limits of a count X are calculated as:

$$X + \frac{d^2}{2} \pm d\sqrt{X + \frac{d^2}{4}}$$

where d is obtained from the bottom of a t-table (p = 0.05).

Thus, for our count of 80, the 95% confidence limits are:

$$80 + \frac{1.96^2}{2} \pm 1.96\sqrt{80 + \frac{1.96^2}{4}} = 81.92 \pm 17.64$$

so the limits are 64.28 to 99.56
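The corrected limits are easy to compute for any single count. The following is a minimal Python sketch of the formula above; the function name `corrected_limits` is an assumption for illustration.

```python
import math

def corrected_limits(count, d=1.96):
    """Confidence limits of a single Poisson count X:
    X + d^2/2 plus or minus d * sqrt(X + d^2/4)."""
    centre = count + d ** 2 / 2
    half_width = d * math.sqrt(count + d ** 2 / 4)
    return centre - half_width, centre + half_width

low, high = corrected_limits(80)
print(f"{low:.2f} to {high:.2f}")
```

For a count of 80 this gives limits very close to those worked out by hand. The lower limit may differ in the last decimal place because the hand calculation rounds 81.92 and 17.64 before subtracting.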

Comparison of two Poisson counts

If you are starting to wonder where all that preamble is taking us, suppose that we count 100 cells in a certain volume of bacterial suspension (or blood), and 150 cells in the same volume of another suspension. Are these significantly different?

Call the first count X1 and the second count X2, and use these in the following equation:

 

$$d = \frac{\left| X_1 - \frac{X_1 + X_2}{2} \right| - 0.5}{\sqrt{\frac{X_1 + X_2}{4}}}$$

[We have applied a correction factor of 0.5 here, as in Yates' correction for χ², to improve the estimate of d. The symbols "| |" denote the absolute value: 0.5 is subtracted from the size of the value between these two lines regardless of whether that value is positive or negative, so a difference of +5 or of -5 is reduced in size to 4.5.]

If we use our counts of 100 and 150 in the equation above, we get:

$$d = \frac{\left| 150 - \frac{150 + 100}{2} \right| - 0.5}{\sqrt{\frac{150 + 100}{4}}} = \frac{24.5}{7.9} = 3.10$$

 

We compare this with the d values on the bottom line of a t table and find that it is higher than the d value for p = 0.002. Our two counts are significantly different; there is a probability of only 2 in 1000 of finding this difference by chance.
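The comparison of two counts from equal volumes can be sketched in Python as follows. The function name `compare_counts` is an assumption. Instead of looking up d in a table, the sketch converts it to a two-tailed probability from the normal distribution with the standard-library function `math.erfc`, which is an equivalent shortcut rather than the table method described in the text.

```python
import math

def compare_counts(x1, x2):
    """d value (with the 0.5 correction) for two Poisson counts from equal volumes."""
    total = x1 + x2
    expected = total / 2                        # each count expected to be half the total
    d = (abs(x1 - expected) - 0.5) / math.sqrt(total / 4)
    p = math.erfc(d / math.sqrt(2))             # two-tailed normal probability for d
    return d, p

d, p = compare_counts(100, 150)
print(f"d = {d:.2f}, p = {p:.4f}")
```

For counts of 100 and 150 the d value is 3.10 and the probability is a little under 0.002, in line with the table.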

If the counts were obtained from different volumes (termed V1 and V2) then we simply apply a modified formula:

 

$$d = \frac{\left| X_1 - (X_1 + X_2)\frac{V_1}{V_1 + V_2} \right| - 0.5}{\sqrt{(X_1 + X_2)\,\frac{V_1}{V_1 + V_2}\,\frac{V_2}{V_1 + V_2}}}$$
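The modified formula can be sketched in the same way. In the Python below the function name `compare_counts_by_volume` is an assumption, and the second example (100 cells in 1 volume unit against 150 cells in 2 volume units) is a hypothetical illustration, not data from the text.

```python
import math

def compare_counts_by_volume(x1, x2, v1, v2):
    """d value (with the 0.5 correction) for Poisson counts from volumes v1 and v2."""
    total = x1 + x2
    share1 = v1 / (v1 + v2)     # fraction of the total expected in sample 1
    share2 = v2 / (v1 + v2)     # fraction of the total expected in sample 2
    return (abs(x1 - total * share1) - 0.5) / math.sqrt(total * share1 * share2)

# With equal volumes the result matches the simpler formula (d = 3.10).
d_equal = compare_counts_by_volume(100, 150, 1, 1)

# Hypothetical case: the second count came from twice the volume.
d_unequal = compare_counts_by_volume(100, 150, 1, 2)
print(f"{d_equal:.2f} {d_unequal:.2f}")
```

When V1 = V2 each share is 1/2, so the expression reduces to the equal-volume formula given earlier.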

Dilution plating

All the methods above can be applied to dilution plating of bacteria or fungi on agar plates. For example, if we used a soil dilution and counted 67 colonies on a plate at the 10⁻⁵ dilution, then we can estimate the original population in terms of its mean ± SE, as (67 ± 8.19) × 10⁵ colony-forming units ml⁻¹ (√67 = 8.19).
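The dilution plating estimate can be sketched in Python. The function name `cfu_estimate` is an assumption, and the sketch assumes (as the worked example does implicitly) that the colony count refers to 1 ml of the diluted suspension.

```python
import math

def cfu_estimate(colonies, dilution):
    """Mean and standard error of the original population, in colony-forming units per ml.

    Assumes the count comes from 1 ml of suspension at the given dilution (e.g. 1e-5).
    """
    se = math.sqrt(colonies)        # Poisson rule: SE of a single count is sqrt(count)
    factor = 1 / dilution           # scale back up to the undiluted suspension
    return colonies * factor, se * factor

mean_cfu, se_cfu = cfu_estimate(67, 1e-5)
print(f"{mean_cfu:.2e} +/- {se_cfu:.2e} colony-forming units per ml")
```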

A test for randomness in space and time

Sometimes we might wish to test whether counts conform to a Poisson distribution. For example, motile cells can aggregate into clumps, non-motile cells can agglutinate by surface interactions, and cells can also repel one another by producing metabolites. We might wish to test whether these events are occurring, in order to investigate the mechanisms or their biological significance. The method is simple.

Suppose that we incubate cells in a counting chamber for 30 minutes and then count the number of cells in several different squares of the chamber (of course, we can choose the size of our sampling unit by pooling counts for groups of 4 or 16 squares, etc. to get mean counts large enough (say, at least 30) to conform to Poisson expectation).

We might find the following counts in five squares of the chamber: 50, 30, 80, 90, 10.

For these five replicate counts we can obtain a mean (52) and variance in the normal way (see methods) by calculating:

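$$\Sigma d^2 = \Sigma (x - \bar{x})^2 = 4480$$

where d is the deviation of each count from the mean. Then

$$\text{variance} = \frac{\Sigma d^2}{n - 1} = \frac{4480}{4} = 1120$$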

If the data conformed to a Poisson distribution, then the mean of 52 would have a variance of 52. But our calculated variance is 1120. It seems that our counts do not conform to Poisson expectation - the cells are not randomly distributed in the counting chamber.
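The mean, the sum of squared deviations and the variance of the five counts can be checked with a short Python sketch. The function name `mean_and_variance` is an assumption for illustration.

```python
def mean_and_variance(counts):
    """Return the mean, the sum of squared deviations and the variance (n - 1 divisor)."""
    n = len(counts)
    mean = sum(counts) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in counts)   # sum of squared deviations
    variance = sum_sq_dev / (n - 1)
    return mean, sum_sq_dev, variance

print(mean_and_variance([50, 30, 80, 90, 10]))   # prints (52.0, 4480.0, 1120.0)
```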

There are different ways of testing this, which need not be explained, but the simplest is to calculate Σd²/mean (= 4480 / 52 = 86.15) and equate this to χ² with n-1 degrees of freedom (n = 5 in our example). From the χ² table we see that our calculated value of 86.15 exceeds the tabulated value of 18.47 for 4 df at p = 0.001. So our counts differ significantly from a Poisson expectation - the cells are not randomly distributed. Instead, we have highly significant evidence that they tend to be aggregated. [We might explain this in terms of agglutination, or if the cells are motile they might release substances that attract other cells.]

[An explanation of what we have done. To test for randomness of distribution, we calculate Σd², the sum of the squared deviations of our five replicate values from their mean, and we divide it by the mean. Because Σd² is (n-1) times the variance, Σd²/mean is (n-1) times the ratio of variance to mean. If the data fit a Poisson distribution then the variance equals the mean, so the variance/mean ratio will be close to 1 and Σd²/mean will be close to n-1 (here 4). Any major clustering (aggregation) of cells etc. will give a variance/mean ratio much greater than 1, and so a large Σd²/mean. Conversely, if the cells etc. are "too uniformly dispersed" the variance/mean ratio will be much less than 1, and Σd²/mean will be small.]

Now suppose that we had five counts: 49, 50, 50, 49, 50.

We can calculate the mean (49.6), Σd² (1.2) and Σd²/mean (= 0.024). Consulting a χ² table we see that our value of 0.024 is less than the expected value (0.297) for 4 degrees of freedom at p = 0.99. In other words, again our counts do not fit a Poisson expectation: a value this small would arise by chance less than 1% of the time if the cells were randomly distributed, so they have a significant tendency to be uniformly dispersed. Perhaps they repel one another or perhaps the uniformity is caused by some other factor - that is a question to be addressed by a separate experiment.
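The test for randomness can be sketched in Python for both sets of counts. Here Σd² is the sum of squared deviations from the mean (4480 for the first set and 1.2 for the second), and the two tabulated χ² values for 4 degrees of freedom are those quoted in the text. The function names are assumptions, and the sketch only handles five counts because only the 4 df values are given.

```python
# Tabulated chi-squared values for 4 degrees of freedom, as quoted in the text.
CHI2_4DF_P_0001 = 18.47    # exceeded by chance with probability 0.001
CHI2_4DF_P_099 = 0.297     # exceeded by chance with probability 0.99

def dispersion_statistic(counts):
    """Return (sum of squared deviations / mean, degrees of freedom)."""
    n = len(counts)
    mean = sum(counts) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in counts)
    return sum_sq_dev / mean, n - 1

def describe(counts):
    """Classify five counts against the tabulated values for 4 df."""
    chi2, df = dispersion_statistic(counts)
    if df != 4:
        raise ValueError("this sketch only holds table values for 4 df (five counts)")
    if chi2 > CHI2_4DF_P_0001:
        return "aggregated"
    if chi2 < CHI2_4DF_P_099:
        return "uniformly dispersed"
    return "no significant departure from Poisson at these levels"

for counts in ([50, 30, 80, 90, 10], [49, 50, 50, 49, 50]):
    chi2, df = dispersion_statistic(counts)
    print(counts, round(chi2, 3), describe(counts))
```

The first set gives a statistic far above 18.47 and is classed as aggregated; the second gives a statistic far below 0.297 and is classed as uniformly dispersed.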

Now think about elephants!

Poisson distributions don't apply only to cells or bacterial counts (or postal vans). They apply equally to elephants and animal behaviour. For example, if you surveyed an area of a large game park and counted the elephants in each square kilometre (or whatever area is appropriate), would the data fit a Poisson distribution? Would this be true at all times of the year? The results you obtain would only tell you, in statistical terms, whether the counts fit a Poisson distribution (i.e. whether elephants are randomly distributed in space). But the findings would suggest a lot about the behaviour of elephants. Do they have large family groups? Do these groups disperse at certain times of the year? Of course, what this analysis can never tell us is why they behave in this way - do elephants congregate at sites of food abundance, etc. and disperse to forage widely in periods of food shortage? Questions such as those need to be formulated and tested, but at least the statistical analysis of distribution prompts us to ask them.

