Contingency Tables – Fisher’s Exact Test

March 6th, 2010

A contingency table is used in statistics to provide a tabular summary of categorical data. Each cell in the table holds the number of occasions on which a particular combination of variable levels occurs together in a set of data. The relationship between the variables in a contingency table is often investigated using chi-squared tests.

The simplest contingency table has two variables, each with two levels. Consider a trial comparing the performance of two challengers. Each challenger undertook the trial eight times and the number of successful trials was recorded. The null hypothesis under investigation in this experiment is that the performance of the two challengers is similar. If the first challenger was successful on only one trial and the second challenger was successful on four of the eight trials, can we discriminate between their performance?

The function fisher.test performs Fisher’s exact test, which is used when the sample size is small to avoid relying on an approximation that is known to be unreliable for small samples. The data is set up in a matrix:

challenge.df = matrix(c(1,4,7,4), nrow = 2)

The function is then called using this data to produce the test summary information:

> fisher.test(challenge.df)
        Fisher's Exact Test for Count Data
data:  challenge.df 
p-value = 0.2821
alternative hypothesis: true odds ratio is not equal to 1 
95 percent confidence interval:
 0.002553456 2.416009239 
sample estimates:
odds ratio 

The p-value of 0.2821 calculated for the test does not provide evidence against the null hypothesis of independence. In this example this means that we cannot confidently claim any difference in performance between the two challengers.
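
To see where the p-value comes from, the calculation can be sketched by hand using the hypergeometric distribution. This is only an illustration and not how the post's result was produced: the object name challenge, the dimnames and the helper variables are assumptions added here, and the small tolerance factor mirrors the kind of relative tolerance fisher.test applies when comparing probabilities.

```r
# Same counts as in the post; the row and column labels are added for illustration
challenge <- matrix(c(1, 4, 7, 4), nrow = 2,
                    dimnames = list(challenger = c("first", "second"),
                                    outcome = c("success", "failure")))

successes     <- sum(challenge[, "success"])   # total successes across both challengers
trials.first  <- sum(challenge["first", ])     # trials by the first challenger
trials.second <- sum(challenge["second", ])    # trials by the second challenger

# With the margins fixed, the first challenger's success count is hypergeometric
k     <- 0:min(successes, trials.first)
probs <- dhyper(k, m = trials.first, n = trials.second, k = successes)

# Two-sided p-value: add up every outcome no more likely than the observed one
observed <- challenge["first", "success"]
p.obs    <- dhyper(observed, trials.first, trials.second, successes)
p.manual <- sum(probs[probs <= p.obs * (1 + 1e-7)])

round(p.manual, 4)
```

The value of p.manual should agree with the p-value reported by fisher.test for the same table.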

11 responses to “Contingency Tables – Fisher’s Exact Test”

  1. Tal Galili says:

    Thanks for the post,

    I would also like to point out that there is a more powerful alternative to Fisher’s exact test, which is Barnard’s exact test.

    A nice person shared with me an R code for implementing the test, and I republished it here:

    (p.s: consider adding the plugin “subscribe to comments” to your blog. 🙂 )


  2. Marc Schwartz says:


    FYI, the Fisher Exact test has been shown to be overly conservative, as is the Yates Correction to the Chi-Square, which is intended to emulate the former.

    Articles which discuss this are:

    Campbell I. Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Stat Med. 2007; 26: 3661-3675.

    Crans GG, Shuster JJ. How conservative is Fisher’s exact test? A quantitative evaluation of the two-sample comparative binomial trial. Stat Med. 2008; 27: 3598-3611.

    Campbell has a web site with the “N-1” chi-square implemented here:

    Using that test with your data, the resultant p value is 0.12, which is less than half that of the FET.

    The p value from a Yates corrected chi-square is 0.2807, which is close to the FET, while from an uncorrected chi-square it is 0.1056, which is closer to the N-1 value, but may have an inflated Type I error.
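
The p-values quoted in this comment can be reproduced approximately in R. The sketch below assumes that the "N-1" chi-square is the ordinary Pearson statistic multiplied by (N - 1) / N, as described by Campbell, and it computes that version by hand rather than through a dedicated function. The variable names are illustrative.

```r
challenge <- matrix(c(1, 4, 7, 4), nrow = 2)
n <- sum(challenge)

# chisq.test warns about small expected counts for this table; the warning is
# suppressed here because the point is only to compare the p-values
uncorrected <- suppressWarnings(chisq.test(challenge, correct = FALSE))
yates       <- suppressWarnings(chisq.test(challenge, correct = TRUE))

# "N-1" chi-square: Pearson statistic rescaled by (N - 1) / N (assumed definition)
stat.n1 <- unname(uncorrected$statistic) * (n - 1) / n
p.n1    <- pchisq(stat.n1, df = 1, lower.tail = FALSE)

round(c(fisher      = fisher.test(challenge)$p.value,
        yates       = yates$p.value,
        uncorrected = uncorrected$p.value,
        n.minus.1   = p.n1), 4)
```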

    Frank Harrell also has a page which discusses this (near bottom of the page):



  3. Ralph says:

    Thanks for the comments guys. I will take a look at those papers/documents if I can get my hands on them, to see what the reasoning is behind the identified shortcomings of the FET and what difference it makes in practice for this and some other examples, by calculating the proposed alternative test statistics.

    I should also have mentioned in my post that the Fisher test is for pre-planned analysis only.

    Many thanks,

  4. Marc Schwartz says:


    The Campbell paper is available for download from his site that I listed above. At least the final pre-print version. The Crans/Shuster paper would need to be obtained from a library or from the StatMed web site for a pay per view fee.

    The practical importance for your example on a post hoc basis is none, other than providing a comparison on the same data.

    However, if one is prospectively designing a study and is conducting power/sample size calculations or simulations, the FET, being conservative, will require more subjects than a test that has greater power, while maintaining the desired nominal alpha.

    As a consequence, for example, if you are conducting a human clinical trial, using the FET, you will expose more subjects to the risks of the study than need be and you also increase, perhaps significantly, the costs of the study as a result.

    Those are of course, not trivial implications.



  5. Ralph says:


    Thanks for highlighting the important ethical aspect of small trials (clinical in particular) where it is important to ensure that the most appropriate statistical analysis and calculations are used to determine sample sizes. When I find an R function to run the alternative tests then I am considering creating a two-way table of all the pairs of outcomes (not too difficult for this example) to compare the improvement in detectable difference for the various methods. I think that this will help provide me with a further understanding of the various techniques.
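
One way the two-way table of all pairs of outcomes described above might be built is sketched below, for eight trials per challenger. The helper names fisher.p and n1.p are hypothetical, and the "N-1" statistic is again assumed to be the Pearson statistic scaled by (N - 1) / N. It is left as NA where a column total is zero, because the statistic is undefined there.

```r
trials <- 8

# Fisher's exact test p-value for a successes (first) and b successes (second)
fisher.p <- function(a, b) {
  fisher.test(matrix(c(a, b, trials - a, trials - b), nrow = 2))$p.value
}

# "N-1" chi-square p-value for the same table (assumed definition, see above)
n1.p <- function(a, b) {
  tab   <- matrix(c(a, b, trials - a, trials - b), nrow = 2)
  n     <- sum(tab)
  denom <- prod(rowSums(tab), colSums(tab))
  if (denom == 0) return(NA_real_)   # undefined when a margin is zero
  stat <- (n - 1) * (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1])^2 / denom
  pchisq(stat, df = 1, lower.tail = FALSE)
}

counts <- 0:trials
fisher.grid <- outer(counts, counts, Vectorize(fisher.p))
n1.grid     <- outer(counts, counts, Vectorize(n1.p))
dimnames(fisher.grid) <- list(first = counts, second = counts)
dimnames(n1.grid)     <- list(first = counts, second = counts)

round(fisher.grid, 3)
round(n1.grid, 3)
```

Comparing the two grids cell by cell shows which pairs of outcomes fall below a chosen significance level under one test but not the other.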

    I got hold of the pre-print of the Campbell paper from your link and should hopefully be able to get hold of the other paper through work.

    Thanks again for your comments,

  6. Gaspard Houser says:

    Marc Schwartz, I think you’re making a terrible mistake. The Fisher test is too conservative if you misuse it, that is, if the experiment isn’t planned.

    Correctly used, it is *exact*.

  7. KGena says:

    Small question: I was reading about Fisher’s test, and that it requires two variables A and B, each having at least two subcategories. One variable would be in the columns and the other in the rows. For example here :

    In the example on this website, two challengers are selected and discrimination on their performance is compared. Basically, the results are organised in rows and independence is questioned. Right ?

    P value is calculated and conclusions are drawn : no performance difference.

    BUT, if I put the same values here :
    the P is the same, but it says : The association between rows (groups) and columns (outcomes) is considered to be not statistically significant.

    1) Why does it say association between rows and columns ? The challenge was to check an association between two rows, right ?

    2) If one variable is in the rows and the other is in the columns, as here, how and why can the data be organised in rows ?

    I am so confused 🙁 please help.

    thank you,

  8. Ralph says:

    I think that this might be the case of people using different terminology to describe the same thing that is causing you confusion. The convention, I believe, is to use the rows for challengers (in this example) and to investigate whether the counts in the columns follow the same pattern for the challengers. While the question of interest is to compare the rows, this is done by looking for association between the rows and columns, i.e. do the counts in the two columns depend on the row of the table?
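
A quick check of the point about rows and columns, offered as a sketch: transposing the table, so that the challengers sit in the columns instead of the rows, should leave the p-value unchanged, because the test looks at the association between the two variables, not at which one is written across the top.

```r
challenge <- matrix(c(1, 4, 7, 4), nrow = 2)

p.rows <- fisher.test(challenge)$p.value      # challengers in rows
p.cols <- fisher.test(t(challenge))$p.value   # challengers in columns

all.equal(p.rows, p.cols)
```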

    Hope this helps rather than making things more confusing!

  9. Sunny says:

    what would you use for multidimensional factors and multi-levels?

  10. Fred says:

    Hi everyone,

    if I am correctly informed the FET should produce a z-statistic along with the p-value. Is it somehow possible to obtain that z-value from the fisher.test() function? In my field of research you always need to report the test statistics along with your p-values.

    Help would be much appreciated.

  11. Ralph says:


    The FET output discusses the odds ratio and the last line has the sample estimate for the odds ratio. Is this what you are looking for?
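
For reference, the pieces of the fisher.test result can be pulled out of the returned object directly. This is a small sketch using the table from the post; the name result is illustrative. The object carries a p-value, an odds ratio estimate and a confidence interval, but no test statistic component.

```r
challenge <- matrix(c(1, 4, 7, 4), nrow = 2)
result <- fisher.test(challenge)

names(result)       # components available in the returned object
result$p.value      # the p-value shown in the printed output
result$estimate     # sample estimate of the odds ratio
result$conf.int     # 95 percent confidence interval for the odds ratio

# There is no z value or other test statistic stored in the result
is.null(result$statistic)
```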