Cross-tabulation of Data

May 15th, 2009

The contingency table is used to summarise data when there are factors in the data set and we are interested in counting the number of occurrences of each combination of factor variables. In R there are different ways that these types of table can be produced and manipulated as required.

The two main functions used to produce contingency tables are table and xtabs. We can use them to get one-, two- or higher-dimensional tables that summarise the number of records corresponding to each combination of the variables used to create the table.

The simplest case is where we are interested in a summary based on a single variable, and the syntax is straightforward. Here the function table takes a single argument that corresponds to a vector of data. For example, if we are working with a data frame based on an unbalanced design and want to count the number of observations corresponding to each treatment, we might run some code like:

table(temp.design3$Treatment)

which would produce a simple summary table:

A B C D 
7 7 3 7
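The original data frame is not shown in this post, so the sketch below builds a hypothetical stand-in for temp.design3 whose counts match the tables printed here. The row-by-row layout is an assumption made only so that the example can be run.

```r
# Hypothetical stand-in for temp.design3 (the real data are not shown).
# The design is unbalanced: treatment C has only three observations,
# one in each of plots 1 to 3.
full <- c(1, 1, 2, 2, 3, 3, 4)  # plots used by treatments A, B and D
temp.design3 <- data.frame(
  Treatment = factor(rep(c("A", "B", "C", "D"), times = c(7, 7, 3, 7))),
  Plot      = factor(c(full, full, 1:3, full))
)

# One-way table: number of observations for each treatment
treatment.counts <- table(temp.design3$Treatment)
print(treatment.counts)
```

The result is a named vector of counts, so a single value can be extracted with, for example, treatment.counts["C"].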

If there were a second factor in the data set corresponding to different plots, labelled 1 to 4, then we could generate a two-dimensional contingency table by adding a second argument to the function call like:

table(temp.design3$Treatment, temp.design3$Plot)

and the output would be of the form:

    1 2 3 4
  A 2 2 2 1
  B 2 2 2 1
  C 1 1 1 0
  D 2 2 2 1

The function xtabs can be used to create the same contingency tables, but it works with a formula in a similar vein to the modelling functions. So to get the one-dimensional table we would write code similar to this:

xtabs(~ Plot, data = temp.design3)

which would summarise the data by plot:

Plot
1 2 3 4 
7 7 7 3

Note that the output differs slightly from that of the table function: xtabs labels the dimension with the variable name (Plot). The two-dimensional table would be created like this:

xtabs(~ Treatment + Plot, data = temp.design3)

and the output would be:

         Plot
Treatment 1 2 3 4
        A 2 2 2 1
        B 2 2 2 1
        C 1 1 1 0
        D 2 2 2 1
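As a rough check that the two functions agree, the sketch below builds the two-way table both ways and compares the counts. It again uses a hypothetical reconstruction of temp.design3 (an assumption, since the real data are not shown), and passes named arguments to table so that its dimensions are labelled in the same way as the xtabs output.

```r
# Hypothetical stand-in for temp.design3, matching the counts shown above
full <- c(1, 1, 2, 2, 3, 3, 4)
temp.design3 <- data.frame(
  Treatment = factor(rep(c("A", "B", "C", "D"), times = c(7, 7, 3, 7))),
  Plot      = factor(c(full, full, 1:3, full))
)

# Two-way table via the formula interface
tab.xt <- xtabs(~ Treatment + Plot, data = temp.design3)

# Same table via table(); the argument names label the dimensions
tab.tb <- table(Treatment = temp.design3$Treatment, Plot = temp.design3$Plot)

# The cell counts agree
print(all(tab.xt == tab.tb))

# Row and column totals can be appended with addmargins()
print(addmargins(tab.xt))
```

The totals for a single dimension are available from margin.table, for example margin.table(tab.xt, 1) for the treatment totals.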

These functions can be extended to higher dimensions, in which case the output is printed as a series of two-dimensional tables, one for each combination of levels of the remaining variables.
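To illustrate the higher-dimensional case, the sketch below adds a third factor to a hypothetical reconstruction of temp.design3. The factor Block and its levels X and Y are invented purely for this example and are not part of the original data.

```r
# Hypothetical stand-in for temp.design3 with an invented third factor, Block
full <- c(1, 1, 2, 2, 3, 3, 4)
temp.design3 <- data.frame(
  Treatment = factor(rep(c("A", "B", "C", "D"), times = c(7, 7, 3, 7))),
  Plot      = factor(c(full, full, 1:3, full)),
  Block     = factor(rep(c("X", "Y"), length.out = 24))  # assumed, for illustration
)

# Three-way table: printed as one Treatment by Plot table per level of Block
three.way <- xtabs(~ Treatment + Plot + Block, data = temp.design3)
print(three.way)

# ftable() flattens the same counts into a single two-dimensional layout
flat <- ftable(three.way)
print(flat)
```

Summing over Block with margin.table(three.way, c(1, 2)) recovers the two-way Treatment by Plot table.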

Posted by Ralph at 8:50 pm
