Sequences and Other Regular Arrangements of Data

May 26th, 2009

In statistical analysis, regular structures occur frequently, for example in designed experiments, and R has facilities for generating the corresponding data frames and vectors in a simple way.

The function expand.grid can be used to create a design by specifying a series of factors and the levels of each factor. It returns a data frame with one row for every combination of the factor levels. For example, if we had a two-factor experiment where the first factor had four levels labelled A, B, C and D and the second factor had three levels labelled I, II and III, then we could create the data frame for the design using this code:

expand.grid(Factor1 = c("A", "B", "C", "D"), Factor2 = c("I", "II", "III"))

which would produce the following output:

   Factor1 Factor2
1        A       I
2        B       I
3        C       I
4        D       I
5        A      II
6        B      II
7        C      II
8        D      II
9        A     III
10       B     III
11       C     III
12       D     III
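The first factor varies fastest, and the number of rows is the product of the numbers of levels. As a minimal sketch (the object name design is only an illustrative choice, and stringsAsFactors is written out explicitly although it is assumed to be the default), the result can be stored and inspected:

```r
# Build the same 4 x 3 design and keep it for inspection.
# `design` is a hypothetical name used only for this example.
design <- expand.grid(Factor1 = c("A", "B", "C", "D"),
                      Factor2 = c("I", "II", "III"),
                      stringsAsFactors = TRUE)

nrow(design)                 # number of runs: 4 * 3
is.factor(design$Factor1)    # character inputs are stored as factors
levels(design$Factor2)       # levels keep the order in which they were supplied
```

Storing the design this way makes it easy to add a response column once the experiment has been run.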

It is also possible to create various other sequences using the colon operator and the seq and rep functions. To create the numbers from 1 to 10 we could run this code:

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10

Alternatively, the seq function provides greater control over the start and end values and the step between successive values. A couple of examples are shown below:

> seq(1, 5)
[1] 1 2 3 4 5
> seq(10, 1, -2)
[1] 10  8  6  4  2

The negative step makes the sequence decrease. It stops at 2 because the next value, 0, would fall beyond the end value of 1.
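The step and the number of values can also be given by name. The following is a small sketch using the by and length.out arguments of seq; the variable names are illustrative only:

```r
# Fix the step with `by`; the end value is included when the step lands on it.
by_step <- seq(0, 1, by = 0.25)

# Fix the number of values with `length.out`; R works out the step.
by_length <- seq(0, 1, length.out = 5)

# A decreasing sequence stops at the last value that does not pass the end point.
down <- seq(10, 1, by = -2)

by_step
by_length
down
```

Naming the arguments tends to make the intent clearer than relying on their position.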

Another common pattern is where we might want to repeat a number a given number of times. To get ten replicates of the number one we use the rep function:

> rep(1, 10)
 [1] 1 1 1 1 1 1 1 1 1 1

If we want to repeat a sequence multiple times then we provide the sequence as the first argument to the rep function. So to repeat the numbers one to ten twice we would write:

> rep(1:10, 2)
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10

Here 1:10 is evaluated first to give the numbers one to ten, and the whole sequence is then repeated twice. A further arrangement, where we want to repeat each element of the sequence a given number of times, can be produced by nesting one rep call inside another. The second argument then becomes a vector of the same length as the first, giving the number of times each element is repeated. As an example:

> rep(1:5, rep(3, 5))
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
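The same arrangement can probably be written more directly with the each argument of rep, and a times vector lets every element have its own count. A short sketch, with illustrative variable names:

```r
# Repeat each element three times using the `each` argument.
by_each <- rep(1:5, each = 3)

# The nested form shown above gives the same result.
by_nested <- rep(1:5, rep(3, 5))

# A `times` vector of the same length gives each element its own count.
uneven <- rep(1:3, times = c(1, 2, 3))

identical(by_each, by_nested)
uneven
```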
