Melt

April 5th, 2012

There are many situations where data is presented in a format that is not ready to dive straight to exploratory data analysis or to use a desired statistical method. The reshape2 package for R provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into R.

The melt function takes data in wide format and stacks a set of columns into a single column of data. To make use of the function we need to specify a data frame, the id variables (which will be left at their settings) and the measured variables (columns of data) to be stacked. The default assumption on measured variables is that it is all columns that are not specified as id variables.

Consider the following set of data:

> dat
  FactorA FactorB     Group1     Group2     Group3      Group4
1     Low     Low -1.1616334 -0.5228371 -0.6587093  0.45064563
2  Medium     Low -0.5991478 -1.0461138 -0.1942979  2.47985577
3    High     Low  0.8420797 -1.5413266  0.6318852 -0.98948125
4     Low  Medium  1.6225569 -1.2706469 -0.8026467 -0.32332181
5  Medium  Medium -0.3450745 -1.3377985  1.4988363  0.36541918
6    High  Medium  1.6025044  0.7631882 -0.5375833  0.85028148
7     Low    High -1.2991011 -0.2223622 -0.6321478 -1.57284216
8  Medium    High -0.4906400 -1.1802192  0.1235253  0.09891793
9    High    High  0.3897769 -0.3832142  0.6671101  0.23407257

There four groups are to used as part of a statistical analysis so we want to stack them into a single column and create an factor variable to indicate which group the measurement corresponds to and the melt function does the trick:

> melt(dat)
Using FactorA, FactorB as id variables
   FactorA FactorB variable       value
1      Low     Low   Group1 -1.16163338
2   Medium     Low   Group1 -0.59914783
3     High     Low   Group1  0.84207974
4      Low  Medium   Group1  1.62255690
5   Medium  Medium   Group1 -0.34507455
6     High  Medium   Group1  1.60250438
 
...
36    High    High   Group4  0.23407257

Consider a second set of data where there are two groups but we only want to retain the FactorB variable in the molten data set:

   FactorA   FactorB   Group1   Group2
1      Low  Very Low 6.851828 3.061329
2   Medium  Very Low 7.352169 1.303077
3     High  Very Low 6.918091 2.477875
4      Low       Low 7.402351 2.450527
5   Medium       Low 6.928385 4.334323
6     High       Low 7.400626 3.074158
7      Low    Medium 8.312145 5.725185
8   Medium    Medium 8.251806 4.384492
9     High    Medium 8.339398 3.443789
10     Low      High 5.127386 2.868952
11  Medium      High 8.561181 3.616898
12    High      High 6.993838 3.450634
13     Low Very High 7.880877 2.950622
14  Medium Very High 9.439892 3.220295
15    High Very High 8.799447 3.106060

We now need to specify both the id.vars and measure.vars arguments in the melt function to get the desired output:

> melt(dat, id.vars = "FactorB", measure.vars = c("Group1", "Group2"))
     FactorB variable    value
1   Very Low   Group1 6.851828
2   Very Low   Group1 7.352169
3   Very Low   Group1 6.918091
4        Low   Group1 7.402351
5        Low   Group1 6.928385
6        Low   Group1 7.400626
...
30 Very High   Group2 3.106060

Other useful resources are provided on the Supplementary Material page.

Comments are closed.