Transformations to Create New Variables

May 18th, 2009

There are many situations where we might be interested in creating a new variable by transforming one of the variables already in the data frame. The R programming language can be used for either simple transformations or more complicated mathematical expressions where necessary.



Data are often analysed on a logarithmic scale, and if we have data on its original scale we can use the log function in R to create a transformed variable. By default the log function computes the natural logarithm (base e). Referring back to the olive oil data set used in previous posts, if we wanted to create a new variable that is the logarithm of one of the numeric variables then we could use code such as:

log(olive.df$palmitic)

which produces the following output:

  [1] 6.980076 6.992096 6.814543 6.873164 6.957497 6.814543 6.826545 7.003065 6.986566 6.944087
 [11] 6.957497 6.943122 6.979145 6.774224 6.858565 7.051856 6.849066 7.153052 6.867974 6.858565
 [21] 6.979145 6.902743 6.962243 6.970730 6.970730 7.181592 7.186144 7.214504 7.228388 7.166266
...
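
The call above only prints the transformed values. A minimal sketch of storing them as new columns, and of asking for a base other than e, might look like the following. The data frame demo.df and its three palmitic values are hypothetical stand-ins for olive.df, and the new column names are chosen only for illustration.

```r
# Hypothetical stand-in for olive.df, so the example runs on its own
demo.df <- data.frame(palmitic = c(1075, 1088, 911))

# Store the natural logarithm (the default base) as a new column
demo.df$log.palmitic <- log(demo.df$palmitic)

# Other bases: log10() and log2() are shortcuts, or pass base explicitly
demo.df$log10.palmitic <- log10(demo.df$palmitic)
demo.df$log2.palmitic <- log(demo.df$palmitic, base = 2)

print(demo.df)
```

Assigning to a name that does not yet exist in the data frame adds the column, so the original palmitic variable is left untouched.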

If we had wanted to take the square root of this variable instead, we would use the sqrt function:

sqrt(olive.df$palmitic)

and the output would be:

  [1] 32.78719 32.98485 30.18278 31.08054 32.41913 30.18278 30.36445 33.16625 32.89377 32.20248
 [11] 32.41913 32.18695 32.77194 29.58040 30.85450 33.98529 30.70831 35.74913 31.00000 30.85450
 [21] 32.77194 31.54362 32.49615 32.63434 32.63434 36.26293 36.34556 36.86462 37.12142 35.98611
...

It is possible to create more complicated expressions based on different transformations using the R programming language.

For example, we might want to centre a vector of data by subtracting its mean and then scale it to unit variance by dividing by its standard deviation. Based on the palmitic variable used in the previous examples, we could use the scale function to perform this operation:

> scale(olive.df$palmitic)
              [,1]
  [1,] -0.92970611
  [2,] -0.85259700
  [3,] -1.90246724
...
[571,] -1.43388109
[572,] -1.61182519
attr(,"scaled:center")
[1] 1231.741
attr(,"scaled:scale")
[1] 168.5923

The mean and standard deviation used for the scaling are saved by the function as the attributes "scaled:center" and "scaled:scale" of the returned matrix.
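
To make the connection with the underlying arithmetic explicit, here is a small sketch that standardises a vector by hand and compares the result with scale. The vector palmitic below holds hypothetical values standing in for olive.df$palmitic, and the object names z and manual are chosen only for illustration.

```r
# Hypothetical values standing in for olive.df$palmitic
palmitic <- c(1075, 1088, 911, 966, 1051)

# scale() returns a one-column matrix with the centring and scaling attributes
z <- scale(palmitic)

# The same transformation written out by hand
manual <- (palmitic - mean(palmitic)) / sd(palmitic)

# as.numeric() drops the matrix structure and attributes before comparing
print(all.equal(as.numeric(z), manual))

# Retrieve the saved mean and standard deviation
print(attr(z, "scaled:center"))
print(attr(z, "scaled:scale"))
```

Wrapping the result in as.numeric is also a convenient way to get a plain vector if the scaled values are to be stored as a new column in a data frame.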
