Vector Calculations to avoid Explicit Loops

May 23rd, 2009

The S programming language, and R, which implements it, has facilities for applying a function to all the individual elements of a vector, matrix or data frame without the need to write explicit loops. Explicit loops in R are generally discouraged because they tend to be slower and more verbose than the vectorised alternatives, although there will of course be some situations where a loop is unavoidable.
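As a rough illustration of the difference, the sketch below squares each element of a vector, first with an explicit loop and then with a single vectorised expression. The vector x is made-up data used only for this example.

```r
# Hypothetical data: a small numeric vector, not taken from the olive oil data.
x <- c(1.5, 2.0, 3.5, 4.0)

# Explicit loop: square each element one at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised equivalent: the arithmetic is applied to every element at once.
squares_vec <- x^2

# Check that both approaches agree (up to numerical tolerance).
isTRUE(all.equal(squares_loop, squares_vec))
```

The vectorised form is shorter and leaves the iteration to R's internals, which is the style the apply family of functions encourages.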

The function apply can be used to run a specific function on each row or each column of a matrix or data frame individually. For example, we could calculate row or column means or variances using apply, or we could define a more complicated function better suited to the statistics we want to calculate. Taking the olive oil data used in some of the other posts, we might be interested in the mean of each variable (the columns in this case), and we would use this code:

apply(olive.df[,c("palmitic", "palmitoleic", "stearic", "oleic", "linoleic",
  "linolenic", "arachidic", "eicosenoic")], 2, mean)

We first indicate which columns we are interested in, since Region and Area are not relevant to these mean calculations. The square brackets specify a subset of our data frame, and we provide a vector of column names after the comma. The second argument, 2, tells apply to work over columns (1 would work over rows), and the third argument is the function to apply. The output from this function call is:

   palmitic palmitoleic     stearic       oleic    linoleic   linolenic   arachidic  eicosenoic 
 1231.74126   126.09441   228.86538  7311.74825   980.52797    31.88811    58.09790    16.28147
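For readers without the olive oil data to hand, here is a minimal, self-contained sketch of the same pattern. The data frame toy.df and its values are invented for illustration and only mimic the shape of olive.df. It also shows the row-wise case and colMeans, a base R shortcut for column means.

```r
# Hypothetical stand-in for olive.df: three made-up rows, one label column
# and two measurement columns.
toy.df <- data.frame(
  Region   = c("A", "A", "B"),
  palmitic = c(1000, 1100, 1200),
  oleic    = c(7000, 7200, 7400)
)

# Keep only the numeric columns, as in the call above.
measurements <- toy.df[, c("palmitic", "oleic")]

# Second argument 2: apply the function to each column.
col_means <- apply(measurements, 2, mean)

# Second argument 1: apply the function to each row instead.
row_means <- apply(measurements, 1, mean)

# colMeans() is a built-in shortcut for the column case.
col_means_builtin <- colMeans(measurements)

print(col_means)
print(row_means)
```

For simple summaries such as means and sums, colMeans, rowMeans, colSums and rowSums are usually the more direct choice. apply is the more general tool when the function is something else.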

We could quite easily adjust this function call to use a different function on the data. If we were interested in the maximum value of each variable, we would replace mean with max:

apply(olive.df[,c("palmitic", "palmitoleic", "stearic", "oleic", "linoleic",
  "linolenic", "arachidic", "eicosenoic")], 2, max)

which returns:

   palmitic palmitoleic     stearic       oleic    linoleic   linolenic   arachidic  eicosenoic 
       1753         280         375        8410        1470          74         105          58
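The same mechanism works with functions we write ourselves, as mentioned earlier. The sketch below is one possible illustration using made-up data: toy.df, its values and the helper spread are hypothetical names introduced here, not part of the olive oil analysis.

```r
# Hypothetical data with the same shape as two of the olive oil columns.
toy.df <- data.frame(
  palmitic = c(1000, 1100, 1200),
  oleic    = c(7000, 7200, 7400)
)

# A user-defined summary: the spread between the largest and smallest value.
spread <- function(x) max(x) - min(x)

# Apply the custom function to each column.
col_spread <- apply(toy.df, 2, spread)

# An anonymous function that returns several values per column;
# apply collects the results into a matrix with one column per variable.
col_summary <- apply(toy.df, 2, function(x) c(min = min(x), max = max(x)))

print(col_spread)
print(col_summary)
```

When the supplied function returns more than one value, the result is a matrix rather than a vector, with one row per returned value and one column per variable.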

There are other associated functions (tapply, lapply and sapply) that perform a similar routine on different types and formats of data; these will be discussed in subsequent posts.
