When writing R code it is useful to be able to assess the amount of time that a particular function takes to run. We might be interested in measuring the increase in time required by our function as the size of the data increases.
To illustrate how the system.time function reports the time taken to evaluate an expression, consider a set of football results for which we use a logistic regression model to identify the factors that change the probability of a home win. If we fit the model with the glm function, using variables for the home and away teams, we can time the fit by wrapping the call inside system.time.
If the data are stored in a data frame called results.df, the call to fit the logistic regression model would be something of this form (family = binomial is required, because glm fits a Gaussian model by default):
glm(HomeWin ~ Home + Away, family = binomial, data = results.df)
The timed call would be:
> system.time(glm(HomeWin ~ Home + Away, family = binomial, data = results.df))
   user  system elapsed
   1.62    0.08    1.72
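Since results.df is not available to the reader, the following is a minimal sketch that times the same kind of fit on simulated data. The 20 team names, the 0.45 home-win probability and the random pairing of teams are assumptions made purely for illustration, and the timings will differ from those reported here. It also shows that the value returned by system.time can be stored and indexed by name.

```r
# Hypothetical stand-in for results.df: 1,000 simulated matches between
# 20 made-up teams (a team may occasionally be drawn against itself,
# which does not matter for a timing illustration).
set.seed(1)
teams <- paste0("Team", 1:20)
n <- 1000
results.df <- data.frame(
  Home    = factor(sample(teams, n, replace = TRUE), levels = teams),
  Away    = factor(sample(teams, n, replace = TRUE), levels = teams),
  HomeWin = rbinom(n, 1, 0.45)
)

# Time the logistic regression fit; assigning inside system.time()
# keeps the fitted model as well as the timing.
timing <- system.time(
  fit <- glm(HomeWin ~ Home + Away, family = binomial, data = results.df)
)

print(timing)            # user, system and elapsed times in seconds
timing[["elapsed"]]      # wall-clock time as a plain number
timing[["user.self"]]    # CPU time spent in the R process itself
```

Storing the result this way makes it straightforward to compare timings from different runs programmatically rather than reading them off the console.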
The times are reported in seconds: user and system are the CPU time spent by the R process itself and by the operating system on its behalf, and elapsed is the wall-clock time. These figures are based on a data set with 1,000 match results. We could extend the data set to 2,000 match results to see how the time to fit the model increases. If the new data set is stored in the data frame results.df2, then the function call would be:
> system.time(glm(HomeWin ~ Home + Away, family = binomial, data = results.df2))
   user  system elapsed
   4.37    0.14    4.55
Based on these two runs, doubling the size of the data increases the time to fit the model by a factor of approximately 2.7 (4.37 / 1.62 in user time, or about 2.6 in elapsed time). This use of system.time provides some elementary information about the time taken for the expression to be evaluated.
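A single run at each size can be noisy, so one possible refinement is to repeat each fit a few times and compare the median elapsed times. The sketch below does this on simulated data. The helper names simulate_results and time_fit, the 20 teams, the home-win probability and the choice of five repeats are all hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: simulate n match results between n_teams made-up teams.
simulate_results <- function(n, n_teams = 20) {
  teams <- paste0("Team", seq_len(n_teams))
  data.frame(
    Home    = factor(sample(teams, n, replace = TRUE), levels = teams),
    Away    = factor(sample(teams, n, replace = TRUE), levels = teams),
    HomeWin = rbinom(n, 1, 0.45)
  )
}

# Hypothetical helper: median elapsed time (in seconds) over `reps`
# repeated fits of the logistic regression on n simulated matches.
time_fit <- function(n, reps = 5) {
  dat <- simulate_results(n)
  elapsed <- replicate(reps, system.time(
    glm(HomeWin ~ Home + Away, family = binomial, data = dat)
  )[["elapsed"]])
  median(elapsed)
}

set.seed(1)
sizes   <- c(1000, 2000, 4000)
timings <- sapply(sizes, time_fit)

# Median elapsed time for each data size
print(data.frame(matches = sizes, median_elapsed = timings))

# Growth factor each time the data size doubles
# (may be unstable if the fits are too fast to measure reliably)
print(timings[-1] / timings[-length(timings)])
```

On a fast machine these fits may take only a few milliseconds, in which case larger sizes or more repeats would be needed before the ratios mean much.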