# Seasonal Trend Decomposition in R

January 11th, 2013

The Seasonal Trend Decomposition using Loess (STL) is an algorithm that was developed to help to divide up a time series into three components namely: the trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. The STL is available within R via the stl function. Read the rest of this entry »

# Graph Types: Pie Charts

October 13th, 2012

The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph is based on comparing single metric for each category the display is two dimensional but sometimes even appears in three dimensions. Read the rest of this entry »

# Graph Design Principles

June 25th, 2012

There are a set of basic principles that hold true for the design of many graphs and various authors have their own preferences. One author who is prominent due to his good work in the area of data visualisation and presentation of evidence to support decision making is Edward Tufte. Read the rest of this entry »

# Plotting Time Series data using ggplot2

September 30th, 2010

There are various ways to plot data that is represented by a time series in R. The ggplot2 package has scales that can handle dates reasonably easily. Read the rest of this entry »

# Charting the performance of cricket all-rounders – IT Botham

August 16th, 2010

Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats. Read the rest of this entry »

# Displaying data using level plots

May 3rd, 2010

A level plot is a type of graph that is used to display a surface in two rather than three dimensions – the surface is viewed from above as if we were looking straight down and is an alternative to a contour plot – geographic data is an example of where this type of graph would be used. A contour plot uses lines to identify regions of different heights and the level plot uses coloured regions to produce a similar effect. Read the rest of this entry »

# Summarising data using box and whisker plots

April 25th, 2010

A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) and the minimum and maximum values. Read the rest of this entry »

# R and Tolerance Intervals

April 19th, 2010

Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval that describes the range of values for a distribution with confidence limits calculated to a particular percentile of the distribution. The R package tolerance can be used to create a variety of tolerance intervals of interest. Read the rest of this entry »

# Summarising data using scatter plots

April 18th, 2010

A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is used in many common situations and can convey a lot of useful information. Read the rest of this entry »

# Summarising data using histograms

April 11th, 2010

The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of data points in each region is shown instead of counts. Read the rest of this entry »