The Seasonal Trend Decomposition using Loess (STL) is an algorithm that was developed to help to divide up a time series into three components namely: the trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. The STL is available within R via the stl function. Read the rest of this entry »
Seasonal Trend Decomposition in R
January 11th, 2013Graph Types: Pie Charts
October 13th, 2012The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph is based on comparing single metric for each category the display is two dimensional but sometimes even appears in three dimensions. Read the rest of this entry »
Graph Design Principles
June 25th, 2012There are a set of basic principles that hold true for the design of many graphs and various authors have their own preferences. One author who is prominent due to his good work in the area of data visualisation and presentation of evidence to support decision making is Edward Tufte. Read the rest of this entry »
Logistic Regression and Bias Reduction
May 22nd, 2012David Firth published a paper in 1993 on maximum likelihood estimation and the reduction of bias when using this approach. The research in this area appears to provide benefit for logistic regression in small data sets where there is complete of quasi separation. This approach has been implemented for Generalized Linear Models in the brglm package. Read the rest of this entry »
Generalized Linear Models – Poisson Regression
June 26th, 2011The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is one of the assumptions underlying linear regression as used in many cases. When data is counts of events (or items) then a discrete distribution is more appropriate is usually more appropriate than approximating with a continuous distribution, especially as our counts should be bounded below at zero. Negative counts do not make sense. Read the rest of this entry »
Data Mining with WEKA
January 30th, 2011There are a number of good open source projects for statistics and data mining, for example the software WEKA developed at the University of Waikato. Read the rest of this entry »
Gapminder
January 6th, 2011As many people are aware Hans Rosling is an enthusiastic swedish academic with a passion for statistics who recently presented the program The Joy of Stats. One of the great things about Hans Rosling is his presentations and the interactive graphics that he uses to make his points. Read the rest of this entry »
Plotting Time Series data using ggplot2
September 30th, 2010There are various ways to plot data that is represented by a time series in R. The ggplot2 package has scales that can handle dates reasonably easily. Read the rest of this entry »
Classification Trees using the rpart function
September 21st, 2010In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart that is available within the base R distribution. Read the rest of this entry »
Classification Trees
September 18th, 2010Decision trees are applied to situation where data is divided into groups rather than investigating a numerical response and its relationship to a set of descriptor variables. There are various implementations of classification trees in R and the some commonly used functions are rpart and tree. Read the rest of this entry »



