Getting started with GAMLSS

January 19th, 2014

Generalized Additive Models for Location, Scale and Shape (GAMLSS) is a recent framework that gives access to a large set of distributions and allows all of the parameters of these distributions to be modelled as functions of the explanatory variables in a data set. Read the rest of this entry »

Book on Time Series Forecasting

May 6th, 2013

The online book on time series forecasting methods by Rob Hyndman and George Athanasopoulos has been completed and was announced on the Hyndsight blog. It is a very accessible book and worth reading to understand time series methodology and useful strategies for making predictions using these models.

Seasonal Trend Decomposition in R

January 11th, 2013

Seasonal Trend Decomposition using Loess (STL) is an algorithm that divides a time series into three components: the trend, the seasonality and the remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. STL is available within R via the stl function. Read the rest of this entry »
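As a minimal sketch of what a call to stl looks like, the example below decomposes the nottem series (monthly Nottingham temperatures from R's built-in datasets package). The choice of data set and of s.window = "periodic" are assumptions made for illustration and are not taken from the original post.

```r
# nottem is a monthly ts object shipped with R's datasets package
# (chosen here only as an illustrative example)
fit <- stl(nottem, s.window = "periodic")

# The three components are stored as columns of a multivariate time series:
# seasonal, trend and remainder
components <- fit$time.series
head(components)

# By construction the components add back up to the original series
recombined <- rowSums(components)
all.equal(as.numeric(recombined), as.numeric(nottem))

# Stacked plot of the data and the three components
plot(fit)
```

Setting s.window = "periodic" forces the seasonal pattern to be identical in every year; passing an odd integer instead lets the seasonal component change gradually over time.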

Graph Types: Pie Charts

October 13th, 2012

The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph compares a single metric for each category, the display is two-dimensional and sometimes even appears in three dimensions. Read the rest of this entry »

Graph Design Principles

June 25th, 2012

There is a set of basic principles that hold true for the design of many graphs, although various authors have their own preferences. One author who is prominent for his work on data visualisation and the presentation of evidence to support decision making is Edward Tufte. Read the rest of this entry »

Logistic Regression and Bias Reduction

May 22nd, 2012

David Firth published a paper in 1993 on maximum likelihood estimation and the reduction of bias when using this approach. The research in this area appears to provide benefit for logistic regression in small data sets where there is complete or quasi-complete separation. This approach has been implemented for Generalized Linear Models in the brglm package. Read the rest of this entry »

Generalized Linear Models – Poisson Regression

June 26th, 2011

The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is assumed by linear regression as it is commonly used. When the data are counts of events (or items), a discrete distribution is usually more appropriate than an approximation with a continuous distribution, especially as counts should be bounded below at zero. Negative counts do not make sense. Read the rest of this entry »
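A Poisson regression of this kind can be sketched with the glm function from base R. The data below are simulated purely for illustration; the variable names and the coefficient values used to generate them are assumptions, not results from the original post.

```r
set.seed(1)  # simulated data, purely illustrative

n <- 200
x <- runif(n, 0, 2)
# Counts generated with a log-linear mean: log(lambda) = 0.5 + 0.8 * x
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

# Poisson GLM with the canonical log link
fit <- glm(y ~ x, family = poisson(link = "log"))
summary(fit)

# Predictions on the response scale are expected counts,
# which the log link keeps strictly positive
new_data <- data.frame(x = c(0, 1, 2))
expected_counts <- predict(fit, newdata = new_data, type = "response")
expected_counts
```

Because the model is fitted on the log scale, exponentiating a coefficient gives the multiplicative change in the expected count for a one-unit increase in the corresponding variable.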

Data Mining with WEKA

January 30th, 2011

There are a number of good open source projects for statistics and data mining, for example the software WEKA developed at the University of Waikato. Read the rest of this entry »

Gapminder

January 6th, 2011

As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. Among the great things about Hans Rosling are his presentations and the interactive graphics that he uses to make his points. Read the rest of this entry »

Plotting Time Series data using ggplot2

September 30th, 2010

There are various ways to plot data that is represented by a time series in R. The ggplot2 package has scales that can handle dates reasonably easily. Read the rest of this entry »
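As a rough sketch of date handling in ggplot2, the example below plots the economics data set that ships with the package. It assumes a reasonably recent ggplot2 release, in which scale_x_date accepts the date_breaks and date_labels arguments; the data set and the axis settings are chosen for illustration only.

```r
library(ggplot2)

# economics ships with ggplot2; its date column is of class Date,
# so ggplot2 uses a date scale for the x axis
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  # One labelled tick every ten years, showing the year only
  scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
  labs(x = "Year", y = "Unemployed (thousands)")

print(p)
```

A ts object would first need converting to a data frame with a Date column before it could be plotted this way.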