Generalized Linear Models – Poisson Regression
June 26th, 2011
The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is one of the assumptions underlying linear regression as it is commonly used. When the data are counts of events (or items), a discrete distribution is usually more appropriate than a continuous approximation, especially as counts are bounded below at zero: negative counts do not make sense.
Fractional Factorial Designs using FrF2
May 18th, 2011
The FrF2 package for R can be used to create regular and non-regular 2-level fractional factorial designs. It is reasonably straightforward to use.
Data Mining with WEKA
January 30th, 2011
There are a number of good open source projects for statistics and data mining, for example the software WEKA developed at the University of Waikato.
Gapminder
January 6th, 2011
As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. One of the great things about Hans Rosling is his presentations and the interactive graphics that he uses to make his points.
Plotting Time Series data using ggplot2
September 30th, 2010
There are various ways to plot time series data in R. The ggplot2 package has scales that can handle dates reasonably easily.
Classification Trees using the rpart function
September 21st, 2010
In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart, from the rpart package that ships with the standard R distribution.
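As a rough illustration of the kind of call involved, the sketch below fits a classification tree with rpart. The kyphosis data set (bundled with the rpart package) and the choice of predictors are assumptions made for this example and are not necessarily the data used in the post itself.

```r
# rpart is a recommended package shipped with R, so no install is needed
library(rpart)

# Assumption: kyphosis, an example data set bundled with rpart,
# stands in for the data analysed in the post
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Predicted class for each row of the training data
pred <- predict(fit, newdata = kyphosis, type = "class")

# Cross-tabulate observed against predicted classes
conf_table <- table(observed = kyphosis$Kyphosis, predicted = pred)
print(conf_table)
```

Setting method = "class" asks for a classification tree rather than a regression tree; printing fit shows the splits that were chosen.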
Classification Trees
September 18th, 2010
Decision trees are applied to situations where data is divided into groups, rather than investigating a numerical response and its relationship to a set of descriptor variables. There are various implementations of classification trees in R, and some commonly used functions are rpart and tree.
Charting the performance of cricket all-rounders – IT Botham
August 16th, 2010
Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats.
Generating Balanced Incomplete Block Designs (BIBD)
July 16th, 2010
The Balanced Incomplete Block Design (BIBD) is a well-studied experimental design that has various desirable features from a statistical perspective. The crossdes package in R provides a way to generate a block design for some given parameters and test whether this design satisfies the BIBD conditions.
R Commander – two-way analysis of variance
June 25th, 2010
Two-way analysis of variance models can be fitted to data using the R Commander GUI. The general approach is similar to fitting the other types of model in R Commander described in previous posts.



