Melt

April 5th, 2012

Data often arrives in a format that is not ready for exploratory data analysis or for a desired statistical method. The reshape2 package for R provides functions for reshaping data, which avoids having to hack it around in a spreadsheet before importing it into R.
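As a rough illustration of the kind of reshaping the post title refers to, the sketch below uses melt() from reshape2 to turn a small wide-format table into long format. The data frame and its column names (subject, visit1, visit2, score) are made up for this example, and the reshape2 package is assumed to be installed.

```r
library(reshape2)  # assumed to be installed from CRAN

# Hypothetical wide-format data: one row per subject, one column per visit
wide <- data.frame(
  subject = c("A", "B"),
  visit1  = c(5.1, 4.8),
  visit2  = c(5.6, 5.0)
)

# Stack the visit columns into a single 'score' column,
# keeping 'subject' as the identifier
long <- melt(wide, id.vars = "subject",
             variable.name = "visit", value.name = "score")
print(long)
```

The long format has one row per subject and visit, which is the layout many modelling and plotting functions expect.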

Programming with R – Processing Football League Data Part II

December 3rd, 2010

Following on from the previous post about creating a function to process football results from the football-data.co.uk website, we will add code to the function to generate a league table based on the results to date.

Programming with R – Processing Football League Data Part I

November 23rd, 2010

In this post we use football results data from the football-data.co.uk website to demonstrate writing functions in R that automate a series of standard operations needed for results data from various leagues and divisions.

Useful functions for data frames

August 9th, 2010

R is primarily command-line based, so large data frames are not easy to browse. There are several useful functions for inspecting and working with data frames.
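A few base R functions of this kind are sketched below on the built-in mtcars data set. The choice of functions here is an assumption for illustration and may not match the ones covered in the full post.

```r
# Inspect a data frame without printing all of it
df_dim   <- dim(mtcars)    # number of rows and columns
df_names <- names(mtcars)  # column names

str(mtcars)            # compact overview of column types and first values
head(mtcars, 3)        # first three rows
summary(mtcars$mpg)    # five-number summary plus the mean for one column
```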

Creating Date Objects using Character Strings

September 10th, 2009

Working with dates is frequently problematic because such a wide range of formats is used to store date information. R has various facilities for defining and working with dates and can handle many of the formats that might be encountered in a set of data.
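A minimal sketch of converting character strings to Date objects with as.Date() follows. The two date strings are taken from post dates on this page purely as sample values, and the format codes are the standard ones documented under ?strptime.

```r
# Day/month/year with slashes needs an explicit format string
d1 <- as.Date("10/09/2009", format = "%d/%m/%Y")

# ISO 8601 style (year-month-day) is recognised by default
d2 <- as.Date("2009-07-24")

# Date objects support arithmetic; the difference is measured in days
gap <- as.numeric(d1 - d2)
print(gap)
```

Numeric format codes such as %d, %m and %Y are used here because month names (%B, %b) depend on the locale of the R session.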

Using R to Access Data in a MySQL database

July 24th, 2009

The R import/export manual discusses various approaches to handling data and mentions that R is not well suited to working with large data sets because data objects are stored in memory during a session. In such situations it can be useful to hold the data in a database and use one of the R libraries for database connectivity to access or save the data.

Sequences and Other Regular Arrangements of Data

May 26th, 2009

In statistical analysis there are frequently situations where regular structures occur, such as in designed experiments, and R has facilities for generating the corresponding data frames in a simple way.
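The sketch below shows three base R functions commonly used for regular structures: seq(), rep() and expand.grid(). The factor names and levels (temperature, pressure, replicate) are hypothetical values chosen for this example.

```r
# A regular numeric sequence
steps <- seq(0, 1, by = 0.25)

# Repeated labels, e.g. two runs at each of two levels
labels <- rep(c("low", "high"), each = 2)

# Every combination of the factor levels in a (hypothetical) designed experiment
design <- expand.grid(
  temperature = c(20, 30),
  pressure    = c(1, 2, 3),
  replicate   = 1:2
)
print(design)
```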

Transformations to Create New Variables

May 18th, 2009

We are often interested in creating a new variable by transforming one of the variables already in a data frame. R can be used for simple transformations or, where necessary, more complicated mathematical expressions.
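As a small sketch of both cases, the example below adds a log-transformed column by direct assignment and a derived column using transform(). The data frame and its columns (height_cm, weight_kg) are invented for illustration.

```r
# Hypothetical measurements
people <- data.frame(
  height_cm = c(160, 175, 182),
  weight_kg = c(55, 72, 90)
)

# Simple transformation: assign a new column directly
people$log_weight <- log(people$weight_kg)

# A more involved expression using transform(): body mass index
people <- transform(people, bmi = weight_kg / (height_cm / 100)^2)
print(people)
```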

Cross-tabulation of Data

May 15th, 2009

A contingency table summarises data when the data set contains factors and we want to count the number of occurrences of each combination of factor levels. R offers several ways to produce and manipulate tables of this type.
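Two of the base R approaches, table() and xtabs(), are sketched below on the built-in mtcars data set. Treating cyl and gear as the factors of interest is an assumption made for this example.

```r
# Count cars for each combination of cylinders and gears
tab <- table(cylinders = mtcars$cyl, gears = mtcars$gear)
print(tab)

# The same counts via the formula interface
tab2 <- xtabs(~ cyl + gear, data = mtcars)

# Add row and column totals to the table
with_totals <- addmargins(tab)
print(with_totals)
```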

Working with Subsets of Data

May 8th, 2009

We are often interested in only a subset of our complete data. R has simple mechanisms for viewing and editing particular subsets of a data frame or other objects.
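A brief sketch of these mechanisms, again using the built-in mtcars data set, is given below. The particular conditions and columns are arbitrary choices for illustration.

```r
# Logical indexing: rows for four-cylinder cars, two columns only
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]

# The subset() function expresses the same idea with less repetition
efficient <- subset(mtcars, mpg > 25, select = c(mpg, hp))

# Editing a subset in place: work on a copy and cap horsepower at 200
cars_copy <- mtcars
cars_copy$hp[cars_copy$hp > 200] <- 200
```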