<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; File Import/Export</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/r-environment/file-import-export/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Programming with R &#8211; Processing Football League Data Part II</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 10:26:39 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Manipulation]]></category>
		<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[S Programming]]></category>
		<category><![CDATA[as.numeric]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[England]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[ifelse]]></category>
		<category><![CDATA[Premiership]]></category>
		<category><![CDATA[results]]></category>
		<category><![CDATA[table]]></category>
		<category><![CDATA[tapply]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1459</guid>
		<description><![CDATA[Following on from the previous post about creating a football result processing function for data from the football-data.co.uk website we will add code to the function to generate a league table based on the results to date. To create the league table we need to count various things such as the number of games played, [...]]]></description>
			<content:encoded><![CDATA[<p>Following on from the previous <a href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/">post</a> about creating a football result processing function for data from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website we will add code to the function to generate a league table based on the results to date.<span id="more-1459"></span></p>
<p>To create the league table we need to count various things such as the number of games played, number of wins/draws/losses, goals scored etc. This information is available in the results object that is loaded from a <strong>csv</strong> file in the function as it stands.</p>
<p>To facilitate these calculations we create a data frame with a row for each team in the division and then calculate the statistics required &#8211; this was a reason for ordering the factors in the <strong>HomeTeam</strong> and <strong>AwayTeam</strong> columns of the results table. The data frame is created with the code below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable = data.frame(Team = teams,
    Games = 0, Win = 0, Draw = 0, Loss = 0,
    HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
    AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
    Points = 0,
    HomeFor = 0, HomeAgainst = 0,
    AwayFor = 0, AwayAgainst = 0,
    For = 0, Against = 0, GoalDifference = 0)</pre></div></div>

<p>A number of these columns may be redundant in the final league table but are used for intermediate calculations, such as <strong>HomeWin</strong> and <strong>AwayWin</strong>, which are combined to find the total number of victories for a team.</p>
<p>The number of games played by each team at home and away is counted using the table command on each of the two columns in turn.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))</pre></div></div>
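<p>As a small illustration of why the factor levels matter here, the sketch below uses a made-up vector of home teams (the team list and the object names are assumptions for this example only and are not part of the function). Because the levels are fixed in advance, <strong>table</strong> returns a count of zero for a team with no home games, and <strong>as.numeric</strong> keeps only the counts:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical example data: Chelsea has not played at home yet
exTeams = c(&quot;Arsenal&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;)
exHome = factor(c(&quot;Bolton&quot;, &quot;Arsenal&quot;, &quot;Bolton&quot;), levels = exTeams)
&nbsp;
exCounts = table(exHome)
names(exCounts)        # team labels, in the order of the factor levels
as.numeric(exCounts)   # counts only: 1 2 0</pre></div></div>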

<p>The labels created by the table command are discarded using the as.numeric function to retain only the number of games. The table command is also used to count the number of wins, draws and losses at home and away for each team. The commands are shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeWin =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;H&quot;]))
tmpTable$HomeDraw =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$HomeLoss =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;A&quot;]))
&nbsp;
tmpTable$AwayWin =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;A&quot;]))
tmpTable$AwayDraw =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$AwayLoss =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;H&quot;]))</pre></div></div>

<p>Note that we subset on the values in the <strong>FTR</strong> column, which is full-time result, and then count. The subsetting is reversed when looking at the away fixtures because a victory for the team is now an away win rather than a home win.</p>
<p>This information is then combined to get total games played, won etc.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss</pre></div></div>

<p>The total points is calculated by multiplying the number of wins, draws and losses by the number of points awarded for each match outcome and adding the results.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$Points = winPoints * tmpTable$Win +
    drawPoints * tmpTable$Draw + lossPoints * tmpTable$Loss</pre></div></div>
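<p>As a quick check of the formula, the sketch below applies it to two hypothetical teams using the default scoring of the function (3, 1 and 0 points). The win, draw and loss counts are made up for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">winPoints = 3; drawPoints = 1; lossPoints = 0
&nbsp;
## Hypothetical records: first team W2 D1 L1, second team W0 D3 L1
exWin = c(2, 0)
exDraw = c(1, 3)
exLoss = c(1, 1)
&nbsp;
exPoints = winPoints * exWin + drawPoints * exDraw + lossPoints * exLoss
exPoints   # 7 3</pre></div></div>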

<p>The next set of calculations counts the number of goals scored, goals conceded and the goal difference. The <strong>tapply</strong> function is used for these calculations.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeFor =
    as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
tmpTable$HomeAgainst =
    as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$AwayFor =
    as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
tmpTable$AwayAgainst =
    as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))</pre></div></div>

<p>The <strong>tapply</strong> function applies <strong>sum</strong> to the goals scored and the goals conceded by each team in the division, separately for home and away fixtures. These are then combined to give overall totals across home and away fixtures:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$For =
    ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
    ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
tmpTable$Against =
    ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
    ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)</pre></div></div>

<p>The <strong>ifelse</strong> statement is used to handle situations where a team hasn&#8217;t played a home and/or away fixture yet. The goal difference is easy to calculate:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$GoalDifference = tmpTable$For - tmpTable$Against</pre></div></div>
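<p>The sketch below, using a made-up set of two matches (all object names are assumptions for the example), shows the behaviour that the <strong>ifelse</strong> calls guard against: <strong>tapply</strong> returns a missing value for a factor level with no observations, so it has to be replaced by zero before the home and away totals are added.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">exTeams = c(&quot;Arsenal&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;)
exResults = data.frame(
    HomeTeam = factor(c(&quot;Arsenal&quot;, &quot;Bolton&quot;), levels = exTeams),
    AwayTeam = factor(c(&quot;Chelsea&quot;, &quot;Arsenal&quot;), levels = exTeams),
    FTHG = c(2, 1), FTAG = c(0, 3))
&nbsp;
exHomeFor = as.numeric(tapply(exResults$FTHG, exResults$HomeTeam, sum, na.rm = TRUE))
exAwayFor = as.numeric(tapply(exResults$FTAG, exResults$AwayTeam, sum, na.rm = TRUE))
exHomeFor   # 2 1 NA (Chelsea has no home game)
exAwayFor   # 3 NA 0 (Bolton has no away game)
&nbsp;
## Replace the missing values by zero before adding home and away goals
exFor = ifelse(is.na(exHomeFor), 0, exHomeFor) +
    ifelse(is.na(exAwayFor), 0, exAwayFor)
exFor       # 5 1 0</pre></div></div>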

<p>Now that all of the statistics have been calculated we sort the table based on the number of points, goal difference and finally alphabetically. There might be different ways that we can order the teams but this is what we will use for the time being:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable =
  tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]</pre></div></div>

<p>The ordering might look odd, but we want to rank from highest to lowest for points and goal difference and then in ascending alphabetical order for the team names.</p>
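<p>To see the effect of the minus signs, the sketch below sorts a made-up four-team table (the values and object names are assumptions for the example). Negating the numeric columns turns the default ascending order into a descending one, while the team name is left as it is to break ties alphabetically:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">exTable = data.frame(
    Team = c(&quot;Wolves&quot;, &quot;Bolton&quot;, &quot;Arsenal&quot;, &quot;Chelsea&quot;),
    Points = c(4, 6, 4, 6),
    GoalDifference = c(1, 2, 1, 5),
    stringsAsFactors = FALSE)
&nbsp;
exTable = exTable[order(- exTable$Points, - exTable$GoalDifference, exTable$Team),]
exTable$Team   # &quot;Chelsea&quot; &quot;Bolton&quot; &quot;Arsenal&quot; &quot;Wolves&quot;</pre></div></div>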
<p>The whole function is now:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v2 = function(datafile, country, divname, season, teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
## Validate Function Arguments
&nbsp;
if (missing(datafile))
{
stop(&quot;Results csv file not specified.&quot;)
}
&nbsp;
if (missing(country))
{
warning(&quot;Country of league not specified.&quot;)
country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
warning(&quot;Name of league division not specified.&quot;)
divname = &quot;&quot;
}
&nbsp;
## Import Results
&nbsp;
tmpResults = read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]
&nbsp;
if (missing(teams))
{
warning(&quot;Team names not specified - extracted from results data.&quot;)
teams = sort(unique(c(as.character(tmpResults$HomeTeam), as.character(tmpResults$AwayTeam))))
}
&nbsp;
tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)
&nbsp;
## Create Empty League Table
&nbsp;
tmpTable = data.frame(Team = teams,
Games = 0, Win = 0, Draw = 0, Loss = 0,
HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
Points = 0,
HomeFor = 0, HomeAgainst = 0,
AwayFor = 0, AwayAgainst = 0,
For = 0, Against = 0, GoalDifference = 0)
&nbsp;
## Count Number of Games Played
&nbsp;
tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))
&nbsp;
## Count Number of Wins/Draws/Losses
&nbsp;
tmpTable$HomeWin = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;H&quot;]))
tmpTable$HomeDraw = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$HomeLoss = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;A&quot;]))
&nbsp;
tmpTable$AwayWin = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;A&quot;]))
tmpTable$AwayDraw = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$AwayLoss = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;H&quot;]))
&nbsp;
tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss
tmpTable$Points = winPoints * tmpTable$Win + drawPoints * tmpTable$Draw + lossPoints * tmpTable$Loss
&nbsp;
## Count Goals Scored and Conceded
&nbsp;
tmpTable$HomeFor = as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
tmpTable$HomeAgainst = as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$AwayFor = as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
tmpTable$AwayAgainst = as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$For = ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
tmpTable$Against = ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)
&nbsp;
tmpTable$GoalDifference = tmpTable$For - tmpTable$Against
&nbsp;
## Sort Table
## By Points
## By Goal Difference
## By Team Name (Alphabetical)
&nbsp;
tmpTable = tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]
&nbsp;
tmpTable = tmpTable[,c(&quot;Team&quot;, &quot;Games&quot;, &quot;Win&quot;, &quot;Draw&quot;, &quot;Loss&quot;, &quot;Points&quot;, &quot;For&quot;, &quot;Against&quot;, &quot;GoalDifference&quot;)]
&nbsp;
## Return Division Information
&nbsp;
tmpSummary = list(Country = country, Division = divname, Season = season, Teams = teams,
Results = tmpResults, Table = tmpTable)
&nbsp;
invisible(tmpSummary)
}</pre></div></div>

<p>There is other functionality that we might want to add to the function.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Processing Football League Data Part I</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 14:14:45 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Manipulation]]></category>
		<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[S Programming]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[England]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[list]]></category>
		<category><![CDATA[Premiership]]></category>
		<category><![CDATA[print]]></category>
		<category><![CDATA[read.csv]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1447</guid>
		<description><![CDATA[In this post we will make use of football results data from the football-data.co.uk website to demonstrate creating functions in R to automate a series of standard operations that would be required for results data from various leagues and divisions. The first step is to consider what control options should be available as part of [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we will make use of football results data from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website to demonstrate creating functions in <strong>R</strong> to automate a series of standard operations that would be required for results data from various leagues and divisions.<span id="more-1447"></span></p>
<p>The first step is to consider what control options should be available as part of the function and here is a list of some arguments that will be used for this implementation of a football result data processing function:</p>
<ul>
<li>The name of a <strong>csv</strong> data file from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website.</li>
<li>A text string to specify the country and division for the data.</li>
<li>A text string specifying the season.</li>
<li>A list of teams in the division (optional), which could be used to test for data entry errors in the data file.</li>
<li>The number of points for a win, draw or loss. This might seem a strange option initially but different leagues might award different points for the three outcomes.</li>
</ul>
<p>Some of this information might appear optional but is included so that we can write a custom <strong>print</strong> function at a later date to display a meaningful summary of the object (list) that will be created by the function.</p>
<p>The first part of our function is concerned with checking the various values provided to the function arguments. Our skeleton function is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v1 = function(datafile, country, divname, season,
  teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
&nbsp;
}</pre></div></div>

<p>Here we have specified default options for three of the arguments with the most likely number of points for each match outcome, i.e. 3 points for a win and 1 point for a draw.</p>
<p>To illustrate the working of the result processing function we will use a small excerpt from the start of the 2010/2011 English Premiership season, which is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee
E0,14/8/2010,Aston Villa,West Ham,3,0,H,2,0,H,M Dean
E0,14/8/2010,Blackburn,Everton,1,0,H,1,0,H,P Dowd
E0,14/8/2010,Bolton,Fulham,0,0,D,0,0,D,S Attwell
E0,14/8/2010,Chelsea,West Brom,6,0,H,2,0,H,M Clattenburg
E0,14/8/2010,Sunderland,Birmingham,2,2,D,1,0,H,A Taylor
E0,14/8/2010,Tottenham,Man City,0,0,D,0,0,D,A Marriner
E0,14/8/2010,Wigan,Blackpool,0,4,A,0,3,A,M Halsey
E0,14/8/2010,Wolves,Stoke,2,1,H,2,0,H,L Probert
E0,15/8/2010,Liverpool,Arsenal,1,1,D,0,0,D,M Atkinson
E0,16/8/2010,Man United,Newcastle,3,0,H,2,0,H,C Foy
E0,21/8/2010,Arsenal,Blackpool,6,0,H,3,0,H,M Jones
E0,21/8/2010,Birmingham,Blackburn,2,1,H,0,0,D,M Oliver
E0,21/8/2010,Everton,Wolves,1,1,D,1,0,H,L Mason
E0,21/8/2010,Stoke,Tottenham,1,2,A,1,2,A,C Foy
E0,21/8/2010,West Brom,Sunderland,1,0,H,0,0,D,K Friend
E0,21/8/2010,West Ham,Bolton,1,3,A,0,0,D,A Marriner
E0,21/8/2010,Wigan,Chelsea,0,6,A,0,1,A,M Dean
E0,22/8/2010,Fulham,Man United,2,2,D,0,1,A,P Walton
E0,22/8/2010,Newcastle,Aston Villa,6,0,H,3,0,H,M Atkinson
E0,23/8/2010,Man City,Liverpool,3,0,H,1,0,H,P Dowd
E0,28/8/2010,Blackburn,Arsenal,1,2,A,1,1,D,C Foy
E0,28/8/2010,Blackpool,Fulham,2,2,D,0,1,A,M Oliver
E0,28/8/2010,Chelsea,Stoke,2,0,H,1,0,H,M Atkinson
E0,28/8/2010,Man United,West Ham,3,0,H,1,0,H,M Clattenburg
E0,28/8/2010,Tottenham,Wigan,0,1,A,0,0,D,P Dowd
E0,28/8/2010,Wolves,Newcastle,1,1,D,1,0,H,S Attwell
E0,29/8/2010,Aston Villa,Everton,1,0,H,1,0,H,M Jones
E0,29/8/2010,Bolton,Birmingham,2,2,D,0,1,A,K Friend
E0,29/8/2010,Liverpool,West Brom,1,0,H,0,0,D,L Probert
E0,29/8/2010,Sunderland,Man City,1,0,H,0,0,D,M Dean</pre></div></div>

<p>This is stored in a file <strong>E0test.csv</strong> so that we can use the <strong>read.csv</strong> function to import the results data and then process it.</p>
<p>The first series of commands that we add to the function are for checking various function arguments specified by the user to ensure that they are sensible. First up we check whether a results data file has been specified as we cannot do any processing without any results. The simple test is for whether a file name has been specified:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(datafile))
{
    stop(&quot;Results csv file not specified.&quot;)
}</pre></div></div>

<p>It might be sensible to check whether the object <strong>datafile</strong> is actually a character string specifying a file, but this hasn&#8217;t been done for now. We then check whether the country name and division have been specified and set them to blank strings if they haven&#8217;t been set by the user.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(country))
{
    warning(&quot;Country of league not specified.&quot;)
    country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
    warning(&quot;Name of league division not specified.&quot;)
    divname = &quot;&quot;
}</pre></div></div>
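<p>The extra check on <strong>datafile</strong> mentioned above is not part of the function, but one possible version is sketched below. The helper name <strong>check.datafile</strong> is hypothetical, and the rule chosen (a single, non-missing character string naming an existing file) is an assumption about what a sensible check would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical helper, not part of football.process.v1
check.datafile = function(datafile)
{
    if (!is.character(datafile) || length(datafile) != 1 || is.na(datafile))
    {
        stop(&quot;Results csv file should be a single character string.&quot;)
    }
&nbsp;
    if (!file.exists(datafile))
    {
        stop(&quot;Results csv file not found: &quot;, datafile)
    }
&nbsp;
    invisible(TRUE)
}
&nbsp;
## A numeric value is rejected; try() lets the script carry on after the error
exCheck = try(check.datafile(42), silent = TRUE)
inherits(exCheck, &quot;try-error&quot;)   # TRUE</pre></div></div>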

<p>Next up we import the data file and only save the columns of interest (at this point in the development of the function at least). There are many more columns of information in the raw data from the website than we need:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpResults =
    read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]</pre></div></div>

<p>The square brackets are used to select a subset of the columns so that only these are saved. Then we check whether the team names have been specified by the user and, if not, extract them from the data provided:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(teams))
{
    warning(&quot;Team names not specified - extracted from results data.&quot;)
    teams = sort(unique(c(as.character(tmpResults$HomeTeam),
        as.character(tmpResults$AwayTeam))))
}</pre></div></div>

<p>The sort function is used to order the team names alphabetically which is the order often used in league tables, especially when no games have been played. We then convert the columns <strong>HomeTeam</strong> and <strong>AwayTeam</strong> into factors, which allows teams that haven&#8217;t played a fixture yet to be included in the table.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)</pre></div></div>
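<p>One side effect of fixing the levels is that any team name in the results that does not appear in <strong>teams</strong> is converted to a missing value, which gives a simple way to spot the data entry errors mentioned in the list of arguments. This check is not in the function; the sketch below uses made-up names and a deliberate misspelling:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">exTeams = c(&quot;Arsenal&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;)
exRaw = c(&quot;Arsenal&quot;, &quot;Bolten&quot;, &quot;Chelsea&quot;)   # &quot;Bolten&quot; is a deliberate typo
&nbsp;
exHome = factor(exRaw, levels = exTeams)
exRaw[is.na(exHome)]   # &quot;Bolten&quot;
levels(exHome)         # all three teams are kept as levels</pre></div></div>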

<p>To round off the first part of creating the result processing function we create a list object to return at the end of the function.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpSummary = list(Country = country, Division = divname,
    Season = season, Teams = teams, Results = tmpResults)</pre></div></div>

<p>The function so far:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v1 = function(datafile, country, divname, season, teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
## Validate Function Arguments
&nbsp;
if (missing(datafile))
{
stop(&quot;Results csv file not specified.&quot;)
}
&nbsp;
if (missing(country))
{
warning(&quot;Country of league not specified.&quot;)
country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
warning(&quot;Name of league division not specified.&quot;)
divname = &quot;&quot;
}
&nbsp;
## Import Results
&nbsp;
tmpResults = read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]
&nbsp;
if (missing(teams))
{
warning(&quot;Team names not specified - extracted from results data.&quot;)
teams = sort(unique(c(as.character(tmpResults$HomeTeam), as.character(tmpResults$AwayTeam))))
}
&nbsp;
tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)
&nbsp;
## Return Division Information
&nbsp;
tmpSummary = list(Country = country, Division = divname, Season = season, Teams = teams,
Results = tmpResults)
&nbsp;
invisible(tmpSummary)
}</pre></div></div>

<p>We then test this function with the data file shown above. First up we create our own list of teams in the English Premiership for 2010/2011 and specify some of the other function arguments while using the defaults for points.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; E0teams.1011 = c(&quot;Arsenal&quot;, &quot;Aston Villa&quot;, &quot;Birmingham&quot;, &quot;Blackburn&quot;,
+ &quot;Blackpool&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;, &quot;Everton&quot;, &quot;Fulham&quot;, &quot;Liverpool&quot;,
+ &quot;Man City&quot;, &quot;Man United&quot;, &quot;Newcastle&quot;, &quot;Stoke&quot;, &quot;Sunderland&quot;,
+ &quot;Tottenham&quot;, &quot;West Brom&quot;, &quot;West Ham&quot;, &quot;Wigan&quot;, &quot;Wolves&quot;)
&gt; print(football.process.v1(&quot;E0test.csv&quot;, &quot;England&quot;, &quot;Premiership&quot;,
+ &quot;2010-2011&quot;, E0teams.1011))
$Country
[1] &quot;England&quot;
&nbsp;
$Division
[1] &quot;Premiership&quot;
&nbsp;
$Season
[1] &quot;2010-2011&quot;
&nbsp;
$Teams
 [1] &quot;Arsenal&quot;     &quot;Aston Villa&quot; &quot;Birmingham&quot;  &quot;Blackburn&quot;   &quot;Blackpool&quot;  
 [6] &quot;Bolton&quot;      &quot;Chelsea&quot;     &quot;Everton&quot;     &quot;Fulham&quot;      &quot;Liverpool&quot;  
[11] &quot;Man City&quot;    &quot;Man United&quot;  &quot;Newcastle&quot;   &quot;Stoke&quot;       &quot;Sunderland&quot; 
[16] &quot;Tottenham&quot;   &quot;West Brom&quot;   &quot;West Ham&quot;    &quot;Wigan&quot;       &quot;Wolves&quot;     
&nbsp;
$Results
        Date    HomeTeam    AwayTeam FTR FTHG FTAG
1  14/8/2010 Aston Villa    West Ham   H    3    0
2  14/8/2010   Blackburn     Everton   H    1    0
3  14/8/2010      Bolton      Fulham   D    0    0
4  14/8/2010     Chelsea   West Brom   H    6    0
5  14/8/2010  Sunderland  Birmingham   D    2    2
6  14/8/2010   Tottenham    Man City   D    0    0
7  14/8/2010       Wigan   Blackpool   A    0    4
8  14/8/2010      Wolves       Stoke   H    2    1
9  15/8/2010   Liverpool     Arsenal   D    1    1
10 16/8/2010  Man United   Newcastle   H    3    0
11 21/8/2010     Arsenal   Blackpool   H    6    0
12 21/8/2010  Birmingham   Blackburn   H    2    1
13 21/8/2010     Everton      Wolves   D    1    1
14 21/8/2010       Stoke   Tottenham   A    1    2
15 21/8/2010   West Brom  Sunderland   H    1    0
16 21/8/2010    West Ham      Bolton   A    1    3
17 21/8/2010       Wigan     Chelsea   A    0    6
18 22/8/2010      Fulham  Man United   D    2    2
19 22/8/2010   Newcastle Aston Villa   H    6    0
20 23/8/2010    Man City   Liverpool   H    3    0
21 28/8/2010   Blackburn     Arsenal   A    1    2
22 28/8/2010   Blackpool      Fulham   D    2    2
23 28/8/2010     Chelsea       Stoke   H    2    0
24 28/8/2010  Man United    West Ham   H    3    0
25 28/8/2010   Tottenham       Wigan   A    0    1
26 28/8/2010      Wolves   Newcastle   D    1    1
27 29/8/2010 Aston Villa     Everton   H    1    0
28 29/8/2010      Bolton  Birmingham   D    2    2
29 29/8/2010   Liverpool   West Brom   H    1    0
30 29/8/2010  Sunderland    Man City   H    1    0</pre></div></div>

<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Importing Data from other Statistical Software Packages</title>
		<link>http://www.wekaleamstudios.co.uk/posts/importing-data-from-other-statistical-software-packages/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/importing-data-from-other-statistical-software-packages/#comments</comments>
		<pubDate>Fri, 01 May 2009 14:43:23 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[dta]]></category>
		<category><![CDATA[file export]]></category>
		<category><![CDATA[file import]]></category>
		<category><![CDATA[foreign]]></category>
		<category><![CDATA[Minitab]]></category>
		<category><![CDATA[Stata]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=70</guid>
		<description><![CDATA[There are a large number of software packages that are available for data analysts and the foreign package in R has functions defined to read data from some of the most commonly used packages that have their own proprietary data format. For other packages the user can often export data to a delimited text [...]]]></description>
			<content:encoded><![CDATA[<p>There are a large number of software packages available to data analysts, and the <strong>foreign</strong> package in <strong>R</strong> has functions defined to read data from some of the most commonly used packages that have their own proprietary data format. For other packages the user can often export data to a delimited text file which can then be handled easily by <strong>R</strong>.<span id="more-70"></span></p>
<p>To make use of the <strong>foreign</strong> package we first need to attach it to our session in the usual way for <strong>R</strong> packages:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(foreign)</pre></div></div>

<p><strong>Stata</strong></p>
<p>The <strong>Stata</strong> software package has a binary format for representing data and the file name has an extension of <strong>dta</strong>. There is a function called <strong>read.dta</strong> in the <strong>foreign</strong> package that can be used to import data from these binary files. As an example, if we wanted to load some data we could run the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.dta(&quot;Data\\reactor.dta&quot;)</pre></div></div>

<p>This function call is very straightforward and we would save the output to a data frame of our choice so that it can be used for the subsequent analysis.</p>
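<p>A minimal sketch of saving the imported data is shown below. So that it can be run without the <strong>reactor.dta</strong> file, it first writes a small made-up data frame to a temporary Stata file with <strong>write.dta</strong> (also in the <strong>foreign</strong> package) and then reads it back. The object and column names are assumptions for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(foreign)
&nbsp;
## Made-up data written to a temporary dta file
exData = data.frame(run = c(1, 2, 3), yield = c(10.5, 11.2, 9.8))
exFile = tempfile(fileext = &quot;.dta&quot;)
write.dta(exData, exFile)
&nbsp;
## Save the output of read.dta to a data frame for later analysis
reactor.data = read.dta(exFile)
str(reactor.data)</pre></div></div>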
<p><strong>Minitab</strong></p>
<p>Another popular statistical software package is <strong>Minitab</strong> and we can import data saved in the <strong>Minitab Portable Worksheet</strong> format. If the data in the example above had been exported from Minitab then the function call to load the data would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.mtp(&quot;Data\\reactor.mtp&quot;)</pre></div></div>

<p><strong>SPSS</strong></p>
<p>Another frequently used package is <strong>SPSS</strong> and as with the other packages there is a function that can be used to import data from an <strong>sav</strong> file. The syntax for this operation is very similar to the functions for importing data from other systems:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.spss(&quot;Data\\test-results-data.sav&quot;)</pre></div></div>
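<p>Unlike <strong>read.dta</strong>, the <strong>read.spss</strong> function returns a list by default, and the <strong>to.data.frame</strong> argument asks for a data frame instead. The sketch below reuses the file name from the call above and only runs the import if that file exists, since the file itself is not supplied with this post. The object name <strong>test.results</strong> is an assumption:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(foreign)
&nbsp;
exFile = &quot;Data\\test-results-data.sav&quot;
&nbsp;
## Only attempt the import when the file is actually available
if (file.exists(exFile))
{
    test.results = read.spss(exFile, to.data.frame = TRUE)
    str(test.results)
}</pre></div></div>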

<p>There are other file formats that can be handled by the <strong>foreign</strong> package but these will not be considered in this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/importing-data-from-other-statistical-software-packages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exporting Data from R to Text Files</title>
		<link>http://www.wekaleamstudios.co.uk/posts/exporting-data-from-r-to-text-files/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/exporting-data-from-r-to-text-files/#comments</comments>
		<pubDate>Mon, 27 Apr 2009 21:16:57 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[delimited]]></category>
		<category><![CDATA[file export]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[write.csv]]></category>
		<category><![CDATA[write.table]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=55</guid>
		<description><![CDATA[Exporting small or medium sized data sets from the R environment to text files is a straightforward task. The two functions that are most useful for this operation are write.csv and write.table which export data to comma separated values format or a text format with a different character used to indicate separate columns of data. [...]]]></description>
			<content:encoded><![CDATA[<p>Exporting small or medium sized data sets from the <strong>R</strong> environment to text files is a straightforward task. The two functions that are most useful for this operation are <strong>write.csv</strong> and <strong>write.table</strong>, which export data to comma separated values format or to a text format in which a different character separates the columns of data.<span id="more-55"></span></p>
<p>If we had a data frame in <strong>R</strong>, for example with a name <strong>sales.data</strong>, then the basic function call to export this data to a csv file would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">write.csv(sales.data, &quot;SalesData.csv&quot;)</pre></div></div>

<p>Data in <strong>R</strong> can have separate row names and the default option for <strong>write.csv</strong> is to include row names. If there are no row names then we end up with a redundant column of sequential numbers. As an example we might have:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&quot;&quot;,&quot;Month&quot;,&quot;Year&quot;,&quot;Units Sold&quot;
&quot;1&quot;,&quot;Jan&quot;,&quot;2009&quot;,12500
&quot;2&quot;,&quot;Feb&quot;,&quot;2009&quot;,11750
...</pre></div></div>

<p>The function can be instructed to omit the row names by supplying an additional argument, so the function call becomes:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">write.csv(sales.data, &quot;SalesData.csv&quot;, row.names = FALSE)</pre></div></div>

<p>The text file would now read:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&quot;Month&quot;,&quot;Year&quot;,&quot;Units Sold&quot;
&quot;Jan&quot;,&quot;2009&quot;,12500
&quot;Feb&quot;,&quot;2009&quot;,11750
...</pre></div></div>
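
<p>As a minimal sketch of the difference, the code below builds a small data frame holding the two rows shown above and compares the header line written by each call. The values of the data frame are taken from the example output, while the object names <strong>with.names</strong> and <strong>without.names</strong> and the use of temporary files in place of SalesData.csv are assumptions made for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Illustrative data frame; check.names = FALSE keeps the space in &quot;Units Sold&quot;
sales.data = data.frame(Month = c(&quot;Jan&quot;, &quot;Feb&quot;), Year = c(&quot;2009&quot;, &quot;2009&quot;),
  &quot;Units Sold&quot; = c(12500, 11750), check.names = FALSE)

# Temporary files stand in for SalesData.csv
with.names = tempfile(fileext = &quot;.csv&quot;)
without.names = tempfile(fileext = &quot;.csv&quot;)

write.csv(sales.data, with.names)
write.csv(sales.data, without.names, row.names = FALSE)

# Compare the header line of each file
readLines(with.names, n = 1)
readLines(without.names, n = 1)</pre></div></div>

<p>The first header line begins with an empty quoted field for the row names column, while the second begins directly with the Month column.</p>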

<p>The comma separated variable format is not the only text file format that <strong>R</strong> can create, and the function <strong>write.table</strong> can be used if a variation is required. The function call is very similar to the one used above. To export the anscombe data set that is available within R, this code could be used:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">write.table(anscombe, &quot;Data\\anscombe.txt&quot;, row.names = FALSE)</pre></div></div>

<p>This would create a text file like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&quot;x1&quot; &quot;x2&quot; &quot;x3&quot; &quot;x4&quot; &quot;y1&quot; &quot;y2&quot; &quot;y3&quot; &quot;y4&quot;
10 10 10 8 8.04 9.14 7.46 6.58
8 8 8 8 6.95 8.14 6.77 5.76
...</pre></div></div>

<p>so we can see that the default option is to use a space to separate the columns of data in the output file. The character used to separate the columns can be specified with the <strong>sep</strong> argument to this function. An example would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">write.table(Orange, &quot;Data\\orange.txt&quot;, sep = &quot;\t&quot;, row.names = FALSE)</pre></div></div>

<p>to give the following output:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&quot;Tree&quot;	&quot;age&quot;	&quot;circumference&quot;
&quot;1&quot;	118	30
&quot;1&quot;	484	58
...</pre></div></div>
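
<p>One way to check an export like this is to read the file straight back in with the same separator. The following is a rough sketch rather than a definitive recipe; it uses a temporary file in place of Data\\orange.txt, and the names <strong>orange.file</strong> and <strong>orange.copy</strong> are made up for the example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Write the built-in Orange data set to a temporary tab-delimited file
orange.file = tempfile(fileext = &quot;.txt&quot;)
write.table(Orange, orange.file, sep = &quot;\t&quot;, row.names = FALSE)

# Read it back, naming the same separator and asking for the header row
orange.copy = read.table(orange.file, header = TRUE, sep = &quot;\t&quot;)

# The copy should have the same shape and column names as the original
dim(orange.copy)
names(orange.copy)</pre></div></div>

<p>The text file stores only the values, so attributes such as the ordering of the Tree factor are not carried through the round trip.</p>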

<p>Other options for these functions are detailed in the respective help pages.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/exporting-data-from-r-to-text-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Importing Data into R from Text Files</title>
		<link>http://www.wekaleamstudios.co.uk/posts/importing-data-into-r-from-text-files/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/importing-data-into-r-from-text-files/#comments</comments>
		<pubDate>Sat, 25 Apr 2009 17:00:48 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[delimited]]></category>
		<category><![CDATA[file export]]></category>
		<category><![CDATA[file import]]></category>
		<category><![CDATA[read.csv]]></category>
		<category><![CDATA[read.table]]></category>
		<category><![CDATA[text]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=3</guid>
		<description><![CDATA[Reading data into a statistical software package is not always straightforward, and many varied file formats are in use by different software systems. Text files are popular for sharing small or medium sized data sets, while full blown relational databases are more appropriate for larger data [...]]]></description>
			<content:encoded><![CDATA[<p>Reading data into a statistical software package is not always straightforward, and many varied file formats are in use by different software systems. Text files are popular for sharing small or medium sized data sets, while full blown relational databases are more appropriate for larger data sets. The R Environment has functions that import data stored in text format, and it is also possible to interact with external database systems.<span id="more-3"></span></p>
<p>When working with text files for storing data there are a number of common issues that need to be considered. A special character is used to distinguish between the columns of data, e.g. a comma or a tab. There is an optional first line that provides the names of the columns (variables) or the column names can be specified explicitly when importing the data. Missing values often cause problems when handling data and a special character can be specified to indicate missing data.</p>
<p>The comma separated variable text file format is straightforward to handle using R with the function <strong>read.csv</strong> where we specify the file name as our source of data. A simple example of using this function would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.csv(&quot;Data\\exampledata1.csv&quot;)</pre></div></div>

<p>The data is loaded and, assuming there are no errors, converted into a data frame that can be saved and subsequently analysed. To save the data as an object we could run this code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">data1 = read.csv(&quot;Data\\exampledata1.csv&quot;)</pre></div></div>

<p>This function makes use of the more general purpose function <strong>read.table</strong> which accepts a wider range of options to define the delimited text file that is imported into the R environment.</p>
<p>The first argument supplied to this function is also the file name. If no other options are specified, the default assumption is that white space (one or more spaces or tabs) separates data in different columns and that the first line of the text file does not contain column name information, unless that line has one fewer field than the remaining lines. The <strong>header</strong> argument to this function can be set to <strong>TRUE</strong> to use the information in the first row as column names. An example of specifying this option is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.table(&quot;Data\\exampledata2.txt&quot;, header = TRUE)</pre></div></div>
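
<p>To see the effect of the <strong>header</strong> argument without needing a file on disk, a small sketch can pass the data inline through the <strong>text</strong> argument of <strong>read.table</strong>, which is available in reasonably recent versions of R. The values and the name <strong>example.text</strong> are made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Made-up two-column data supplied inline instead of from a file
example.text = &quot;Weight Group\n12.5 A\n14.1 B&quot;

# Default: the first line is treated as a row of data
no.header = read.table(text = example.text)

# header = TRUE: the first line supplies the column names
with.header = read.table(text = example.text, header = TRUE)

names(no.header)
names(with.header)</pre></div></div>

<p>The first call returns three rows with the default column names V1 and V2, whereas the second returns two rows with columns named Weight and Group.</p>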

<p>If the data file does not have a row of column headings then the <strong>col.names</strong> argument can be used to specify the names that should be given to these columns. An example of using this option:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.table(&quot;Data\\exampledata3.txt&quot;, col.names = c(&quot;Weight&quot;, &quot;Group&quot;))</pre></div></div>

<p>The special character used to separate the data on each row can also be specified by the user via the <strong>sep</strong> argument to the function. An example of importing a data file where a semi-colon is used is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">read.table(&quot;Data\\exampledata4.txt&quot;, header = TRUE, sep = &quot;;&quot;)</pre></div></div>
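
<p>The special character that marks missing data, mentioned earlier, can be declared with the <strong>na.strings</strong> argument to the same function. A minimal sketch follows, assuming made-up semi-colon delimited data in which a question mark stands for a missing value; the data are supplied inline through the <strong>text</strong> argument, and the names <strong>example.text</strong> and <strong>data4</strong> are hypothetical:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Made-up data: semi-colon separated, with ? marking a missing weight
example.text = &quot;Weight;Group\n12.5;A\n?;B\n14.1;A&quot;

data4 = read.table(text = example.text, header = TRUE, sep = &quot;;&quot;,
  na.strings = &quot;?&quot;)

# Locate the missing entry and summarise the remaining values
is.na(data4$Weight)
mean(data4$Weight, na.rm = TRUE)</pre></div></div>

<p>With the marker declared, the Weight column is read as numeric with a missing value in the second row rather than as text.</p>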

]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/importing-data-into-r-from-text-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

