<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; S Programming</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/r-environment/s-programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Getting rid of white space at the beginning and end of a string</title>
		<link>http://www.wekaleamstudios.co.uk/posts/getting-rid-of-white-space-at-the-beginning-and-end-of-a-string/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/getting-rid-of-white-space-at-the-beginning-and-end-of-a-string/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 11:01:19 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[stringr]]></category>
		<category><![CDATA[str_trim]]></category>
		<category><![CDATA[whitespace]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1660</guid>
		<description><![CDATA[There are situations where we are working with character strings extracted from various sources and it can be annoying when there is white space at the beginning and/or end of the strings. This whitespace can cause problems when attempting to sort, subset or perform other common operations. The stringr package has a handy function str_trim [...]]]></description>
			<content:encoded><![CDATA[<p>There are situations where we are working with character strings extracted from various sources and it can be annoying when there is white space at the beginning and/or end of the strings. This whitespace can cause problems when attempting to sort, subset or perform other common operations.<span id="more-1660"></span></p>
<p>The <strong>stringr</strong> package has a handy function <strong>str_trim</strong> that comes to the rescue and is straightforward to use. First up make sure that the package is available in the <strong>R</strong> session:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(stringr)</pre></div></div>

<p>Here is a basic example with a simple string:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; &quot;  This is an example of whitespace.  &quot;
[1] &quot;  This is an example of whitespace.  &quot;
&gt; str_trim(&quot;  This is an example of whitespace.  &quot;)
[1] &quot;This is an example of whitespace.&quot;</pre></div></div>

<p>As we can see this is very simple and is set up to work on a vector of character strings as well.</p>
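<p>As a minimal sketch of the vector case, the lines below trim every element of a character vector in one call. The vector <strong>fruit</strong> is made up for illustration, and the <strong>side</strong> argument of <strong>str_trim</strong> restricts the trimming to one end of each string:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(stringr)
## Made-up example data: each element has stray white space
fruit = c(&quot;  apple &quot;, &quot;banana  &quot;, &quot;  cherry&quot;)
## Trim both ends of every element (the default)
str_trim(fruit)
## Trim only the left hand or only the right hand side
str_trim(fruit, side = &quot;left&quot;)
str_trim(fruit, side = &quot;right&quot;)</pre></div></div>

<p>If the package is not installed, the base function <strong>trimws</strong> (available from R 3.2.0) should give the same result for this example.</p>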
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page. Visit <a href="http://en.wikibooks.org/wiki/R_Programming/Text_Processing">here</a> for more examples of string manipulation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/getting-rid-of-white-space-at-the-beginning-and-end-of-a-string/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Handling Errors Gracefully</title>
		<link>http://www.wekaleamstudios.co.uk/posts/handling-errors-gracefully/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/handling-errors-gracefully/#comments</comments>
		<pubDate>Fri, 27 May 2011 20:44:29 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[errors]]></category>
		<category><![CDATA[try]]></category>
		<category><![CDATA[tryCatch]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1635</guid>
		<description><![CDATA[In R, functions sometimes produce warnings or errors. When an error occurs, execution of a function or a series of commands is halted, which can be frustrating if we want to continue our calculations. There are various functions available in R for dealing with errors [...]]]></description>
			<content:encoded><![CDATA[<p>In <strong>R</strong>, functions sometimes produce warnings or errors. When an error occurs, execution of a function or a series of commands is halted, which can be frustrating if we want to continue our calculations.<span id="more-1635"></span></p>
<p>There are various functions available in <strong>R</strong> for dealing with errors and in this post we will consider some basic examples of how to make use of the <strong>try</strong> function.</p>
<p>To illustrate how to use <strong>try</strong> let&#8217;s look at what happens if we run code to print out a non-existent object in our workspace:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; print(d)
Error in print(d) : object 'd' not found</pre></div></div>

<p>If we add a second command on the same line to print out a string, we get the following output:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; print(d); print(&quot;Hi&quot;)
Error in print(d) : object 'd' not found</pre></div></div>

<p>Here execution of the code halts after the first command fails and generates an error. We can modify this code to call the <strong>try</strong> function with the <strong>print</strong> statement:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; try(print(d)); print(&quot;Hi&quot;)
Error in print(d) : object 'd' not found
[1] &quot;Hi&quot;</pre></div></div>

<p>The second statement is now evaluated even though an error occurs in the first statement. This is a trivial example. In other situations we might consider using the <strong>tryCatch</strong> function, which takes various arguments in addition to the expression to evaluate, including a function to be called when an error occurs.</p>
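<p>A minimal sketch of <strong>tryCatch</strong> is shown below. The function name <strong>safeLog</strong> and its message are invented for this illustration. The handler supplied as the <strong>error</strong> argument is called with the condition object when an error occurs, and its return value becomes the result of the <strong>tryCatch</strong> call:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical helper: take a logarithm but return NA instead of halting
safeLog = function(x)
{
    tryCatch(
        log(x),
        error = function(e)
        {
            ## conditionMessage extracts the text of the error
            message(&quot;Could not take log: &quot;, conditionMessage(e))
            NA
        }
    )
}
safeLog(10)           ## the same value as log(10)
safeLog(&quot;ten&quot;)        ## reports the error and returns NA</pre></div></div>

<p>Unlike <strong>try</strong>, this approach lets us decide what value to carry forward when something goes wrong, so later calculations can test for <strong>NA</strong>.</p>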
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/handling-errors-gracefully/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R Matrix Operations</title>
		<link>http://www.wekaleamstudios.co.uk/posts/r-matrix-operations/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/r-matrix-operations/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 15:09:40 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[matrix]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1607</guid>
		<description><![CDATA[R can be used to perform various matrix calculations. These include functions and operators for creating matrices (matrix), addition (+), multiplication (%*%) and inversion (solve). Other useful resources are provided on the Supplementary Material page.]]></description>
			<content:encoded><![CDATA[<p><strong>R</strong> can be used to perform various matrix calculations. These include functions and operators for creating matrices (<strong>matrix</strong>), addition (<strong>+</strong>), multiplication (<strong>%*%</strong>) and inversion (<strong>solve</strong>).<span id="more-1607"></span></p>
<p><!--[Fast Tube]--><span id="fF9cV-Fi4wE" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/r-matrix-operations/#fF9cV-Fi4wE"><img src="http://i.ytimg.com/vi/fF9cV-Fi4wE/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
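<p>For readers who prefer text to video, here is a short sketch of the four operations listed above. The matrices <strong>A</strong> and <strong>B</strong> are made up for illustration and are not taken from the video:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Create two 2 x 2 matrices (values are filled column by column)
A = matrix(c(2, 1, 1, 3), nrow = 2)
B = matrix(c(1, 0, 0, 1), nrow = 2)
## Element-wise addition
A + B
## Matrix multiplication (B is the identity, so this returns A)
A %*% B
## Inverse of A
Ainv = solve(A)
## Multiplying A by its inverse should recover the identity matrix
all.equal(A %*% Ainv, diag(2))</pre></div></div>

<p>Note that <strong>*</strong> on its own multiplies element by element, which is why the separate <strong>%*%</strong> operator is needed for the matrix product.</p>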
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/r-matrix-operations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Processing Football League Data Part II</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 10:26:39 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Manipulation]]></category>
		<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[S Programming]]></category>
		<category><![CDATA[as.numeric]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[England]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[ifelse]]></category>
		<category><![CDATA[Premiership]]></category>
		<category><![CDATA[results]]></category>
		<category><![CDATA[table]]></category>
		<category><![CDATA[tapply]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1459</guid>
		<description><![CDATA[Following on from the previous post about creating a football result processing function for data from the football-data.co.uk website we will add code to the function to generate a league table based on the results to date. To create the league table we need to count various things such as the number of games played, [...]]]></description>
			<content:encoded><![CDATA[<p>Following on from the previous <a href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/">post</a> about creating a football result processing function for data from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website we will add code to the function to generate a league table based on the results to date.<span id="more-1459"></span></p>
<p>To create the league table we need to count various things such as the number of games played, number of wins/draws/losses, goals scored etc. This information is available in the results object that is loaded from a <strong>csv</strong> file in the function as it stands.</p>
<p>To facilitate these calculations we create a data frame with a row for each team in the division and then calculate the statistics required &#8211; this was a reason for ordering the factors in the <strong>HomeTeam</strong> and <strong>AwayTeam</strong> columns of the results table. The data frame is created with the code below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable = data.frame(Team = teams,
    Games = 0, Win = 0, Draw = 0, Loss = 0,
    HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
    AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
    Points = 0,
    HomeFor = 0, HomeAgainst = 0,
    AwayFor = 0, AwayAgainst = 0,
    For = 0, Against = 0, GoalDifference = 0)</pre></div></div>

<p>A number of these columns are redundant in a finished league table but are used for intermediate calculations, such as <strong>HomeWin</strong> and <strong>AwayWin</strong>, which are combined to find the total number of victories for a team.</p>
<p>The number of games played by each team at home and away is counted by applying the table command to each of the two columns in turn.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))</pre></div></div>

<p>The labels created by the table command are discarded using the as.numeric function to retain only the number of games. The table command is also used to count the number of wins, draws and losses at home and away for each team. The commands are shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeWin =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;H&quot;]))
tmpTable$HomeDraw =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$HomeLoss =
    as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;A&quot;]))
&nbsp;
tmpTable$AwayWin =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;A&quot;]))
tmpTable$AwayDraw =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$AwayLoss =
    as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;H&quot;]))</pre></div></div>

<p>Note that we subset on the values in the <strong>FTR</strong> column, which holds the full-time result (H, D or A), and then count. The subsetting is reversed when looking at the away fixtures because a victory for the away team is recorded as an away win rather than a home win.</p>
<p>This information is then combined to get total games played, won etc.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss</pre></div></div>
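<p>The counting steps above rely on <strong>table</strong> returning a count for every level of a factor, including levels that never appear in the data, so that the counts line up with the rows of the league table. A small sketch with made-up teams and results (the names <strong>home</strong> and <strong>ftr</strong> are for illustration only):</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Three teams, one of which (Bolton) has not played at home yet
teams = c(&quot;Arsenal&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;)
home = factor(c(&quot;Arsenal&quot;, &quot;Chelsea&quot;, &quot;Arsenal&quot;), levels = teams)
ftr = c(&quot;H&quot;, &quot;D&quot;, &quot;A&quot;)
## Home games per team: Bolton is kept with a count of zero
homeGames = as.numeric(table(home))
## Home wins per team: subsetting a factor keeps all of its levels
homeWin = as.numeric(table(home[ftr == &quot;H&quot;]))
homeGames
homeWin</pre></div></div>

<p>Without the explicit <strong>levels</strong> the zero counts would be dropped and the vectors would no longer have one entry per team.</p>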

<p>The points total is calculated by multiplying the number of wins, draws and losses by the number of points awarded for each match outcome and adding the results together.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$Points = winPoints * tmpTable$Win +
    drawPoints * tmpTable$Draw + lossPoints * tmpTable$Loss</pre></div></div>

<p>The next set of calculations counts the number of goals scored and goals conceded, from which the goal difference follows. The <strong>tapply</strong> function is used for these calculations.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$HomeFor =
    as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
tmpTable$HomeAgainst =
    as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$AwayFor =
    as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
tmpTable$AwayAgainst =
    as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))</pre></div></div>

<p>The <strong>tapply</strong> function applies <strong>sum</strong> to the goals scored and the goals conceded, grouped by home team or away team, which gives a total for each team in the division. The home and away figures are then combined to create overall totals for and against:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$For =
    ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
    ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
tmpTable$Against =
    ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
    ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)</pre></div></div>

<p>The <strong>ifelse</strong> statement is used to handle situations where a team hasn&#8217;t played a home and/or away fixture yet. The goal difference is easy to calculate:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable$GoalDifference = tmpTable$For - tmpTable$Against</pre></div></div>
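<p>The need for <strong>ifelse</strong> can be seen on a small made-up example. When a factor level has no observations, <strong>tapply</strong> does not call <strong>sum</strong> for that level and returns <strong>NA</strong>, which would otherwise propagate into the totals. The object names below are for illustration only:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Three teams, one of which (Bolton) has not played at home yet
teams = c(&quot;Arsenal&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;)
home = factor(c(&quot;Arsenal&quot;, &quot;Chelsea&quot;, &quot;Arsenal&quot;), levels = teams)
fthg = c(2, 1, 3)
## Goals scored at home per team: Bolton comes back as NA
homeFor = as.numeric(tapply(fthg, home, sum, na.rm = TRUE))
## Replace the NA with zero before adding home and away totals
homeForFilled = ifelse(is.na(homeFor), 0, homeFor)
homeFor
homeForFilled</pre></div></div>

<p>Note that <strong>na.rm = TRUE</strong> only removes missing goal values within a group. It does not help when the group itself is empty.</p>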

<p>Now that all of the statistics have been calculated we sort the table based on the number of points, goal difference and finally alphabetically. There might be different ways that we can order the teams but this is what we will use for the time being:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpTable =
  tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]</pre></div></div>

<p>The ordering might look odd, but we want to rank from highest to lowest on points and goal difference and then in ascending alphabetical order on team name. The minus signs reverse the default ascending order for the two numeric columns.</p>
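<p>As a small sketch of this ordering, the made-up table below has two pairs of teams level on points, one of which is also level on goal difference:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Made-up league table for illustration
tab = data.frame(Team = c(&quot;Wolves&quot;, &quot;Chelsea&quot;, &quot;Arsenal&quot;, &quot;Bolton&quot;),
    Points = c(4, 7, 7, 4),
    GoalDifference = c(0, 5, 5, 2),
    stringsAsFactors = FALSE)
## Descending points, descending goal difference, ascending team name
sorted = tab[order(- tab$Points, - tab$GoalDifference, tab$Team),]
sorted$Team</pre></div></div>

<p>Arsenal and Chelsea are tied on both numeric columns and so are separated alphabetically, while Bolton is placed above Wolves on goal difference.</p>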
<p>The whole function is now:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v2 = function(datafile, country, divname, season, teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
## Validate Function Arguments
&nbsp;
if (missing(datafile))
{
stop(&quot;Results csv file not specified.&quot;)
}
&nbsp;
if (missing(country))
{
warning(&quot;Country of league not specified.&quot;)
country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
warning(&quot;Name of league division not specified.&quot;)
divname = &quot;&quot;
}
&nbsp;
## Import Results
&nbsp;
tmpResults = read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]
&nbsp;
if (missing(teams))
{
warning(&quot;Team names not specified - extracted from results data.&quot;)
teams = sort(unique(c(as.character(tmpResults$HomeTeam), as.character(tmpResults$AwayTeam))))
}
&nbsp;
tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)
&nbsp;
## Create Empty League Table
&nbsp;
tmpTable = data.frame(Team = teams,
Games = 0, Win = 0, Draw = 0, Loss = 0,
HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
Points = 0,
HomeFor = 0, HomeAgainst = 0,
AwayFor = 0, AwayAgainst = 0,
For = 0, Against = 0, GoalDifference = 0)
&nbsp;
## Count Number of Games Played
&nbsp;
tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))
&nbsp;
## Count Number of Wins/Draws/Losses
&nbsp;
tmpTable$HomeWin = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;H&quot;]))
tmpTable$HomeDraw = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$HomeLoss = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == &quot;A&quot;]))
&nbsp;
tmpTable$AwayWin = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;A&quot;]))
tmpTable$AwayDraw = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;D&quot;]))
tmpTable$AwayLoss = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == &quot;H&quot;]))
&nbsp;
tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss
tmpTable$Points = winPoints * tmpTable$Win + drawPoints * tmpTable$Draw + lossPoints * tmpTable$Loss
&nbsp;
## Count Goals Scored and Conceded
&nbsp;
tmpTable$HomeFor = as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
tmpTable$HomeAgainst = as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$AwayFor = as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
tmpTable$AwayAgainst = as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))
&nbsp;
tmpTable$For = ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
tmpTable$Against = ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)
&nbsp;
tmpTable$GoalDifference = tmpTable$For - tmpTable$Against
&nbsp;
## Sort Table
## By Points
## By Goal Difference
## By Team Name (Alphabetical)
&nbsp;
tmpTable = tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]
&nbsp;
tmpTable = tmpTable[,c(&quot;Team&quot;, &quot;Games&quot;, &quot;Win&quot;, &quot;Draw&quot;, &quot;Loss&quot;, &quot;Points&quot;, &quot;For&quot;, &quot;Against&quot;, &quot;GoalDifference&quot;)]
&nbsp;
## Return Division Information
&nbsp;
tmpSummary = list(Country = country, Division = divname, Season = season, Teams = teams,
Results = tmpResults, Table = tmpTable)
&nbsp;
invisible(tmpSummary)
}</pre></div></div>

<p>There is other functionality that we might want to add to the function.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Processing Football League Data Part I</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 14:14:45 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Manipulation]]></category>
		<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[File Import/Export]]></category>
		<category><![CDATA[S Programming]]></category>
		<category><![CDATA[csv]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[England]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[list]]></category>
		<category><![CDATA[Premiership]]></category>
		<category><![CDATA[print]]></category>
		<category><![CDATA[read.csv]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1447</guid>
		<description><![CDATA[In this post we will make use of football results data from the football-data.co.uk website to demonstrate creating functions in R to automate a series of standard operations that would be required for results data from various leagues and divisions. The first step is to consider what control options should be available as part of [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we will make use of football results data from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website to demonstrate creating functions in <strong>R</strong> to automate a series of standard operations that would be required for results data from various leagues and divisions.<span id="more-1447"></span></p>
<p>The first step is to consider what control options should be available as part of the function and here is a list of some arguments that will be used for this implementation of a football result data processing function:</p>
<ul>
<li>The name of a <strong>csv</strong> data file from the <a href="http://www.football-data.co.uk">football-data.co.uk</a> website.</li>
<li>A text string to specify the country and division for the data.</li>
<li>A text string specifying the season.</li>
<li>A list of teams in the division (optional), which could be used to test for data entry errors in the data file.</li>
<li>The number of points for a win, draw or loss. This might seem a strange option initially but different leagues might award different points for the three outcomes.</li>
</ul>
<p>Some of this information might appear optional but is included so that we can write a custom <strong>print</strong> function at a later date to display a meaningful summary of the object (list) that will be created by the function.</p>
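<p>To give an idea of where this is heading, here is a rough sketch of what such a <strong>print</strong> function might look like. The class name <strong>footballDivision</strong> and the layout of the summary are assumptions made for this illustration and are not part of the function developed in this post:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical S3 print method for a list with a class attribute
print.footballDivision = function(x, ...)
{
    cat(x$Country, &quot; &quot;, x$Division, &quot; (&quot;, x$Season, &quot;)\n&quot;, sep = &quot;&quot;)
    cat(&quot;Number of teams: &quot;, length(x$Teams), &quot;\n&quot;, sep = &quot;&quot;)
    invisible(x)
}
## A made-up object with the same element names as our list
division = structure(list(Country = &quot;England&quot;, Division = &quot;Premiership&quot;,
    Season = &quot;2010-2011&quot;, Teams = c(&quot;Arsenal&quot;, &quot;Aston Villa&quot;)),
    class = &quot;footballDivision&quot;)
print(division)</pre></div></div>

<p>Because the descriptive elements are stored alongside the results, the method can produce a readable header without any further input from the user.</p>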
<p>The first part of our function is concerned with checking the various values provided to the function arguments. Our skeleton function is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v1 = function(datafile, country, divname, season,
  teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
&nbsp;
}</pre></div></div>

<p>Here we have specified default options for three of the arguments with the most likely number of points for each match outcome, i.e. 3 points for a win, 1 point for a draw and 0 points for a loss.</p>
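<p>As a quick sketch of how default arguments behave, the helper below uses the same three defaults. The function name <strong>pointsTotal</strong> is invented for this illustration, and the 2 points for a win in the second call is simply an assumed alternative scoring system:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical helper using the same default points as our function
pointsTotal = function(win, draw, loss,
    winPoints = 3, drawPoints = 1, lossPoints = 0)
{
    winPoints * win + drawPoints * draw + lossPoints * loss
}
## Defaults apply when the arguments are not supplied
pointsTotal(2, 1, 0)
## A default is overridden by naming the argument in the call
pointsTotal(2, 1, 0, winPoints = 2)</pre></div></div>

<p>With two wins and a draw the first call gives 7 points and the second gives 5.</p>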
<p>To illustrate the working of the result processing function we will use a small excerpt from the start of the 2010/2011 English Premiership season, which is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee
E0,14/8/2010,Aston Villa,West Ham,3,0,H,2,0,H,M Dean
E0,14/8/2010,Blackburn,Everton,1,0,H,1,0,H,P Dowd
E0,14/8/2010,Bolton,Fulham,0,0,D,0,0,D,S Attwell
E0,14/8/2010,Chelsea,West Brom,6,0,H,2,0,H,M Clattenburg
E0,14/8/2010,Sunderland,Birmingham,2,2,D,1,0,H,A Taylor
E0,14/8/2010,Tottenham,Man City,0,0,D,0,0,D,A Marriner
E0,14/8/2010,Wigan,Blackpool,0,4,A,0,3,A,M Halsey
E0,14/8/2010,Wolves,Stoke,2,1,H,2,0,H,L Probert
E0,15/8/2010,Liverpool,Arsenal,1,1,D,0,0,D,M Atkinson
E0,16/8/2010,Man United,Newcastle,3,0,H,2,0,H,C Foy
E0,21/8/2010,Arsenal,Blackpool,6,0,H,3,0,H,M Jones
E0,21/8/2010,Birmingham,Blackburn,2,1,H,0,0,D,M Oliver
E0,21/8/2010,Everton,Wolves,1,1,D,1,0,H,L Mason
E0,21/8/2010,Stoke,Tottenham,1,2,A,1,2,A,C Foy
E0,21/8/2010,West Brom,Sunderland,1,0,H,0,0,D,K Friend
E0,21/8/2010,West Ham,Bolton,1,3,A,0,0,D,A Marriner
E0,21/8/2010,Wigan,Chelsea,0,6,A,0,1,A,M Dean
E0,22/8/2010,Fulham,Man United,2,2,D,0,1,A,P Walton
E0,22/8/2010,Newcastle,Aston Villa,6,0,H,3,0,H,M Atkinson
E0,23/8/2010,Man City,Liverpool,3,0,H,1,0,H,P Dowd
E0,28/8/2010,Blackburn,Arsenal,1,2,A,1,1,D,C Foy
E0,28/8/2010,Blackpool,Fulham,2,2,D,0,1,A,M Oliver
E0,28/8/2010,Chelsea,Stoke,2,0,H,1,0,H,M Atkinson
E0,28/8/2010,Man United,West Ham,3,0,H,1,0,H,M Clattenburg
E0,28/8/2010,Tottenham,Wigan,0,1,A,0,0,D,P Dowd
E0,28/8/2010,Wolves,Newcastle,1,1,D,1,0,H,S Attwell
E0,29/8/2010,Aston Villa,Everton,1,0,H,1,0,H,M Jones
E0,29/8/2010,Bolton,Birmingham,2,2,D,0,1,A,K Friend
E0,29/8/2010,Liverpool,West Brom,1,0,H,0,0,D,L Probert
E0,29/8/2010,Sunderland,Man City,1,0,H,0,0,D,M Dean</pre></div></div>

<p>This is stored in a file <strong>E0test.csv</strong> so that we can use the <strong>read.csv</strong> function to import the results data and then process it.</p>
<p>The first series of commands that we add to the function are for checking various function arguments specified by the user to ensure that they are sensible. First up we check whether a results data file has been specified as we cannot do any processing without any results. The simple test is for whether a file name has been specified:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(datafile))
{
    stop(&quot;Results csv file not specified.&quot;)
}</pre></div></div>

<p>It might be sensible to check whether the object <strong>datafile</strong> is actually a character string specifying a file, but this hasn&#8217;t been done for now. We then check whether the country name and division have been specified and set them to blank strings if they haven&#8217;t been set by the user.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(country))
{
    warning(&quot;Country of league not specified.&quot;)
    country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
    warning(&quot;Name of league division not specified.&quot;)
    divname = &quot;&quot;
}</pre></div></div>
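<p>The stricter check on <strong>datafile</strong> mentioned above is not part of the function in this post, but a sketch of how it could look is given here. The function name <strong>checkDatafile</strong> and the wording of the two extra messages are assumptions:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## Hypothetical stand-alone version of the datafile checks
checkDatafile = function(datafile)
{
    if (missing(datafile))
    {
        stop(&quot;Results csv file not specified.&quot;)
    }
    ## The file name should be a single character string
    if (!is.character(datafile) || length(datafile) != 1)
    {
        stop(&quot;Results csv file should be a single character string.&quot;)
    }
    ## The file should exist before we try to read it
    if (!file.exists(datafile))
    {
        stop(&quot;Results csv file not found: &quot;, datafile)
    }
    invisible(TRUE)
}
## A number is rejected; try lets the session carry on
try(checkDatafile(42))</pre></div></div>

<p>Failing early with a specific message is usually easier to diagnose than the error that <strong>read.csv</strong> would otherwise produce.</p>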

<p>Next up we import the data file and keep only the columns of interest (at this point in the development of the function at least). The raw data from the website has many more columns of information than we need:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpResults =
    read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]</pre></div></div>

<p>The square brackets are used to select a subset of the columns by name so that only these are saved. Then we check whether the team names have been specified by the user and if not extract them from the data provided:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">if (missing(teams))
{
    warning(&quot;Team names not specified - extracted from results data.&quot;)
    teams = sort(unique(c(as.character(tmpResults$HomeTeam),
        as.character(tmpResults$AwayTeam))))
}</pre></div></div>

<p>The sort function is used to order the team names alphabetically which is the order often used in league tables, especially when no games have been played. We then convert the columns <strong>HomeTeam</strong> and <strong>AwayTeam</strong> into factors, which allows teams that haven&#8217;t played a fixture yet to be included in the table.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)</pre></div></div>
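<p>The import steps so far can be tried without a file on disk by passing the first three fixtures from the excerpt above to the <strong>text</strong> argument of <strong>read.csv</strong>. This is only a sketch: the object names are made up, and Arsenal is added by hand to stand in for a team that has no fixtures in the data yet:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">## First three fixtures from the excerpt, held in a character string
csvText = &quot;Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee
E0,14/8/2010,Aston Villa,West Ham,3,0,H,2,0,H,M Dean
E0,14/8/2010,Blackburn,Everton,1,0,H,1,0,H,P Dowd
E0,14/8/2010,Bolton,Fulham,0,0,D,0,0,D,S Attwell&quot;
## Import and keep only the six columns of interest
res = read.csv(text = csvText)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]
## Extract the team names from the results
teamNames = sort(unique(c(as.character(res$HomeTeam),
    as.character(res$AwayTeam))))
## Add a team that has not played yet (assumption for this sketch)
teamNames = sort(c(teamNames, &quot;Arsenal&quot;))
## The factor keeps Arsenal as a level even though it has no fixtures
res$HomeTeam = factor(res$HomeTeam, levels = teamNames)
table(res$HomeTeam)</pre></div></div>

<p>The table of home fixtures has an entry for every team, with a zero for those that have not yet played at home.</p>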

<p>To round off the first part of creating the result processing function we create a list object to return at the end of the function.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tmpSummary = list(Country = country, Division = divname,
    Season = season, Teams = teams, Results = tmpResults)</pre></div></div>

<p>The function so far:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">football.process.v1 = function(datafile, country, divname, season, teams, winPoints = 3, drawPoints = 1, lossPoints = 0)
{
## Validate Function Arguments
&nbsp;
if (missing(datafile))
{
stop(&quot;Results csv file not specified.&quot;)
}
&nbsp;
if (missing(country))
{
warning(&quot;Country of league not specified.&quot;)
country = &quot;&quot;
}
&nbsp;
if (missing(divname))
{
warning(&quot;Name of league division not specified.&quot;)
divname = &quot;&quot;
}
&nbsp;
## Import Results
&nbsp;
tmpResults = read.csv(datafile)[,c(&quot;Date&quot;,&quot;HomeTeam&quot;,&quot;AwayTeam&quot;,&quot;FTR&quot;,&quot;FTHG&quot;,&quot;FTAG&quot;)]
&nbsp;
if (missing(teams))
{
warning(&quot;Team names not specified - extracted from results data.&quot;)
teams = sort(unique(c(as.character(tmpResults$HomeTeam), as.character(tmpResults$AwayTeam))))
}
&nbsp;
tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)
&nbsp;
## Return Division Information
&nbsp;
tmpSummary = list(Country = country, Division = divname, Season = season, Teams = teams,
Results = tmpResults)
&nbsp;
invisible(tmpSummary)
}</pre></div></div>

<p>We then test this function with the data file shown above. First up we create our own list of teams in the English Premiership for 2010/2011 and specify some of the other function arguments while using the defaults for points.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; E0teams.1011 = c(&quot;Arsenal&quot;, &quot;Aston Villa&quot;, &quot;Birmingham&quot;, &quot;Blackburn&quot;,
+ &quot;Blackpool&quot;, &quot;Bolton&quot;, &quot;Chelsea&quot;, &quot;Everton&quot;, &quot;Fulham&quot;, &quot;Liverpool&quot;,
+ &quot;Man City&quot;, &quot;Man United&quot;, &quot;Newcastle&quot;, &quot;Stoke&quot;, &quot;Sunderland&quot;,
+ &quot;Tottenham&quot;, &quot;West Brom&quot;, &quot;West Ham&quot;, &quot;Wigan&quot;, &quot;Wolves&quot;)
&gt; print(football.process.v1(&quot;E0test.csv&quot;, &quot;England&quot;, &quot;Premiership&quot;,
+     &quot;2010-2011&quot;, E0teams.1011))
$Country
[1] &quot;England&quot;
&nbsp;
$Division
[1] &quot;Premiership&quot;
&nbsp;
$Season
[1] &quot;2010-2011&quot;
&nbsp;
$Teams
 [1] &quot;Arsenal&quot;     &quot;Aston Villa&quot; &quot;Birmingham&quot;  &quot;Blackburn&quot;   &quot;Blackpool&quot;  
 [6] &quot;Bolton&quot;      &quot;Chelsea&quot;     &quot;Everton&quot;     &quot;Fulham&quot;      &quot;Liverpool&quot;  
[11] &quot;Man City&quot;    &quot;Man United&quot;  &quot;Newcastle&quot;   &quot;Stoke&quot;       &quot;Sunderland&quot; 
[16] &quot;Tottenham&quot;   &quot;West Brom&quot;   &quot;West Ham&quot;    &quot;Wigan&quot;       &quot;Wolves&quot;     
&nbsp;
$Results
        Date    HomeTeam    AwayTeam FTR FTHG FTAG
1  14/8/2010 Aston Villa    West Ham   H    3    0
2  14/8/2010   Blackburn     Everton   H    1    0
3  14/8/2010      Bolton      Fulham   D    0    0
4  14/8/2010     Chelsea   West Brom   H    6    0
5  14/8/2010  Sunderland  Birmingham   D    2    2
6  14/8/2010   Tottenham    Man City   D    0    0
7  14/8/2010       Wigan   Blackpool   A    0    4
8  14/8/2010      Wolves       Stoke   H    2    1
9  15/8/2010   Liverpool     Arsenal   D    1    1
10 16/8/2010  Man United   Newcastle   H    3    0
11 21/8/2010     Arsenal   Blackpool   H    6    0
12 21/8/2010  Birmingham   Blackburn   H    2    1
13 21/8/2010     Everton      Wolves   D    1    1
14 21/8/2010       Stoke   Tottenham   A    1    2
15 21/8/2010   West Brom  Sunderland   H    1    0
16 21/8/2010    West Ham      Bolton   A    1    3
17 21/8/2010       Wigan     Chelsea   A    0    6
18 22/8/2010      Fulham  Man United   D    2    2
19 22/8/2010   Newcastle Aston Villa   H    6    0
20 23/8/2010    Man City   Liverpool   H    3    0
21 28/8/2010   Blackburn     Arsenal   A    1    2
22 28/8/2010   Blackpool      Fulham   D    2    2
23 28/8/2010     Chelsea       Stoke   H    2    0
24 28/8/2010  Man United    West Ham   H    3    0
25 28/8/2010   Tottenham       Wigan   A    0    1
26 28/8/2010      Wolves   Newcastle   D    1    1
27 29/8/2010 Aston Villa     Everton   H    1    0
28 29/8/2010      Bolton  Birmingham   D    2    2
29 29/8/2010   Liverpool   West Brom   H    1    0
30 29/8/2010  Sunderland    Man City   H    1    0</pre></div></div>
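
<p>One consequence of converting the team names to factors with a fixed set of levels is that any name not in the supplied list becomes <strong>NA</strong>. The following is a minimal sketch of that behaviour, assuming three rows copied from the output above and a deliberately incomplete team list (the object names <strong>results</strong>, <strong>home</strong> and <strong>unmatched</strong> are illustrative and not part of the original function):</p>

```r
# Three rows copied from the results shown above.
results = data.frame(
    HomeTeam = c("Aston Villa", "Blackburn", "Bolton"),
    AwayTeam = c("West Ham", "Everton", "Fulham"),
    stringsAsFactors = FALSE
)

# Deliberately incomplete team list (an assumption): "Bolton" is left out.
teams = c("Aston Villa", "Blackburn", "Everton", "Fulham", "West Ham")

# Same conversion as in the function: names outside 'teams' become NA.
home = factor(results$HomeTeam, levels = teams)

# Count how many home teams failed to match the supplied list.
unmatched = sum(is.na(home))
print(unmatched)
```

<p>A check of this kind is one possible way to spot misspelt or missing team names before any further processing.</p>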

<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-processing-football-league-data-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Checking Data Types</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-data-types/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-data-types/#comments</comments>
		<pubDate>Sat, 13 Nov 2010 13:42:47 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[argument]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[is.numeric]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1451</guid>
		<description><![CDATA[There are a number of useful functions in R that test the variable type or convert between different variable types. These can be used to validate function input to ensure that sensible answers are returned from a function or to ensure that the function doesn&#8217;t fail. Following on from a previous post on a simple [...]]]></description>
			<content:encoded><![CDATA[<p>There are a number of useful functions in <strong>R</strong> that test the variable type or convert between different variable types. These can be used to validate function input to ensure that sensible answers are returned from a function or to ensure that the function doesn&#8217;t fail.<span id="more-1451"></span></p>
<p>Following on from a previous post on a simple function to calculate the volume of a cylinder we can include a test with the <strong>is.numeric</strong> function. The usage of this function is best shown with a couple of examples:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; is.numeric(6)
[1] TRUE
&gt; is.numeric(&quot;test&quot;)
[1] FALSE</pre></div></div>

<p>The function returns either <strong>TRUE</strong> or <strong>FALSE</strong> depending on whether the object is numeric. The test applies to the object as a whole, so a vector passed to this function still produces a single <strong>TRUE</strong> or <strong>FALSE</strong> rather than one value per element.</p>
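
<p>A short sketch of how <strong>is.numeric</strong> behaves when it is given a vector; the variable names are illustrative only:</p>

```r
# is.numeric() tests the object as a whole and returns one logical value.
all.numbers = is.numeric(c(1, 2, 3))

# Mixing a number with a string coerces the whole vector to character.
mixed = is.numeric(c(1, "a"))

# Integer vectors also count as numeric.
whole = is.numeric(1:3)

print(c(all.numbers, mixed, whole))
```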
<p>We can add two statements to our volume calculation function to test that the height and radius specified by the user are indeed numeric values:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">    if (!is.numeric(height))
        stop(&quot;Height should be numeric.&quot;)
&nbsp;
    if (!is.numeric(radius))
        stop(&quot;Radius should be numeric.&quot;)</pre></div></div>

<p>We add these tests after checking whether the height and radius have been specified and before the test for whether they are positive values. The function now becomes:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume.5 = function(height, radius)
{
    if (missing(height))
        stop(&quot;Need to specify height of cylinder for calculations.&quot;)
&nbsp;
    if (missing(radius))
        stop(&quot;Need to specify radius of cylinder for calculations.&quot;)
&nbsp;
    if (!is.numeric(height))
        stop(&quot;Height should be numeric.&quot;)
&nbsp;
    if (!is.numeric(radius))
        stop(&quot;Radius should be numeric.&quot;)
&nbsp;
    if (height &lt; 0)
        stop(&quot;Negative height specified.&quot;)
&nbsp;
    if (radius &lt; 0)
        stop(&quot;Negative radius specified.&quot;)
&nbsp;
    volume = pi * radius * radius * height
&nbsp;
    list(Height = height, Radius = radius, Volume = volume)
}</pre></div></div>

<p>A couple of examples show that the function works as expected:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cylinder.volume.5(20, 4)
$Height
[1] 20
&nbsp;
$Radius
[1] 4
&nbsp;
$Volume
[1] 1005.310
&nbsp;
&gt; cylinder.volume.5(20, &quot;a&quot;)
Error in cylinder.volume.5(20, &quot;a&quot;) : Radius should be numeric.</pre></div></div>

<p>These various validation checks can be combined in different ways to ensure that a user does not try to use a function in a way that was not intended and should lead to greater confidence in the output from the function. This is one approach to checking function arguments and there are likely other slicker ways of doing things.</p>
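
<p>As one possible alternative, the base <strong>R</strong> function <strong>stopifnot</strong> can express several checks compactly. The sketch below is a hypothetical variant of the function above, not a replacement for it: the name <strong>cylinder.volume.checked</strong> is an assumption, and the error messages produced by <strong>stopifnot</strong> are generated automatically rather than written by hand.</p>

```r
# Hypothetical variant of the function above using stopifnot().
cylinder.volume.checked = function(height, radius)
{
    # Each condition must be TRUE, otherwise an error is raised.
    stopifnot(is.numeric(height), is.numeric(radius))
    stopifnot(height >= 0, radius >= 0)

    volume = pi * radius * radius * height

    list(Height = height, Radius = radius, Volume = volume)
}

result = cylinder.volume.checked(20, 4)
print(result$Volume)
```

<p>The trade-off is that the automatic messages are less descriptive than the hand-written ones passed to <strong>stop</strong>.</p>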
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-data-types/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Returning Information as a List</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-returning-information-as-a-list/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-returning-information-as-a-list/#comments</comments>
		<pubDate>Mon, 01 Nov 2010 23:25:16 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[list]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1441</guid>
		<description><![CDATA[In previous posts (here and here) we created a simple function that returns a single numeric value. In some situations it may be more useful to return a more flexible data type, such as a list object, to provide more information about the calculations that have been performed. Fast Tube by Casper We can extend [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts (<a href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/">here</a> and <a href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-function-arguments/">here</a>) we created a simple function that returns a single numeric value. In some situations it may be more useful to return a more flexible data type, such as a list object, to provide more information about the calculations that have been performed.<span id="more-1441"></span></p>
<p><!--[Fast Tube]--><span id="dCso80qpSSI" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-returning-information-as-a-list/#dCso80qpSSI"><img src="http://i.ytimg.com/vi/dCso80qpSSI/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>We can extend our previous function by changing the return value to a list that includes the height and radius supplied by the user. The last line of the function is changed to:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">list(Height = height, Radius = radius, Volume = volume)</pre></div></div>

<p>This creates a list with three elements, which are given very obvious names. The function in full is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume.4 = function(height, radius)
{
    if (missing(height))
        stop(&quot;Need to specify height of cylinder for calculations.&quot;)
&nbsp;
    if (missing(radius))
        stop(&quot;Need to specify radius of cylinder for calculations.&quot;)
&nbsp;
    if (height &lt; 0)
        stop(&quot;Negative height specified.&quot;)
&nbsp;
    if (radius &lt; 0)
        stop(&quot;Negative radius specified.&quot;)
&nbsp;
    volume = pi * radius * radius * height
&nbsp;
    list(Height = height, Radius = radius, Volume = volume)
}</pre></div></div>

<p>We can call this function using a simple example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cylinder.volume.4(20, 4)
$Height
[1] 20
&nbsp;
$Radius
[1] 4
&nbsp;
$Volume
[1] 1005.310</pre></div></div>

<p>The output from this function is a list with three slots as discussed above.</p>
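
<p>Individual slots can then be extracted by name. The sketch below builds a list with the same structure by hand, so that it runs on its own; the variable names <strong>vol</strong> and <strong>rad</strong> are illustrative:</p>

```r
# A list with the same structure as the function's return value.
result = list(Height = 20, Radius = 4, Volume = pi * 4 * 4 * 20)

# Extract a single element by name with the $ operator...
vol = result$Volume

# ...or with double square brackets and the name as a string.
rad = result[["Radius"]]

# names() lists the slots that are available.
print(names(result))
print(vol)
```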
<p>This approach is ideally suited to statistical applications where we might have a model with a large amount of supplementary information that should be returned after it has been applied to a set of data.</p>
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-returning-information-as-a-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Checking Function Arguments</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-function-arguments/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-function-arguments/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 04:07:56 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[argument]]></category>
		<category><![CDATA[cylinder]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[height]]></category>
		<category><![CDATA[if]]></category>
		<category><![CDATA[missing]]></category>
		<category><![CDATA[radius]]></category>
		<category><![CDATA[stop]]></category>
		<category><![CDATA[volume]]></category>
		<category><![CDATA[warning]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1431</guid>
		<description><![CDATA[In a previous post we considered writing a simple function to calculate the volume of a cylinder by specifying the height and radius of the cylinder. The function did not have any checking of the validity of the function arguments which we will consider in this post. Fast Tube by Casper R has various functions [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous <a href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/">post</a> we considered writing a simple function to calculate the volume of a cylinder by specifying the height and radius of the cylinder. The function did not have any checking of the validity of the function arguments which we will consider in this post.<span id="more-1431"></span></p>
<p><!--[Fast Tube]--><span id="FT0uk6iqN_o" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-function-arguments/#FT0uk6iqN_o"><img src="http://i.ytimg.com/vi/FT0uk6iqN_o/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p><strong>R</strong> provides several tools for testing conditions inside a function. The <strong>stop</strong> function halts execution with an error and the <strong>warning</strong> function reports a problem without halting; both are usually combined with conditional statements such as <strong>if</strong>.</p>
<p>As an example consider extending the function to calculate volumes to test whether either the height or radius has not been submitted when the function is called. We will make use of the <strong>missing</strong> function that tests whether a specific argument has been provided with the function call. The new function is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume.2 = function(height, radius)
{
    if (missing(height))
        stop(&quot;Need to specify height of cylinder for calculations.&quot;)
&nbsp;
    if (missing(radius))
        stop(&quot;Need to specify radius of cylinder for calculations.&quot;)
&nbsp;
    volume = pi * radius * radius * height
&nbsp;
    volume
}</pre></div></div>

<p>We use the <strong>if</strong> statement to test whether each of the arguments is <strong>missing</strong>; when one is, the function stops and an error message is written to the console. Here are a couple of examples of the function being halted when insufficient information is provided:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cylinder.volume.2(height = 7)
Error in cylinder.volume.2(height = 7) : 
  Need to specify radius of cylinder for calculations.
&gt; 
&gt; cylinder.volume.2(radius = 10)
Error in cylinder.volume.2(radius = 10) : 
  Need to specify height of cylinder for calculations.</pre></div></div>

<p>So this handles one particular type of problem with the function but there are other checks that we might want to make. For example, negative values for the height or radius are not sensible and should also lead to an error. We can check this condition using an <strong>if</strong> statement in the function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume.3 = function(height, radius)
{
    if (missing(height))
        stop(&quot;Need to specify height of cylinder for calculations.&quot;)
&nbsp;
    if (missing(radius))
        stop(&quot;Need to specify radius of cylinder for calculations.&quot;)
&nbsp;
    if (height &lt; 0)
        stop(&quot;Negative height specified.&quot;)
&nbsp;
    if (radius &lt; 0)
        stop(&quot;Negative radius specified.&quot;)
&nbsp;
    volume = pi * radius * radius * height
&nbsp;
    volume
}</pre></div></div>

<p>An example of the function in action:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cylinder.volume.3(10, -4)
Error in cylinder.volume.3(10, -4) : Negative radius specified.</pre></div></div>

<p>These are a couple of basic examples of validation that we can include in our function. Checks like these help us catch erratic behaviour in software, which becomes more of an issue as programs get larger and more complicated.</p>
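
<p>The examples in this post all use <strong>stop</strong>. For completeness, here is a minimal sketch of <strong>warning</strong>, which reports a problem but lets the function carry on. The function name <strong>cylinder.volume.warn</strong> and the decision to flag zero values are assumptions made purely for illustration:</p>

```r
# Hypothetical variant: a zero dimension is allowed but flagged.
cylinder.volume.warn = function(height, radius)
{
    if (height == 0 || radius == 0)
        warning("Zero height or radius specified - volume will be zero.")

    pi * radius * radius * height
}

# The function still returns a value; the warning is suppressed here.
volume = suppressWarnings(cylinder.volume.warn(10, 0))

# tryCatch() is used only to capture the warning message as a string.
caught = tryCatch(cylinder.volume.warn(10, 0),
                  warning = function(w) conditionMessage(w))

print(volume)
print(caught)
```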
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-checking-function-arguments/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Programming with R &#8211; Function Basics</title>
		<link>http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/#comments</comments>
		<pubDate>Wed, 20 Oct 2010 10:46:52 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[argument]]></category>
		<category><![CDATA[cylinder]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[height]]></category>
		<category><![CDATA[radius]]></category>
		<category><![CDATA[return value]]></category>
		<category><![CDATA[volume]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1421</guid>
		<description><![CDATA[One of the benefits of using R for statistical analysis is the programming language which allows users to define their own functions, which is particularly useful for analysis that needs to be repeated. For example, a monthly output from a database may be provided in a pre-determined format and we might be interested in running [...]]]></description>
			<content:encoded><![CDATA[<p>One of the benefits of using <strong>R</strong> for statistical analysis is the programming language which allows users to define their own functions, which is particularly useful for analysis that needs to be repeated. For example, a monthly output from a database may be provided in a pre-determined format and we might be interested in running the same initial analysis on the data.<span id="more-1421"></span></p>
<p><!--[Fast Tube]--><span id="pxBwg3epxy8" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/#pxBwg3epxy8"><img src="http://i.ytimg.com/vi/pxBwg3epxy8/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The function keyword is used to define a function and there is an optional list of function arguments that can be specified. Unlike some programming languages <strong>R</strong> provides a certain degree of flexibility with setting defaults for particular arguments and the way that the arguments are matched can sometimes cause unexpected behaviour. As such it is sensible to explicitly match a value to a particular argument, e.g. <strong>data = mydata</strong>, so that the matching is done as expected.</p>
<p>Consider a simple example of a function that we could write to calculate the volume of a cylinder. The cylinder itself has a radius and height, which will be the two arguments to our function. The basic definition of our function is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume = function(height, radius)
{
&nbsp;
}</pre></div></div>

<p>The volume of a cylinder is <strong>pi * radius * radius * height</strong>, which we add to our function and save as an object that is returned at the end of the function calculations. (Edited based on comments &#8211; thanks for pointing out my blunder!) By default, the value of the last expression evaluated in a function is its return value.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cylinder.volume = function(height, radius)
{
    volume = pi * radius * radius * height
&nbsp;
    volume
}</pre></div></div>

<p>This is a very simple example of a function. If we call it with a radius of 5 units and a height of 10 units then the answer that is returned is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cylinder.volume(10, 5)
[1] 785.3982</pre></div></div>
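
<p>The earlier point about argument matching can be illustrated with the same function. In this sketch the function is repeated so that the block runs on its own, and the variable names <strong>by.position</strong>, <strong>by.name</strong> and <strong>swapped</strong> are illustrative:</p>

```r
cylinder.volume = function(height, radius)
{
    volume = pi * radius * radius * height

    volume
}

# Positional matching: the first value is height, the second is radius.
by.position = cylinder.volume(10, 5)

# Named matching: the order no longer matters.
by.name = cylinder.volume(radius = 5, height = 10)

# Swapping the positional values silently gives a different answer.
swapped = cylinder.volume(5, 10)

print(c(by.position, by.name, swapped))
```

<p>Naming the arguments guards against this kind of silent mix-up.</p>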

<p>There are a number of things that we can do to the function to improve it. For example, how should the function react if the user does not specify a height and/or radius? Also what happens if a negative value is submitted to either argument?</p>
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/programming-with-r-function-basics/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Measuring the length of time to run a function</title>
		<link>http://www.wekaleamstudios.co.uk/posts/measuring-the-length-of-time-to-run-a-function/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/measuring-the-length-of-time-to-run-a-function/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 22:50:43 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[S Programming]]></category>
		<category><![CDATA[Binomial]]></category>
		<category><![CDATA[glm]]></category>
		<category><![CDATA[Logistic Regression]]></category>
		<category><![CDATA[system.time]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=832</guid>
		<description><![CDATA[When writing R code it is useful to be able to assess the amount of time that a particular function takes to run. We might be interested in measuring the increase in time required by our function as the size of the data increases. To illustrate using the system.time function to calculate the time taken [...]]]></description>
			<content:encoded><![CDATA[<p>When writing <strong>R</strong> code it is useful to be able to assess the amount of time that a particular function takes to run. We might be interested in measuring the increase in time required by our function as the size of the data increases.<span id="more-832"></span></p>
<p>To illustrate using the <strong>system.time</strong> function to calculate the time taken to run an expression consider a set of football results where we are using a logistic regression model to determine the factors that change the probability of a home win. If we fit a logistic regression model using the <strong>glm</strong> function to our data set with variables for the home and away team we can embed the function call inside the <strong>system.time</strong> function.</p>
<p>If the data is stored in the data frame called <strong>results.df</strong> the function call to fit the logistic regression model would be something of this form:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">glm(HomeWin ~ Home + Away, family = binomial, data = results.df)</pre></div></div>

<p>Wrapping this call in <strong>system.time</strong> gives:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; system.time(glm(HomeWin ~ Home + Away, family = binomial, data = results.df))
   user  system elapsed 
   1.62    0.08    1.72</pre></div></div>

<p>The output is measured in seconds and is based on a set of data with 1,000 match results. We could extend the data set to 2,000 match results to see how the time to fit the model increases. If the new data set is stored in the data frame <strong>results.df2</strong> then the function call would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; system.time(glm(HomeWin ~ Home + Away, family = binomial, data = results.df2))
   user  system elapsed 
   4.37    0.14    4.55</pre></div></div>

<p>The time to run the function increases by a factor of approximately 2.7 in user time (4.37 / 1.62) based on these two runs. This use of <strong>system.time</strong> provides some elementary information about the time taken for the expression to be evaluated.</p>
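
<p>The same comparison can be repeated over several data sizes. The sketch below is not based on the football data used above: it simulates match results with hypothetical team labels and random outcomes (the helper <strong>simulate.results</strong>, the 20 teams and the 0.45 home win probability are all assumptions), and fits the logistic regression with <strong>family = binomial</strong>. Timings will differ from machine to machine.</p>

```r
set.seed(1)

# Simulate n match results with hypothetical team labels and outcomes.
simulate.results = function(n, n.teams = 20)
{
    team.names = paste("Team", seq_len(n.teams))
    data.frame(
        Home = factor(sample(team.names, n, replace = TRUE), levels = team.names),
        Away = factor(sample(team.names, n, replace = TRUE), levels = team.names),
        HomeWin = rbinom(n, 1, 0.45)
    )
}

# Time the same model fit at several data sizes.
sizes = c(1000, 2000, 4000)
elapsed = sapply(sizes, function(n) {
    dat = simulate.results(n)
    system.time(glm(HomeWin ~ Home + Away, family = binomial, data = dat))[["elapsed"]]
})

print(data.frame(Matches = sizes, Seconds = elapsed))
```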
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/measuring-the-length-of-time-to-run-a-function/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

