<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; Trellis Graphics</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/statistical-analysis/trellis-graphics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Creating surface plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/#comments</comments>
		<pubDate>Fri, 28 May 2010 15:07:29 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[box]]></category>
		<category><![CDATA[expand.grid]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[loess]]></category>
		<category><![CDATA[persp]]></category>
		<category><![CDATA[predict]]></category>
		<category><![CDATA[surface]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[wireframe]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1132</guid>
		<description><![CDATA[A 3d wireframe plot is a type of graph that is used to display a surface &#8211; geographic data is an example of where this type of graph would be used or it could be used to display a fitted model with more than one explanatory variable. These plots are related to contour plots which [...]]]></description>
			<content:encoded><![CDATA[<p>A 3D wireframe plot is a type of graph that is used to display a surface. Geographic data is one example of where this type of graph would be used, and it can also be used to display a fitted model with more than one explanatory variable. These plots are related to contour plots, which are the two-dimensional equivalent.<span id="more-1132"></span></p>
<p>To illustrate this type of graph we will consider some surface elevation data that is available in the <strong>geoR</strong> package and was used in the blog <a href="http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/">post</a> on level plots. The data set in this package is called <strong>elevation</strong> and stores the elevation height in feet (as multiples of ten feet) for a grid region of x and y coordinates (recorded as multiples of 50 feet). That earlier post has details of the various operations that are undertaken to prepare the data for graphing.</p>
<p><strong>Base Graphics</strong></p>
<p>The function <strong>persp</strong> is the <strong>base</strong> graphics function for creating wireframe surface plots. The <strong>persp</strong> function requires vectors of x and y values covering the grid, together with the heights at each grid point, which are specified as the <strong>z</strong> argument. The heights are supplied as a matrix of values, which we saved previously as the object <strong>z</strong> when the local trend surface model was fitted to the data. The text on the axis labels is specified by the <strong>xlab</strong> and <strong>ylab</strong> function arguments and the <strong>main</strong> argument determines the overall title for the graph.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">persp(seq(10, 300, 5), seq(10, 300, 5), z, phi = 45, theta = 45,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;
)</pre></div></div>

<p>The function arguments <strong>phi</strong> and <strong>theta</strong> are used to rotate the viewing angle of the surface: <strong>theta</strong> sets the azimuthal direction and <strong>phi</strong> the colatitude. Trial and error is probably the way to go when setting these, as good choices depend entirely on the shape of the surface being displayed.</p>
<div id="attachment_1138" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-base-300x300.jpg" alt="Base Graphics Surface Plot" title="Surface Plot Example" width="300" height="300" class="size-medium wp-image-1138" /></a><p class="wp-caption-text">Base Graphics Surface Plot</p></div>
<p>The surface is clear, and it is easy to determine its shape and the variation in height across the <strong>x</strong> and <strong>y</strong> grid coordinates.</p>
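<p>As a minimal sketch of the input that <strong>persp</strong> expects, the example below uses a made-up surface rather than the elevation data (the function <strong>f</strong> and the objects <strong>x.grid</strong>, <strong>y.grid</strong> and <strong>z.demo</strong> are hypothetical names introduced only for illustration). The point it tries to show is that <strong>z</strong> should be a matrix with one row per x value and one column per y value:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Hypothetical smooth surface on a 5 foot grid (not the elevation data)
x.grid = seq(10, 300, 5)
y.grid = seq(10, 300, 5)
f = function(x, y) 8000 + 500 * sin(x / 60) * cos(y / 60)
# outer evaluates f at every combination of x and y and returns a matrix
z.demo = outer(x.grid, y.grid, f)
# The dimensions of z match length(x.grid) and length(y.grid)
dim(z.demo)
persp(x.grid, y.grid, z.demo, phi = 45, theta = 45,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Illustrative surface&quot;)</pre></div></div>

<p>If the heights were calculated on a finer grid, for example every foot, the x and y sequences passed to <strong>persp</strong> would need the same spacing so that they line up with the rows and columns of <strong>z</strong>.</p>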
<p><strong>Lattice Graphics</strong></p>
<p>The <strong>lattice</strong> graphics package has a function <strong>wireframe</strong> and we use the data in the object <strong>elevation.fit</strong> to create the graph. We use the formula interface to specify first the z axis data (the heights) followed by the two variables specifying the <strong>x</strong> and <strong>y</strong> axis coordinates for the data.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">wireframe(Height ~ x*y, data = elevation.fit,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;,
  drape = TRUE,
  colorkey = TRUE,
  screen = list(z = -60, x = -60)
)</pre></div></div>

<p>The axis labels and title are specified in the same way as the <strong>base</strong> graphics with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>main</strong> function arguments. A colour key is added by setting the <strong>colorkey</strong> function argument to <strong>TRUE</strong>.</p>
<div id="attachment_1139" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-lattice-300x300.jpg" alt="Lattice Graphics Surface Plot" title="Surface Plot Example" width="300" height="300" class="size-medium wp-image-1139" /></a><p class="wp-caption-text">Lattice Graphics Surface Plot</p></div>
<p>The surface produced by the <strong>wireframe</strong> function is similar to the one produced by the <strong>persp</strong> function, with the main difference being the colours used on the surface.</p>
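<p>The sketch below repeats the <strong>wireframe</strong> call on a made-up surface so that it can be run without the elevation data (the object names <strong>demo.fit</strong> and <strong>demo.plot</strong> are hypothetical). It is intended to show the long format that the formula interface works with, one row per grid point, and the role of two arguments used above: <strong>drape</strong> colours the surface according to height and <strong>screen</strong> gives the rotations applied before plotting.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# One row for every combination of x and y on a 5 foot grid
demo.fit = expand.grid(list(x = seq(10, 300, 5), y = seq(10, 300, 5)))
# Hypothetical heights, used only for illustration
demo.fit$Height = 8000 + 500 * sin(demo.fit$x / 60) * cos(demo.fit$y / 60)
# lattice functions return an object that is drawn when it is printed
demo.plot = wireframe(Height ~ x*y, data = demo.fit,
  drape = TRUE,
  colorkey = TRUE,
  screen = list(z = -60, x = -60))
print(demo.plot)</pre></div></div>

<p>Changing the values in the <strong>screen</strong> list plays the same role as adjusting <strong>phi</strong> and <strong>theta</strong> in the <strong>persp</strong> function.</p>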
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Displaying data using level plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/#comments</comments>
		<pubDate>Mon, 03 May 2010 10:17:08 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[box]]></category>
		<category><![CDATA[expand.grid]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[levelplot]]></category>
		<category><![CDATA[loess]]></category>
		<category><![CDATA[predict]]></category>
		<category><![CDATA[surface]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1008</guid>
		<description><![CDATA[A level plot is a type of graph that is used to display a surface in two rather than three dimensions &#8211; the surface is viewed from above as if we were looking straight down and is an alternative to a contour plot &#8211; geographic data is an example of where this type of graph [...]]]></description>
			<content:encoded><![CDATA[<p>A level plot is a type of graph that is used to display a surface in two rather than three dimensions &#8211; the surface is viewed from above as if we were looking straight down and is an alternative to a contour plot &#8211; geographic data is an example of where this type of graph would be used. A contour plot uses lines to identify regions of different heights and the level plot uses coloured regions to produce a similar effect.<span id="more-1008"></span></p>
<p>To illustrate this type of graph we will consider some surface elevation data that is available in the <strong>geoR</strong> package. The data set in this package is called <strong>elevation</strong> and stores the elevation height in feet (as multiples of ten feet) for a grid region of x and y coordinates (recorded as multiples of 50 feet). To access this data we load the <strong>geoR</strong> package and then use the <strong>data</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(geoR)
data(elevation)</pre></div></div>

<p>For some packages we need the call to the <strong>data</strong> function to make a set of data available for our use. The <strong>elevation</strong> object is not a data frame so our first step is to create our own data frame to be used to create the level plots using the different graphics packages.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.df = data.frame(x = 50 * elevation$coords[,&quot;x&quot;],
  y = 50 * elevation$coords[,&quot;y&quot;], z = 10 * elevation$data)</pre></div></div>

<p>We extract the x and y grid coordinates and the height values, multiplying them by 50 and 10 respectively to convert to feet for the graphs. Rather than trying to plot the individual values we need to create a surface to cover the whole grid region as the points themselves are too sparse. We make use of the <strong>loess</strong> function to fit a local polynomial trend surface (using weighted least squares) to approximate the elevation across the whole region. The function call for a local quadratic surface is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.loess = loess(z ~ x*y, data = elevation.df,
  degree = 2, span = 0.25)</pre></div></div>

<p>The next stage is to extract heights from this fitted surface at regular intervals across the whole grid region of interest &#8211; which runs from 10 to 300 feet in both the x and y directions. The <strong>expand.grid</strong> function creates a data frame of all combinations of the x and y values that we specify in a list. We choose a spacing of one foot from 10 to 300 feet to create a fine grid:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.fit = expand.grid(list(x = seq(10, 300, 1), y = seq(10, 300, 1)))</pre></div></div>
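<p>To see what <strong>expand.grid</strong> returns, a small sketch on a toy grid may help (the object name <strong>small.grid</strong> is hypothetical). The first variable varies fastest, and the number of rows is the product of the lengths of the inputs:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Toy grid: three x values crossed with two y values
small.grid = expand.grid(list(x = 1:3, y = 1:2))
small.grid
# Six rows, with x cycling through 1, 2, 3 for each value of y
nrow(small.grid)
# The one foot grid used above has 291 values in each direction
length(seq(10, 300, 1))^2</pre></div></div>

<p>The fine grid therefore has 84681 rows, one for every combination of the 291 x values and 291 y values.</p>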

<p>The <strong>predict</strong> function is then used to estimate the surface height at all of these combinations of x and y coordinates covering our grid region. This is saved as an object <strong>z</strong> which will be used by the <strong>base</strong> graphics function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">z = predict(elevation.loess, newdata = elevation.fit)</pre></div></div>

<p>The <strong>lattice</strong> and <strong>ggplot2</strong> packages expect the data in a different format so we make use of the <strong>as.numeric</strong> function to convert from a table of heights to a single column and append it to the object we created based on all combinations of x and y coordinates:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.fit$Height = as.numeric(z)</pre></div></div>
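<p>The two steps above can be sketched on simulated data, so that the shapes of the objects are easy to inspect (all of the <strong>demo</strong> objects below are hypothetical and the data are randomly generated, not the elevation data). When the new data comes from <strong>expand.grid</strong>, the <strong>predict</strong> function for a <strong>loess</strong> fit returns the predictions as a table with one row per x value and one column per y value, and <strong>as.numeric</strong> unrolls that table column by column, which matches the row order of the grid:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">set.seed(1)
# Simulated scattered points on a tilted plane with a little noise
demo.df = data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))
demo.df$z = demo.df$x + 2 * demo.df$y + rnorm(200, sd = 0.1)
demo.loess = loess(z ~ x*y, data = demo.df, degree = 2, span = 0.5)
# A coarse grid inside the range of the simulated points
demo.grid = expand.grid(list(x = seq(2, 8, 1), y = seq(2, 8, 2)))
demo.z = predict(demo.loess, newdata = demo.grid)
# One row per x value and one column per y value
dim(demo.z)
# Unroll the table into a single column alongside the grid coordinates
demo.grid$Height = as.numeric(demo.z)
head(demo.grid)</pre></div></div>

<p>The table form suits the <strong>base</strong> graphics functions, while the single column form suits the formula and data frame interfaces.</p>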

<p>The data is now in a format that can be used to create the level plots in the various packages.</p>
<p><strong>Base Graphics</strong></p>
<p>The function <strong>image</strong> in the <strong>base</strong> graphics package is the function we use to create a level plot. This function requires a list of x and y values that cover the grid of vertical values that will be used to create the surface. These heights are specified as a table of values, which in our case was saved as the object <strong>z</strong> during the calculations on the local trend surface.</p>
<p>The text on the axis labels is specified by the <strong>xlab</strong> and <strong>ylab</strong> function arguments and the <strong>main</strong> argument determines the overall title for the graph. The function call below creates the level plot:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">image(seq(10, 300, 1), seq(10, 300, 1), z,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;)
box()</pre></div></div>

<p>After the <strong>image</strong> function is used we call the <strong>box</strong> function mainly for aesthetic purposes to ensure there is a line surrounding the level plot. The graph that is created is shown below:</p>
<div id="attachment_1012" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-base.jpg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-base-300x300.jpg" alt="Base Graphics Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1012" /></a><p class="wp-caption-text">Base Graphics Level Plot</p></div>
<p>The default colour scheme used by the <strong>base</strong> graphics produces an attractive level plot graph where we can easily see the variation in height across the grid region. It is basically a fancy version of a contour plot where the regions between the contour lines are coloured with different shades indicating the height in those regions.</p>
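<p>To make the link with a contour plot concrete, the sketch below draws a level plot of a made-up surface and then overlays contour lines on top of it (the objects <strong>x.grid</strong>, <strong>y.grid</strong> and <strong>z.demo</strong> are hypothetical and the heights are not the elevation data). It also shows that the colours can be changed through the <strong>col</strong> argument, here using <strong>terrain.colors</strong> as one possible choice:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Hypothetical surface evaluated on a 5 foot grid
x.grid = seq(10, 300, 5)
y.grid = seq(10, 300, 5)
z.demo = outer(x.grid, y.grid,
  function(x, y) 8000 + 500 * sin(x / 60) * cos(y / 60))
# Level plot with an alternative palette of 100 colours
image(x.grid, y.grid, z.demo, col = terrain.colors(100),
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Illustrative surface&quot;)
# Add contour lines to the existing plot rather than starting a new one
contour(x.grid, y.grid, z.demo, add = TRUE)
box()</pre></div></div>

<p>The contour lines trace the boundaries between bands of similar height, which the level plot shows as changes in colour.</p>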
<p><strong>Lattice Graphics</strong></p>
<p>The <strong>lattice</strong> graphics package provides a function <strong>levelplot</strong> for this type of graphical display. We use the data stored in the object <strong>elevation.fit</strong> to create the graph with <strong>lattice</strong> graphics.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">levelplot(Height ~ x*y, data = elevation.fit,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;,
  col.regions = terrain.colors(100)
)</pre></div></div>

<p>The formula is used to specify which variable to use for the three axes and a data frame where the values are stored &#8211; as there are three dimensions it is the z axis that is specified on the left hand side of the formula. The axis labels and title are specified in the same way as the <strong>base</strong> graphics.</p>
<p>The range of colours used in the <strong>lattice</strong> level plot can be specified as a vector of colours to the <strong>col.regions</strong> argument of the function. We make use of the <strong>terrain.colors</strong> function to create this vector, which contains a range of 100 colours that are less striking than those used above with the <strong>base</strong> graphics. The level plot that we create is shown here:</p>
<div id="attachment_1014" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-lattice-300x300.jpg" alt="Lattice Graphics Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1014" /></a><p class="wp-caption-text">Lattice Graphics Level Plot</p></div>
<p>This is in general similar to the <strong>base</strong> graphics display but the actual plot region is a different shape that makes things look slightly different.</p>
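<p>If a plot region with equal scales on both axes is preferred, one option is the <strong>aspect</strong> argument of <strong>levelplot</strong>. The sketch below is an illustration on a made-up surface rather than the call used for the graph above (the objects <strong>demo.fit</strong> and <strong>demo.level</strong> are hypothetical):</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# Hypothetical heights on a 5 foot grid, one row per grid point
demo.fit = expand.grid(list(x = seq(10, 300, 5), y = seq(10, 300, 5)))
demo.fit$Height = 8000 + 500 * sin(demo.fit$x / 60) * cos(demo.fit$y / 60)
# aspect = &quot;iso&quot; asks for the same physical scale on the x and y axes
demo.level = levelplot(Height ~ x*y, data = demo.fit,
  col.regions = terrain.colors(100),
  aspect = &quot;iso&quot;)
print(demo.level)</pre></div></div>

<p>As the x and y coordinates cover the same range here, this should give a square plot region.</p>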
<p><strong>ggplot2</strong></p>
<p>The <strong>ggplot2</strong> package also provides facilities for creating a level plot making use of the tile geom to create the desired graph. The function <strong>ggplot</strong> forms the basis of the graph and various other options are used to customise the graph:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(elevation.fit, aes(x, y, fill = Height)) + geom_tile() +
  xlab(&quot;X Coordinate (feet)&quot;) + ylab(&quot;Y Coordinate (feet)&quot;) +
  ggtitle(&quot;Surface elevation data&quot;) +
  scale_fill_gradient(limits = c(7000, 10000),low = &quot;black&quot;,high = &quot;white&quot;) +
  scale_x_continuous(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0))</pre></div></div>

<p>The options that are added to the graph each change a particular setting. The choice of colours for the heights used on the graph is selected by the <strong>scale_fill_gradient</strong> function with colours ranging from black to white. The <strong>scale_x_continuous</strong> and <strong>scale_y_continuous</strong> options are used to stretch the tiles to cover the whole grid region, covering up the default gray background &#8211; this makes the graph more visually appealing. The graph that is produced is shown here:</p>
<div id="attachment_1013" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-ggplot2-300x300.jpg" alt="ggplot2 Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1013" /></a><p class="wp-caption-text">ggplot2 Level Plot</p></div>
<p>The graph from <strong>ggplot2</strong> is visually as impressive as the other graphs &#8211; there is more smoothing between the colours which blurs some of the lines on the other graphs because of the type of colour gradient that was selected.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Analysis of Covariance &#8211; Extending Simple Linear Regression</title>
		<link>http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 19:25:58 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Analysis of Variance]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[analysis of variance]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[covariate]]></category>
		<category><![CDATA[fitted]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[lm]]></category>
		<category><![CDATA[panel]]></category>
		<category><![CDATA[panel.lmline]]></category>
		<category><![CDATA[panel.xyplot]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[resid]]></category>
		<category><![CDATA[residual]]></category>
		<category><![CDATA[summary]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[xyplot]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=989</guid>
		<description><![CDATA[The simple linear regression model considers the relationship between two variables and in many cases more information will be available that can be used to extend the model. For example, there might be a categorical variable (sometimes known as a covariate) that can be used to divide the data set to fit a separate linear [...]]]></description>
			<content:encoded><![CDATA[<p>The simple linear regression model considers the relationship between two variables and in many cases more information will be available that can be used to extend the model. For example, there might be a categorical variable (sometimes known as a covariate) that can be used to divide the data set to fit a separate linear regression to each of the subsets. We will consider how to handle this extension using one of the data sets available within the <strong>R</strong> software package.<span id="more-989"></span></p>
<p>There is a set of data relating trunk circumference (in mm) to the age of Orange trees where data was recorded for five trees. This data is available in the data frame <strong>Orange</strong> and we make a copy of this data set so that we can remove the ordering that is recorded for the <strong>Tree</strong> identifier variable. We create a new factor after converting the old factor to a numeric vector:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">orange.df = Orange
orange.df$Tree = factor(as.numeric(orange.df$Tree))</pre></div></div>
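<p>It is worth checking what this conversion does to the tree labels. In the <strong>Orange</strong> data the <strong>Tree</strong> variable is an ordered factor, and with the default settings an ordered factor is given polynomial contrasts in a linear model rather than a comparison against a baseline, which is why the ordering is removed. The sketch below shows that <strong>as.numeric</strong> returns the position of each level rather than the original label, so the new labels 1 to 5 follow the recorded ordering of the trees. The last line is an alternative that keeps the original labels; it is not the approach used in this post and the name <strong>tree.labels</strong> is hypothetical:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># The levels are stored in the recorded order, not in numerical order
levels(Orange$Tree)
is.ordered(Orange$Tree)
orange.df = Orange
orange.df$Tree = factor(as.numeric(orange.df$Tree))
# Cross-tabulate the original labels against the new ones
table(original = Orange$Tree, renumbered = orange.df$Tree)
# Alternative that keeps the original labels (not used in this post)
tree.labels = factor(as.character(Orange$Tree))
levels(tree.labels)</pre></div></div>

<p>With the conversion used in this post, the tree labelled 1 in the models that follow is the first level of the original ordered factor, which is the tree originally labelled 3.</p>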

<p>The purpose of this step is to set up the variable for use in the linear model. The simplest model assumes that the relationship between circumference and age is the same for all five trees and we fit this model as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">orange.mod1 = lm(circumference ~ age, data = orange.df)</pre></div></div>

<p>The summary of the fitted model is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; summary(orange.mod1)
&nbsp;
Call:
lm(formula = circumference ~ age, data = orange.df)
&nbsp;
Residuals:
      Min        1Q    Median        3Q       Max 
-46.31030 -14.94610  -0.07649  19.69727  45.11146 
&nbsp;
Coefficients:
             Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept) 17.399650   8.622660   2.018   0.0518 .  
age          0.106770   0.008277  12.900 1.93e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
Residual standard error: 23.74 on 33 degrees of freedom
Multiple R-squared: 0.8345,     Adjusted R-squared: 0.8295 
F-statistic: 166.4 on 1 and 33 DF,  p-value: 1.931e-14</pre></div></div>

<p>The test on the <strong>age</strong> parameter provides very strong evidence of an increase in circumference with age, as would be expected. The next stage is to consider how this model can be extended &#8211; one idea is to have a separate intercept for each of the five trees. This new model assumes that the rate of increase in circumference is the same for all of the trees but that each tree starts from a different circumference. We fit this model and get the summary as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; orange.mod2 = lm(circumference ~ age + Tree, data = orange.df)
&gt; summary(orange.mod2)
&nbsp;
Call:
lm(formula = circumference ~ age + Tree, data = orange.df)
&nbsp;
Residuals:
    Min      1Q  Median      3Q     Max 
-30.505  -8.790   3.738   7.650  21.859 
&nbsp;
Coefficients:
             Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept) -4.457493   7.572732  -0.589   0.5607    
age          0.106770   0.005321  20.066  &lt; 2e-16 ***
Tree2        5.571429   8.157252   0.683   0.5000    
Tree3       17.142857   8.157252   2.102   0.0444 *  
Tree4       41.285714   8.157252   5.061 2.14e-05 ***
Tree5       45.285714   8.157252   5.552 5.48e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
Residual standard error: 15.26 on 29 degrees of freedom
Multiple R-squared: 0.9399,     Adjusted R-squared: 0.9295 
F-statistic:  90.7 on 5 and 29 DF,  p-value: &lt; 2.2e-16</pre></div></div>

<p>The additional term is appended to the simple model using the <strong>+</strong> in the formula part of the call to <strong>lm</strong>. The first tree is used as the baseline to compare the other four trees against and the model summary shows that tree 2 is similar to tree 1 (no real need for a different offset) but that there is evidence that the offset for the other three trees is significantly larger than tree 1 (and tree 2). We can compare the two models using an F-test for nested models using the <strong>anova</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; anova(orange.mod1, orange.mod2)
Analysis of Variance Table
&nbsp;
Model 1: circumference ~ age
Model 2: circumference ~ age + Tree
  Res.Df     RSS Df Sum of Sq      F    Pr(&gt;F)    
1     33 18594.7                                  
2     29  6753.9  4     11841 12.711 4.289e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</pre></div></div>

<p>Here there are four degrees of freedom used up by the more complicated model (four parameters for the different trees) and the test comparing the two models is highly significant. There is very strong evidence of a difference in starting circumference (for the data that was collected) between the trees.</p>
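<p>The F statistic in this table can be reproduced by hand from the residual sums of squares of the two models, which may help to show what the test is comparing. This is a sketch that refits the two models under new, hypothetical object names (<strong>mod1</strong>, <strong>mod2</strong> and so on) so that it can be run on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">orange.df = Orange
orange.df$Tree = factor(as.numeric(orange.df$Tree))
mod1 = lm(circumference ~ age, data = orange.df)
mod2 = lm(circumference ~ age + Tree, data = orange.df)
# Residual sums of squares and residual degrees of freedom
rss1 = sum(resid(mod1)^2)
rss2 = sum(resid(mod2)^2)
df1 = df.residual(mod1)
df2 = df.residual(mod2)
# Reduction in RSS per extra parameter, relative to the residual
# variance of the larger model
F.stat = ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p.value = pf(F.stat, df1 - df2, df2, lower.tail = FALSE)
c(F = F.stat, p = p.value)</pre></div></div>

<p>The numerator uses the four degrees of freedom taken up by the tree terms and the denominator uses the 29 residual degrees of freedom of the larger model.</p>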
<p>We can extend this model further by allowing the rate of increase in circumference to vary between the five trees. This additional term can be included in the linear model as an interaction term, assuming that tree 1 is the baseline. An interaction term is included in the model formula with a <strong>:</strong> between the names of two variables. For the Orange tree data the new model is fitted thus:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; orange.mod3 = lm(circumference ~ age + Tree + age:Tree, data = orange.df)
&gt; summary(orange.mod3)
&nbsp;
Call:
lm(formula = circumference ~ age + Tree + age:Tree, data = orange.df)
&nbsp;
Residuals:
    Min      1Q  Median      3Q     Max 
-18.061  -6.639  -1.482   8.069  16.649 
&nbsp;
Coefficients:
              Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)  1.920e+01  8.458e+00   2.270  0.03206 *  
age          8.111e-02  8.119e-03   9.991 3.27e-10 ***
Tree2        5.234e+00  1.196e+01   0.438  0.66544    
Tree3       -1.045e+01  1.196e+01  -0.873  0.39086    
Tree4        7.574e-01  1.196e+01   0.063  0.95002    
Tree5       -4.566e+00  1.196e+01  -0.382  0.70590    
age:Tree2    3.656e-04  1.148e-02   0.032  0.97485    
age:Tree3    2.992e-02  1.148e-02   2.606  0.01523 *  
age:Tree4    4.395e-02  1.148e-02   3.828  0.00077 ***
age:Tree5    5.406e-02  1.148e-02   4.708 7.93e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
Residual standard error: 10.41 on 25 degrees of freedom
Multiple R-squared: 0.9759,     Adjusted R-squared: 0.9672 
F-statistic: 112.4 on 9 and 25 DF,  p-value: &lt; 2.2e-16</pre></div></div>

<p>Interestingly, we see that there is strong evidence of a difference in the rate of change in circumference for the five trees. The previously observed difference in intercepts is no longer as strong but this parameter is kept in the model &#8211; there are plenty of books/websites that discuss this marginality restriction on statistical models. The fitted model described above can be displayed using <strong>lattice</strong> graphics with a custom panel function making use of available panel functions for fitting and drawing a linear regression line for each panel of a Trellis display. The function call is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(circumference ~ age | Tree, data = orange.df,
  panel = function(x, y, ...)
  {
    panel.xyplot(x, y, ...)
    panel.lmline(x, y, ...)
  }
)</pre></div></div>

<p>The <strong>panel.xyplot</strong> and <strong>panel.lmline</strong> functions are part of the lattice package along with many other panel functions and can be built up to create a display that differs from the standard. The graph that is produced:</p>
<div id="attachment_992" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/orange-fittedmodel.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/orange-fittedmodel-300x300.png" alt="Orange Tree Fitted Model" title="Orange Tree Fitted Model" width="300" height="300" class="size-medium wp-image-992" /></a><p class="wp-caption-text">Analysis of Covariance Model fitted to the Orange Tree data</p></div>
<p>This graph clearly shows the different relationships between circumference and age for the five trees. The residuals from the model can be plotted against fitted values, divided by tree, to investigate the model assumptions:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(resid(orange.mod3) ~ fitted(orange.mod3) | orange.df$Tree,
  xlab = &quot;Fitted Values&quot;,
  ylab = &quot;Residuals&quot;,
  main = &quot;Residual Diagnostic Plot&quot;,
  panel = function(x, y, ...)
  {
    panel.grid(h = -1, v = -1)
    panel.abline(h = 0)
    panel.xyplot(x, y, ...)
  }
)</pre></div></div>

<p>The residual diagnostic plot is:</p>
<div id="attachment_994" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/orange-residualplot.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/orange-residualplot-300x300.png" alt="Orange Tree Model Residual Plot" title="Orange Tree Model Residual Plot" width="300" height="300" class="size-medium wp-image-994" /></a><p class="wp-caption-text">Residual diagnostic plot for the analysis of covariance model fitted to the Orange Tree data</p></div>
<p>There are no obvious problematic patterns in this graph so we conclude that this model is a reasonable representation of the relationship between circumference and age.</p>
<p>Additional: The analysis of variance table comparing the second and third models shows an improvement by moving to the more complicated model with different slopes:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; anova(orange.mod2, orange.mod3)
Analysis of Variance Table
&nbsp;
Model 1: circumference ~ age + Tree
Model 2: circumference ~ age + Tree + age:Tree
  Res.Df    RSS Df Sum of Sq      F    Pr(&gt;F)    
1     29 6753.9                                  
2     25 2711.0  4    4042.9 9.3206 9.402e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</pre></div></div>
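<p>As a quick check on where the F statistic in this table comes from, the arithmetic can be reproduced from the two residual sums of squares. This is only a sketch that uses the rounded values printed above (the object names are made up for illustration), so the final digits may differ slightly from the table:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Values copied from the analysis of variance table above
rss.small = 6753.9   # residual sum of squares for age + Tree
rss.large = 2711.0   # residual sum of squares for age + Tree + age:Tree
df.extra = 4         # extra parameters in the larger model
df.resid = 25        # residual degrees of freedom for the larger model
# F = (drop in RSS per extra parameter) / (residual mean square)
f.stat = ((rss.small - rss.large) / df.extra) / (rss.large / df.resid)
f.stat   # approximately 9.32
# Upper tail probability of the F distribution on 4 and 25 degrees of freedom
pf(f.stat, df.extra, df.resid, lower.tail = FALSE)</pre></div></div>

<p>A large F statistic means that the reduction in the residual sum of squares is big relative to the remaining unexplained variation, which is why the model with separate slopes is preferred here.</p>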

]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Summarising data using box and whisker plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/#comments</comments>
		<pubDate>Sun, 25 Apr 2010 07:37:10 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[Box and Whisker]]></category>
		<category><![CDATA[boxplot]]></category>
		<category><![CDATA[bwplot]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=960</guid>
		<description><![CDATA[A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) [...]]]></description>
			<content:encoded><![CDATA[<p>A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) and the minimum and maximum values.<span id="more-960"></span></p>
<p>The box and whisker plot is an effective way to investigate the distribution of a set of data. For example, skewness shows up as a median that sits off-centre within the box or as whiskers of unequal length, and the display does not make any assumptions about the underlying distribution of the data. The extreme values at either end of the scale are sometimes included on the display to show how far they extend beyond the majority of the data.</p>
<p>To illustrate creating box and whisker plots we consider meteorological data collected on a monthly basis at Southampton, UK between 1950 and 1999. This data is publicly available from the <a href="http://www.metoffice.gov.uk/">UK Met Office</a>. We will compare the range of temperatures recorded in each month of the year over this period by creating box and whisker plots with three different graphics packages.</p>
<p>The data is assumed to have been imported into <strong>R</strong> and stored in a data frame called <strong>soton.df</strong>. An extract of the data is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">    Year Month Max.Temp Min.Temp Frost  Rain
1   1950   Jan      7.7      2.8     7  20.1
2   1950   Feb     10.3        4     4 127.0
3   1950   Mar     13.0      4.5     2  39.4
4   1950   Apr     13.6      4.7     0  62.0
5   1950   May     17.9      7.8     0  32.2</pre></div></div>
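<p>The extract can be typed in directly by anyone who wants to experiment before obtaining the full series. The sketch below is an assumption about how the columns might be stored rather than the import code used for this post, and the name <strong>soton.extract</strong> is made up for illustration. Storing <strong>Month</strong> as a factor with its levels taken from the built-in <strong>month.abb</strong> constant keeps the months in calendar order, rather than alphabetical order, on the plots. The <strong>fivenum</strong> function returns the five number summary that a box and whisker plot is built from:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Hypothetical re-creation of the five rows shown above
soton.extract = data.frame(
  Year = rep(1950, 5),
  Month = factor(c(&quot;Jan&quot;, &quot;Feb&quot;, &quot;Mar&quot;, &quot;Apr&quot;, &quot;May&quot;), levels = month.abb),
  Max.Temp = c(7.7, 10.3, 13.0, 13.6, 17.9),
  Min.Temp = c(2.8, 4.0, 4.5, 4.7, 7.8),
  Frost = c(7, 4, 2, 0, 0),
  Rain = c(20.1, 127.0, 39.4, 62.0, 32.2)
)
# Factor levels follow the calendar, Jan to Dec
levels(soton.extract$Month)
# Minimum, lower hinge, median, upper hinge and maximum
fivenum(soton.extract$Max.Temp)</pre></div></div>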

<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="Pe-48TAtBho" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/#Pe-48TAtBho"><img src="http://i.ytimg.com/vi/Pe-48TAtBho/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>base</strong> graphics approach makes use of the <strong>boxplot</strong> function to create box and whisker plots. In this situation the function can be used with a formula rather than specifying two separate vectors of data &#8211; we can specify a data frame to point towards a source of data to be used in the graph. For the temperature data we use this code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">boxplot(Max.Temp ~ Month, data = soton.df,
  xlab = &quot;Month&quot;, ylab = &quot;Maximum Temperature&quot;,
  main = &quot;Temperature at Southampton Weather Station (1950-1999)&quot;
)</pre></div></div>

<p>The horizontal and vertical axis labels are specified using the <strong>xlab</strong> and <strong>ylab</strong> arguments respectively and the title of the plot is created using the <strong>main</strong> argument. The box and whisker plot is shown here:</p>
<div id="attachment_962" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-base-300x300.jpg" alt="Base Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-962" /></a><p class="wp-caption-text">Base Graphics Box and Whisker Plot</p></div>
<p>The function <strong>boxplot</strong> makes it easy to create a reasonably attractive box and whisker plot. The variation in the distribution of temperatures across the year can be seen from the graph.</p>
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="RJcZ_7EOzv8" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/#RJcZ_7EOzv8"><img src="http://i.ytimg.com/vi/RJcZ_7EOzv8/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>lattice</strong> graphics package there is a function <strong>bwplot</strong> which is used to create box and whisker plots. The function call also uses a formula to specify the <strong>x</strong> and <strong>y</strong> variables to use on the graph. The function call arguments are identical to the <strong>boxplot</strong> function in <strong>base</strong> graphics:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">bwplot(Max.Temp ~ Month, data = soton.df,
  xlab = &quot;Month&quot;, ylab = &quot;Maximum Temperature&quot;,
  main = &quot;Temperature at Southampton Weather Station (1950-1999)&quot;
)</pre></div></div>

<p>The variable <strong>Month</strong> is categorical so a separate box and whisker summary is created for each month. The <strong>lattice</strong> version of the graph is shown here:</p>
<div id="attachment_963" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-lattice-300x300.jpg" alt="Lattice Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-963" /></a><p class="wp-caption-text">Lattice Graphics Box and Whisker Plot</p></div>
<p>This is very similar to the box and whisker plot created by <strong>base</strong> graphics with a similar level of effort required. The main difference is the use of a circle rather than a line to identify the location of the median of the data.</p>
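<p>If a line is preferred for the median, the plotting symbol can be changed in the call. The sketch below assumes the <strong>airquality</strong> data set that ships with <strong>R</strong> as a stand-in for <strong>soton.df</strong>, so that it can be run without the Southampton data, and the object name <strong>bw.line</strong> is made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# airquality is a built-in data set, used here only as a stand-in
bw.line = bwplot(Temp ~ factor(Month), data = airquality,
  xlab = &quot;Month&quot;, ylab = &quot;Temperature&quot;,
  pch = &quot;|&quot;   # draw the median as a line rather than a dot
)
print(bw.line)</pre></div></div>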
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="WJQdYId2TUA" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/#WJQdYId2TUA"><img src="http://i.ytimg.com/vi/WJQdYId2TUA/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>ggplot2</strong> package there is a general function <strong>ggplot</strong> that is used to create graphs of any type. We make use of the boxplot geom to create a box and whisker plot following the standard approach. The first step is to specify a data frame to use to create the graph and then map the columns of this data frame, via the <strong>aes</strong> argument, to the different axes or other aesthetics (such as colour or symbol shape). The particular geom is used to specify the type of plot that we want to create. Our final step is to add on the various axis labels and an overall title to the graph.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# ggtitle() sets the title; opts() has been removed from ggplot2
ggplot(soton.df, aes(Month, Max.Temp)) + geom_boxplot() +
  ylab(&quot;Maximum Temperature&quot;) +
  ggtitle(&quot;Temperature at Southampton Weather Station (1950-1999)&quot;)</pre></div></div>

<p>The <strong>ggplot2</strong> version of box and whisker plots is shown here:</p>
<div id="attachment_964" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-ggplot2-300x300.jpg" alt="ggplot2 Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-964" /></a><p class="wp-caption-text">ggplot2 Graphics Box and Whisker Plot</p></div>
<p>The distinctive gray background used by <strong>ggplot2</strong> is an obvious visual difference compared to the default clear background used in the other two approaches. The boxes themselves have a cleaner look in this graph than the other two methods and the overall look is slick.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Simple Linear Regression</title>
		<link>http://www.wekaleamstudios.co.uk/posts/simple-linear-regression/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/simple-linear-regression/#comments</comments>
		<pubDate>Fri, 23 Apr 2010 08:51:57 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Linear Models]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[explanatory variable]]></category>
		<category><![CDATA[fitted]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[linear]]></category>
		<category><![CDATA[lm]]></category>
		<category><![CDATA[modelling]]></category>
		<category><![CDATA[one variable]]></category>
		<category><![CDATA[predictor]]></category>
		<category><![CDATA[qqmath]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[resid]]></category>
		<category><![CDATA[residual]]></category>
		<category><![CDATA[response]]></category>
		<category><![CDATA[summary]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[xyplot]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=907</guid>
		<description><![CDATA[One of the most frequently used techniques in statistics is linear regression where we investigate the potential relationship between a variable of interest (often called the response variable but there are many other names in use) and a set of one or more variables (known as the independent variables or some other term). Unsurprisingly there [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most frequently used techniques in statistics is linear regression where we investigate the potential relationship between a variable of interest (often called the response variable but there are many other names in use) and a set of one or more variables (known as the independent variables or some other term). Unsurprisingly there are flexible facilities in <strong>R</strong> for fitting a range of linear models from the simple case of a single variable to more complex relationships.<span id="more-907"></span></p>
<p>In this post we will consider the case of simple linear regression with one response variable and a single independent variable. For this example we will use some data from the book Mathematical Statistics with Applications by Mendenhall, Wackerly and Scheaffer (Fourth Edition &#8211; Duxbury 1990). This data is for a study in central Florida where 15 alligators were captured and two measurements were made on each of the alligators. The weight (in pounds) was recorded along with the snout vent length (in inches &#8211; this is the distance from the tip of the snout to the vent).</p>
<p>The purpose of using this data is to determine whether there is a relationship, described by a simple linear regression model, between the weight and snout vent length. The authors analysed the data on the log scale (natural logarithms) and we will follow their approach for consistency. We first create a data frame for this study:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">alligator = data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
    3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
    3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)</pre></div></div>

<p>As with most analysis the first step is to perform some <a href="http://www.wekaleamstudios.co.uk/exploratory-data-analysis/">exploratory data analysis</a> to get a visual impression of whether there is a relationship between weight and snout vent length and what form it is likely to take. We create a <a href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/">scatter plot</a> of the data as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(lnWeight ~ lnLength, data = alligator,
  xlab = &quot;Snout vent length (inches) on log scale&quot;,
  ylab = &quot;Weight (pounds) on log scale&quot;,
  main = &quot;Alligators in Central Florida&quot;
)</pre></div></div>

<p>The scatter plot is shown here:</p>
<div id="attachment_946" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-Data.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-Data-300x300.jpg" alt="Plot of the weight and snout vent length" title="Alligator Data Plot" width="300" height="300" class="size-medium wp-image-946" /></a><p class="wp-caption-text">Scatter plot of the weight and snout vent length for alligators caught in central Florida</p></div>
<p>The graph suggests that weight (on the log scale) increases linearly with snout vent length (again on the log scale) so we will fit a simple linear regression model to the data and save the fitted model to an object for further analysis:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">alli.mod1 = lm(lnWeight ~ lnLength, data = alligator)</pre></div></div>

<p>The function <strong>lm</strong> fits a linear model to data and we specify the model using a formula where the response variable is on the left hand side separated by a ~ from the explanatory variables. The formula provides a flexible way to specify various different functional forms for the relationship. The <strong>data</strong> argument is used to tell <strong>R</strong> where to look for the variables used in the formula.</p>
<p>Now that the model is saved as an object we can use some of the general purpose functions for extracting information from this object about the linear model, e.g. the parameters or residuals. The big plus with <strong>R</strong> is that there are functions defined for different types of model, using the same name such as <strong>summary</strong>, and the system works out which function we intended to use based on the type of object saved. To create a summary of the fitted model:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; summary(alli.mod1)
&nbsp;
Call:
lm(formula = lnWeight ~ lnLength, data = alligator)
&nbsp;
Residuals:
     Min       1Q   Median       3Q      Max 
-0.24348 -0.03186  0.03740  0.07727  0.12669 
&nbsp;
Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)  -8.4761     0.5007  -16.93 3.08e-10 ***
lnLength      3.4311     0.1330   25.80 1.49e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
Residual standard error: 0.1229 on 13 degrees of freedom
Multiple R-squared: 0.9808,     Adjusted R-squared: 0.9794 
F-statistic: 665.8 on 1 and 13 DF,  p-value: 1.495e-12</pre></div></div>

<p>We get a lot of useful information here without being too overwhelmed by pages of output.</p>
<p>The estimate for the model intercept is -8.4761 and the coefficient measuring the <strong>slope</strong> of the relationship with snout vent length is 3.4311. Standard errors for these estimates are also provided in the Coefficients table. The tests of significance for the model coefficients are summarised in the same table, and there is strong evidence that the slope coefficient is significantly different from zero &#8211; as the snout vent length increases so does the weight.</p>
<p>Rather than stopping here we use residual diagnostics to investigate whether the assumptions that underpin linear regression are reasonable for our data. The diagnostics may also suggest that additional variables are required in the model, or that some other alteration would give a better description of how weight changes.</p>
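<p>The same numbers can be pulled out of the fitted model object directly rather than read off the printed summary. The following is a minimal sketch that re-creates the data frame and model from above so that it can be run on its own; <strong>coef</strong> and <strong>confint</strong> are standard extractor functions for models fitted with <strong>lm</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">alligator = data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
    3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
    3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
alli.mod1 = lm(lnWeight ~ lnLength, data = alligator)
# Point estimates of the intercept and slope
coef(alli.mod1)
# 95% confidence intervals for both parameters
confint(alli.mod1, level = 0.95)
# The full coefficient table as a matrix
coef(summary(alli.mod1))</pre></div></div>

<p>Because both variables are on the log scale, the slope acts as an exponent on the original scale: the fitted model corresponds to weight being proportional to length raised to the power of roughly 3.43.</p>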
<p>A plot of the residuals against fitted values is used to determine whether there are any systematic patterns, such as over estimation for most of the large values or increasing spread as the model fitted values increase. To create this plot we could use the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(resid(alli.mod1) ~ fitted(alli.mod1),
  xlab = &quot;Fitted Values&quot;,
  ylab = &quot;Residuals&quot;,
  main = &quot;Residual Diagnostic Plot&quot;,
  panel = function(x, y, ...)
  {
    panel.grid(h = -1, v = -1)
    panel.abline(h = 0)
    panel.xyplot(x, y, ...)
  }
)</pre></div></div>

<p>We create our own custom panel function using the building blocks provided by the <strong>lattice</strong> package. We start by creating a set of grid lines as the base layer and the <strong>h=-1</strong> and <strong>v=-1</strong> arguments tell <strong>lattice</strong> to align these with the labels on the axes. We then create a solid horizontal line to help distinguish between positive and negative residuals. Finally we get the points plotted on the top layer.</p>
<p>The residual diagnostic plot is shown below:</p>
<div id="attachment_951" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-ResidualPlot.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-ResidualPlot-300x300.jpg" alt="Residual Diagnostic Plot for Linear Model" title="Alligator Residual Plot" width="300" height="300" class="size-medium wp-image-951" /></a><p class="wp-caption-text">Residual Diagnostics Plot for the Linear Regression Model</p></div>
<p>The plot is probably ok but there are more cases of positive residuals and when we consider a normal probability plot we see that there are some deficiencies with the model:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">qqmath( ~ resid(alli.mod1),
  xlab = &quot;Theoretical Quantiles&quot;,
  ylab = &quot;Residuals&quot;
)</pre></div></div>

<p>The function <strong>resid</strong> extracts the model residuals from the fitted model object. The plot is shown here:</p>
<div id="attachment_952" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-QQ.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/Alligator-QQ-300x300.jpg" alt="Quantile-Quantile Plot for Linear Model" title="Alligator Quantile-Quantile Plot" width="300" height="300" class="size-medium wp-image-952" /></a><p class="wp-caption-text">Quantile-Quantile Plot for the Linear Regression Model</p></div>
<p>We would hope that this plot showed something approaching a straight line to support the model assumption about the distribution of the residuals. This and the other plots suggest that further tweaking is required to improve the model, or that a decision would need to be made about whether to report the model as is with some caveats about its usage. I am interested in thoughts, comments and suggestions on how other people would proceed when faced with this situation &#8211; feel free to add them in the comments.</p>
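<p>One small addition that may make departures from a straight line easier to judge is a reference line on the normal probability plot. The sketch below is one possible way to do this, not the code used to produce the graph above. It re-creates the data and model so that it can be run on its own, and uses the <strong>panel.qqmathline</strong> function from <strong>lattice</strong>, which draws a line through the quartiles of the data; the object name <strong>qq.plot</strong> is made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
alligator = data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
    3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
    3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
alli.mod1 = lm(lnWeight ~ lnLength, data = alligator)
qq.plot = qqmath( ~ resid(alli.mod1),
  xlab = &quot;Theoretical Quantiles&quot;,
  ylab = &quot;Residuals&quot;,
  panel = function(x, ...)
  {
    panel.qqmathline(x, ...)   # reference line through the quartiles
    panel.qqmath(x, ...)       # the points themselves
  }
)
print(qq.plot)</pre></div></div>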
<p><em>Related posts:</em></p>
<ul>
<li>Manual variable selection with the <a href="http://www.wekaleamstudios.co.uk/posts/manual-variable-selection-using-the-dropterm-function/">dropterm</a> function.</li>
<li>The <a href="http://www.wekaleamstudios.co.uk/posts/using-the-update-function-during-variable-selection/">update</a> function for simplifying model selection.</li>
<li>Including factors in a regression model via <a href="http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/">analysis of covariance</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/simple-linear-regression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summarising data using scatter plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 18:56:06 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[scatter plot]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[xyplot]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=912</guid>
		<description><![CDATA[A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is [...]]]></description>
			<content:encoded><![CDATA[<p>A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is used in many common situations and can convey a lot of useful information.<span id="more-912"></span></p>
<p>To illustrate creating a scatter plot we will use a simple data set for the population of the UK between 1992 and 2009. This data is saved in a data frame <strong>uk.df</strong> using the following command:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)</pre></div></div>

<p>For this example the data is recorded in thousands to make the graph easier to read and there is no benefit or noticeable improvement to be seen by using greater detail.</p>
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="aqXuiQR4bnY" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#aqXuiQR4bnY"><img src="http://i.ytimg.com/vi/aqXuiQR4bnY/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>base</strong> graphics system the general purpose <strong>plot</strong> function can be used to create a scatter plot for the UK population data set that we created. The first two arguments to the <strong>plot</strong> function are the x and y variables respectively. The following code will create a scatter plot, including various labels:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plot(uk.df$Year, uk.df$Population,
  xlab = &quot;Year&quot;, ylab = &quot;Total Population (Thousands)&quot;,
  main = &quot;UK Population (1992-2009)&quot;, pch = 16)</pre></div></div>

<p>The labels for the x and y axes are specified via the <strong>xlab</strong> and <strong>ylab</strong> arguments to the plot function and the <strong>main</strong> argument specifies the title for the plot.</p>
<div id="attachment_919" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-base-300x300.jpg" alt="Base Graphics Scatter Plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-919" /></a><p class="wp-caption-text">Base Graphics Scatter Plot</p></div>
<p>The graph itself is plain and functional with solid circles indicating the population (in thousands) for each of the years covered by the data.</p>
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="NMTCIViCLOU" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#NMTCIViCLOU"><img src="http://i.ytimg.com/vi/NMTCIViCLOU/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>lattice</strong> graphics package provides a function <strong>xyplot</strong> specifically to create scatter plots and the function is used in a similar way to the <strong>base</strong> graphics approach. The first argument to the function is a formula describing the relationship to be plotted on the graph, with the y variable preceding the x variable as we are used to when writing mathematical formulae such as y = a + bx. The data frame is specified with the <strong>data</strong> argument to simplify the expression in the formula. The code used is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(Population ~ Year, data = uk.df,
  xlab = &quot;Year&quot;, ylab = &quot;Total Population (Thousands)&quot;,
  main = &quot;UK Population (1992-2009)&quot;,
  scales = list(x = list(at = seq(1992, 2009, 2)))
)</pre></div></div>

<p>The axis labels and the overall title for the graph are specified in the same way as the <strong>base</strong> graphics system. We indulge in some fine tuning of the labels on the x axis via the <strong>scales</strong> argument &#8211; here we indicate that tick labels should appear at every second year, starting in 1992. The <strong>lattice</strong> graph is shown here for comparison with the graphs created using the other two packages:</p>
<div id="attachment_921" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-lattice-300x300.jpg" alt="Lattice Graphics Scatter Plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-921" /></a><p class="wp-caption-text">Lattice Graphics Scatter Plot</p></div>
<p>There are very few visual differences between the <strong>lattice</strong> and <strong>base</strong> graphics. In <strong>lattice</strong> graphics an object is created that can be edited to add or remove components and then printed to the screen. This approach is more flexible than the base graphics where the components are painted on top of each other and the use of themes in <strong>lattice</strong> will make it easier to keep a consistent look to all graphs in a document.</p>
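<p>The idea of a stored graph object that is edited and then printed can be sketched as follows. The code re-creates <strong>uk.df</strong> so that it can be run on its own; the object names <strong>uk.plot</strong> and <strong>uk.plot2</strong> and the revised title are made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)
# Save the graph as an object; nothing is drawn at this point
uk.plot = xyplot(Population ~ Year, data = uk.df,
  xlab = &quot;Year&quot;, ylab = &quot;Total Population (Thousands)&quot;,
  main = &quot;UK Population (1992-2009)&quot;
)
print(uk.plot)
# Edit the stored object: join the points with lines and change the title
uk.plot2 = update(uk.plot, type = &quot;b&quot;,
  main = &quot;UK Population (1992-2009), points joined by lines&quot;
)
print(uk.plot2)</pre></div></div>

<p>The original object <strong>uk.plot</strong> is left unchanged by the call to <strong>update</strong>, so both versions of the graph remain available.</p>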
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="TagaAeIHKks" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#TagaAeIHKks"><img src="http://i.ytimg.com/vi/TagaAeIHKks/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>ggplot2</strong> package the <strong>ggplot</strong> function is used to create graphs of all types rather than having a separate function defined for each type of graph. The first argument is a data frame with the data to be plotted and the <strong>aes</strong> argument specifies the aesthetics associated with the graph such as the point symbol, size or colour. In this case the <strong>Year</strong> variable appears on the x axis and the <strong>Population</strong> variable on the y axis. The code to create the scatter plot is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# ggtitle() sets the title; opts() has been removed from ggplot2
ggplot(uk.df, aes(Year, Population)) + geom_point() +
  xlab(&quot;Year&quot;) + ylab(&quot;Total Population (Thousands)&quot;) +
  ggtitle(&quot;UK Population (1992-2009)&quot;)</pre></div></div>

<p>The <strong>geom_point</strong> function specifies the type of graph to create, a scatter plot in this situation. This highlights the flexibility of the <strong>ggplot2</strong> package, as changing the geom will create a new type of graph. The labels for the graph are created by adding them to the graph with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>ggtitle</strong> functions (<strong>ggtitle</strong> replaces the <strong>opts</strong> function, which has been removed from <strong>ggplot2</strong>). The graph is shown below:</p>
<div id="attachment_920" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-ggplot2-300x300.jpg" alt="ggplot2 Scatter plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-920" /></a><p class="wp-caption-text">ggplot2 Scatter plot</p></div>
<p>This graph is not greatly different to the scatter plot created using the <strong>base</strong> and <strong>lattice</strong> packages. The default theme in the <strong>ggplot2</strong> package has a gray background with white grid lines that allows easy visual recognition of graphs created using this package.</p>
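<p>For readers who prefer a white background, the default theme can be swapped by adding another component to the graph. The sketch below assumes a current release of <strong>ggplot2</strong>, where <strong>theme_bw</strong> and <strong>ggtitle</strong> are available, and re-creates <strong>uk.df</strong> so that it can be run on its own; the object name <strong>uk.gg</strong> is made up for illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)
uk.gg = ggplot(uk.df, aes(Year, Population)) + geom_point() +
  xlab(&quot;Year&quot;) + ylab(&quot;Total Population (Thousands)&quot;) +
  ggtitle(&quot;UK Population (1992-2009)&quot;) +
  theme_bw()   # white background with grey grid lines
print(uk.gg)</pre></div></div>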
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Working with themes in Lattice Graphics</title>
		<link>http://www.wekaleamstudios.co.uk/posts/working-with-themes-in-lattice-graphics/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/working-with-themes-in-lattice-graphics/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 15:39:14 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[theme]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[trellis.par.get]]></category>
		<category><![CDATA[trellis.par.set]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=892</guid>
		<description><![CDATA[The Trellis graphics approach provides facilities for creating effective graphs with a consistent look and feel and one of the good things about the system is the use of themes to define the colour, size and other features of the components that make up a graph. The lattice package in R is an implementation of [...]]]></description>
			<content:encoded><![CDATA[<p>The Trellis graphics approach provides facilities for creating effective graphs with a consistent look and feel and one of the good things about the system is the use of themes to define the colour, size and other features of the components that make up a graph. The <strong>lattice</strong> package in <strong>R</strong> is an implementation of the approach and in this post we will consider how to change the default settings.<span id="more-892"></span></p>
<p>The main functions of interest are the pair <strong>trellis.par.get</strong> and <strong>trellis.par.set</strong>, which are used to retrieve the settings for the current graphics device and to set new parameters respectively. The parameters themselves are described by a list object with information about a large range of possible options. To extract and save this information we would do the following:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">my.theme = trellis.par.get()</pre></div></div>

<p>To get a feel for how many options can be specified we can look at the names of the components that make up this list:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; names(my.theme)
 [1] &quot;grid.pars&quot;         &quot;fontsize&quot;          &quot;background&quot;       
 [4] &quot;clip&quot;              &quot;add.line&quot;          &quot;add.text&quot;         
 [7] &quot;plot.polygon&quot;      &quot;box.dot&quot;           &quot;box.rectangle&quot;    
[10] &quot;box.umbrella&quot;      &quot;dot.line&quot;          &quot;dot.symbol&quot;       
[13] &quot;plot.line&quot;         &quot;plot.symbol&quot;       &quot;reference.line&quot;   
[16] &quot;strip.background&quot;  &quot;strip.shingle&quot;     &quot;strip.border&quot;     
[19] &quot;superpose.line&quot;    &quot;superpose.symbol&quot;  &quot;superpose.polygon&quot;
[22] &quot;regions&quot;           &quot;shade.colors&quot;      &quot;axis.line&quot;        
[25] &quot;axis.text&quot;         &quot;axis.components&quot;   &quot;layout.heights&quot;   
[28] &quot;layout.widths&quot;     &quot;box.3d&quot;            &quot;par.xlab.text&quot;    
[31] &quot;par.ylab.text&quot;     &quot;par.zlab.text&quot;     &quot;par.main.text&quot;    
[34] &quot;par.sub.text&quot;</pre></div></div>

<p>There are a total of 34 components in this list, each of which has its own set of parameters that can be set by the user. For example, if we wanted to find out how <strong>lattice</strong> will draw a scatter plot then the <strong>plot.symbol</strong> component of the list is where we should be looking. We extract this information as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; my.theme$plot.symbol
$alpha
[1] 1
&nbsp;
$cex
[1] 0.8
&nbsp;
$col
[1] &quot;#0080ff&quot;
&nbsp;
$font
[1] 1
&nbsp;
$pch
[1] 1
&nbsp;
$fill
[1] &quot;transparent&quot;</pre></div></div>

<p>The plot symbol is drawn at 80% of the base symbol size, as given by the <strong>cex</strong> component, and the <strong>pch</strong> component of 1 selects an empty circle, the first shape symbol available.</p>
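<p>When only one group of settings is of interest, a single component can be requested by name instead of saving the whole list. A short sketch, assuming the <strong>lattice</strong> package is installed (the object name <strong>sym</strong> is illustrative):</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# Ask for one named component rather than the full list of settings
sym = trellis.par.get(&quot;plot.symbol&quot;)
# The component is itself a list of parameters
names(sym)
# Individual parameters are extracted with the $ operator
sym$pch
sym$cex</pre></div></div>
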
<p>When there are groups in the data being plotted different shape symbols, colours etc. can be specified as part of the <strong>superpose.symbol</strong> component of the theme list. By default the settings are:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; my.theme$superpose.symbol
$alpha
[1] 1 1 1 1 1 1 1
&nbsp;
$cex
[1] 0.8 0.8 0.8 0.8 0.8 0.8 0.8
&nbsp;
$col
[1] &quot;#0080ff&quot;   &quot;#ff00ff&quot;   &quot;darkgreen&quot; &quot;#ff0000&quot;   &quot;orange&quot;    &quot;#00ff00&quot;
[7] &quot;brown&quot;
&nbsp;
$fill
[1] &quot;#CCFFFF&quot; &quot;#FFCCFF&quot; &quot;#CCFFCC&quot; &quot;#FFE5CC&quot; &quot;#CCE6FF&quot; &quot;#FFFFCC&quot; &quot;#FFCCCC&quot;
&nbsp;
$font
[1] 1 1 1 1 1 1 1
&nbsp;
$pch
[1] 1 1 1 1 1 1 1</pre></div></div>

<p>Seven groups are specified in the default theme and in many cases the options have the same names as in the <strong>base</strong> graphics system. If we wanted to make the symbols larger then the <strong>cex</strong> option specifies a multiplier relative to the base size for symbols. The <strong>pch</strong> option indicates which plot symbol to use, and the available symbols are again the same as those provided by <strong>base</strong> graphics.</p>
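<p>As a small sketch of the <strong>cex</strong> multiplier, the code below enlarges the symbols for every group. It works on a separate copy of the settings, here given the hypothetical name <strong>big.theme</strong>, so that the <strong>my.theme</strong> object used in the rest of this post is left unchanged.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# Work on a separate copy of the current settings
big.theme = trellis.par.get()
# One multiplier is needed for each group in the theme
n.groups = length(big.theme$superpose.symbol$cex)
# Draw the grouped symbols at 1.5 times the base symbol size
big.theme$superpose.symbol$cex = rep(1.5, n.groups)
big.theme$superpose.symbol$cex</pre></div></div>
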
<p>So if we wanted to change the symbols to solid circles then we would adjust the <strong>my.theme</strong> list object to indicate a different plot symbol shape:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; my.theme$superpose.symbol$pch = rep(16, 7)
&gt; my.theme$superpose.symbol
$alpha
[1] 1 1 1 1 1 1 1
&nbsp;
$cex
[1] 0.8 0.8 0.8 0.8 0.8 0.8 0.8
&nbsp;
$col
[1] &quot;#0080ff&quot;   &quot;#ff00ff&quot;   &quot;darkgreen&quot; &quot;#ff0000&quot;   &quot;orange&quot;    &quot;#00ff00&quot;
[7] &quot;brown&quot;
&nbsp;
$fill
[1] &quot;#CCFFFF&quot; &quot;#FFCCFF&quot; &quot;#CCFFCC&quot; &quot;#FFE5CC&quot; &quot;#CCE6FF&quot; &quot;#FFFFCC&quot; &quot;#FFCCCC&quot;
&nbsp;
$font
[1] 1 1 1 1 1 1 1
&nbsp;
$pch
[1] 16 16 16 16 16 16 16</pre></div></div>

<p>To update the graphics settings we would need to use the <strong>trellis.par.set</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; trellis.par.set(my.theme)</pre></div></div>

<p>The new settings can be displayed graphically with the <strong>show.settings</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; show.settings()</pre></div></div>

<p>They are now as shown here:</p>
<div id="attachment_900" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/lattice-settings.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/lattice-settings-300x300.jpg" alt="Lattice Graphics Settings" title="Lattice Graphics Theme Settings" width="300" height="300" class="size-medium wp-image-900" /></a><p class="wp-caption-text">Lattice Graphics Settings</p></div>
<p>The plot symbols are now solid rather than empty circles.</p>
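<p>The <strong>trellis.par.set</strong> function changes the settings for the current device, so later graphs on that device are affected as well. As an alternative sketch, the <strong>lattice</strong> plotting functions accept a <strong>par.settings</strong> argument that applies a theme to a single graph. The example assumes the built-in <strong>iris</strong> data set, which is not part of this post, and the name <strong>solid.theme</strong> is illustrative.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# Only the components that differ from the defaults need to be listed
solid.theme = list(superpose.symbol = list(pch = 16))
# par.settings applies the theme to this graph only
p = xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
  par.settings = solid.theme, auto.key = TRUE)
print(p)</pre></div></div>
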
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/working-with-themes-in-lattice-graphics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summarising data using histograms</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 08:53:16 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[hist]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=870</guid>
		<description><![CDATA[The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of [...]]]></description>
			<content:encoded><![CDATA[<p>The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of data points in each region is shown instead of counts.<span id="more-870"></span></p>
<p>The shape of the histogram is determined by the width and number of regions that divide up the data. A histogram provides an indication of the following features of a set of data: the general shape, symmetry or skewness of the data and modality (uni-, bi- or multi-modal). There are some situations where a different type of graph would be preferable but histograms are useful for describing the general features of the distribution of a set of data.</p>
<p>To illustrate creating a histogram we consider data from the AFL sports league in Australia and the total number of points scored by the home team in each fixture. If we assume that the data is in a comma separated text file, called <strong>afl_2003_2007.csv</strong>, then we would import that data using the following command saving the results in a data frame:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">afl.df = read.csv(&quot;afl_2003_2007.csv&quot;)</pre></div></div>

<p>Edit: The data is available as <a href='http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/12/afl_2003_2007.txt'>AFL Data Set</a>. Change the file extension manually to <strong>csv</strong> or change the command to reflect the different file name.</p>
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="4Q9vPuj4w8c" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#4Q9vPuj4w8c"><img src="http://i.ytimg.com/vi/4Q9vPuj4w8c/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In <strong>base</strong> graphics the function <strong>hist</strong> is used to create a histogram with the first argument being the name of the vector that contains the data to be plotted. The <strong>x-axis</strong> is given a label using the <strong>xlab</strong> argument and the <strong>main</strong> argument is used to add a title to the graph. Code to create a histogram of home points is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">hist(afl.df$Home.Total, xlab = &quot;Home Points&quot;,
  main = &quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>The default option is to display bars representing the frequency of data values in each of the ranges and the overall look of the graph is basic as shown here:</p>
<div id="attachment_877" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-base-300x300.jpg" alt="Base Graphics Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-877" /></a><p class="wp-caption-text">Base Graphics Histogram</p></div>
<p>The default algorithm for selecting the number of bins usually makes a sensible choice, but the bins can be specified with the <strong>breaks</strong> argument if required.</p>
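<p>A short sketch of controlling the bins in <strong>hist</strong>. The built-in <strong>faithful</strong> data set is used as a stand-in for the AFL data, which is an assumption made so that the example runs on its own, and the object names are illustrative.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># A single number is treated as a suggestion for the number of bins
h.suggest = hist(faithful$waiting, breaks = 20, plot = FALSE)
# A vector of break points is used exactly as given
h.exact = hist(faithful$waiting, breaks = seq(40, 100, by = 5), plot = FALSE)
# Number of bins actually used in each case
length(h.suggest$counts)
length(h.exact$counts)
# Draw the histogram with the explicit break points
plot(h.exact, xlab = &quot;Waiting time&quot;, main = &quot;Explicit break points&quot;)</pre></div></div>

<p>Explicit break points must cover the whole range of the data, which is why the sequence above runs from 40 to 100.</p>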
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="hxQmEhzgWks" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#hxQmEhzgWks"><img src="http://i.ytimg.com/vi/hxQmEhzgWks/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>lattice</strong> graphics package there is a function <strong>histogram</strong> and we use a one-sided formula to specify the single variable for the number of points scored by the home team. The axis labels and graph title are specified in the same way as for the <strong>base</strong> graphics package. The equivalent graph is created using the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">histogram( ~ Home.Total, data = afl.df, xlab = &quot;Home Points&quot;,
  main = &quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>Here the default option is to work with percentages of the total number of data points rather than counts, so the vertical scale differs from the <strong>base</strong> graphics plot. The bins are also chosen differently, which makes the shape of the distribution look slightly different. The <strong>lattice</strong> version is shown below:</p>
<div id="attachment_880" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-lattice-300x300.jpg" alt="Lattice Graphics Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-880" /></a><p class="wp-caption-text">Lattice Graphics Histogram</p></div>
<p>The other main difference is the choice of colour for the bars in the histogram, which can be adjusted by changing the global theme for <strong>lattice</strong>.</p>
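<p>If counts are preferred, or the bars need a different colour for one graph only, the <strong>histogram</strong> function has arguments for this. The sketch below again assumes the built-in <strong>faithful</strong> data set as a stand-in for the AFL data.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# type = &quot;count&quot; shows frequencies instead of percentages
# nint sets the number of bins and col the fill colour of the bars
p = histogram( ~ waiting, data = faithful, type = &quot;count&quot;, nint = 15,
  col = &quot;grey&quot;, xlab = &quot;Waiting time&quot;)
print(p)</pre></div></div>
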
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="47kWynt3b6M" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#47kWynt3b6M"><img src="http://i.ytimg.com/vi/47kWynt3b6M/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>ggplot2</strong> library uses a general purpose graphics function called <strong>ggplot</strong> to create graphs of all types and the geom specifies the type of display to create, in this case a histogram. Components that make up the graph are added sequentially to build up the whole plot and in the example below we add axis labels and a main title.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(afl.df, aes(Home.Total)) + geom_histogram() +
  xlab(&quot;Home Points&quot;) + ylab(&quot;Frequency&quot;) +
  ggtitle(&quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>The default theme for <strong>ggplot2</strong> is distinctive and the histogram is shown in the graph below:</p>
<div id="attachment_881" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-ggplot2-300x300.jpg" alt="ggplot 2 Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-881" /></a><p class="wp-caption-text">ggplot 2 Histogram</p></div>
<p>The default number of bins (30) is larger than in <strong>base</strong> and <strong>lattice</strong> graphics, which gives a more jagged picture of the distribution in this particular case. The online <a href="http://had.co.nz/ggplot2/">ggplot2</a> manual is a good source of information about customising graphs created using this approach.</p>
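<p>A minimal sketch of setting the bins explicitly in <strong>ggplot2</strong> with the <strong>binwidth</strong> argument of <strong>geom_histogram</strong>. The built-in <strong>faithful</strong> data set is an assumption used in place of the AFL data.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# binwidth fixes the width of each bin in the units of the variable
p = ggplot(faithful, aes(waiting)) + geom_histogram(binwidth = 5) +
  xlab(&quot;Waiting time&quot;) + ylab(&quot;Frequency&quot;)
print(p)</pre></div></div>
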
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Summarising data using dot plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 10:53:00 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[Cleveland]]></category>
		<category><![CDATA[dot plot]]></category>
		<category><![CDATA[dotplot]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[points]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=847</guid>
		<description><![CDATA[A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The dot plot can be arranged with the categories either on the vertical or horizontal axis of the display to allow comparison between the different categories as well as comparison within categories where [...]]]></description>
			<content:encoded><![CDATA[<p>A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The dot plot can be arranged with the categories either on the vertical or horizontal axis of the display to allow comparison between the different categories as well as comparison within categories where multiple symbols are used to denote, say, different years.<span id="more-847"></span></p>
<p>In this post we will consider creating a dot plot using the <strong>base</strong> graphics, <strong>lattice</strong> graphics and <strong>ggplot2</strong> approaches. To illustrate creating a dot plot we use data from the <a href="http://faostat.fao.org">FAO website</a> on the total irrigation area for Africa, Latin America, North America and Europe. We create a data frame using the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">irrigation.df = data.frame(
  # Region is made a factor explicitly so that it can be plotted on an axis
  Region = factor(rep(c(&quot;Africa&quot;, &quot;Latin America&quot;, &quot;North America&quot;, &quot;Europe&quot;), 4)),
  Year = factor(c(rep(1980, 4), rep(1990, 4), rep(2000, 4), rep(2007, 4))),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)</pre></div></div>

<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="5izUzQKL1yw" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#5izUzQKL1yw"><img src="http://i.ytimg.com/vi/5izUzQKL1yw/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>base</strong> graphics system we build up the <strong>dotplot</strong> with a series of commands. The first function call creates the graph region based on the data set but we do not plot any data by setting the <strong>type = &#8220;n&#8221;</strong> argument. The axis labels for the horizontal and vertical scales are set along with the title in the initial function call:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plot(irrigation.df$Area, irrigation.df$Region, xlab = &quot;Area&quot;,
  ylab = &quot;Region&quot;, main = &quot;Irrigation Area by Region&quot;, type = &quot;n&quot;)</pre></div></div>

<p>To add the points with separate colours for each of the four years we use the <strong>points</strong> function and subset to the particular year by testing a condition on the year. The <strong>col</strong> argument is used with a text string to specify the colour for the symbols for the given year:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">points(irrigation.df$Area[irrigation.df$Year == 1980],
  irrigation.df$Region[irrigation.df$Year == 1980], col = &quot;black&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 1990],
  irrigation.df$Region[irrigation.df$Year == 1990], col = &quot;blue&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 2000],
  irrigation.df$Region[irrigation.df$Year == 2000], col = &quot;red&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 2007],
  irrigation.df$Region[irrigation.df$Year == 2007], col = &quot;green&quot;, pch = 16)</pre></div></div>

<p>The code is rather long-winded compared to using the other two graphics packages. We can add a legend to the graph so that the years can be identified:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">legend(10, 4, legend = c(&quot;1980&quot;, &quot;1990&quot;, &quot;2000&quot;, &quot;2007&quot;),
  col = c(&quot;black&quot;, &quot;blue&quot;, &quot;red&quot;, &quot;green&quot;), pch = 16)</pre></div></div>

<p>The placement of the legend uses the <strong>x</strong> and <strong>y</strong> coordinates within the graph to position the box. All the code above produces the following graph:</p>
<div id="attachment_856" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-base-300x300.jpg" alt="Base Graphics Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-856" /></a><p class="wp-caption-text">Base Graphics Dot Plot</p></div>
<p>The graph is basic but we can consider the changes over time for the four regions. One downside is that the regions have been labelled with numbers rather than text strings.</p>
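<p>One possible way to shorten the base graphics code and to label the regions with text is sketched below. It is an alternative arrangement rather than the code used for the graph above: the points are coloured in a single call by indexing a colour vector with the year, and the vertical axis is drawn by hand with the <strong>axis</strong> function. The names <strong>year.cols</strong>, <strong>region.codes</strong> and <strong>old.par</strong> are illustrative.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">irrigation.df = data.frame(
  Region = factor(rep(c(&quot;Africa&quot;, &quot;Latin America&quot;, &quot;North America&quot;, &quot;Europe&quot;), 4)),
  Year = factor(rep(c(1980, 1990, 2000, 2007), each = 4)),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)
# One colour per year, in the order of the factor levels
year.cols = c(&quot;black&quot;, &quot;blue&quot;, &quot;red&quot;, &quot;green&quot;)
# Integer codes give the vertical position of each region
region.codes = as.numeric(irrigation.df$Region)
# Widen the left margin so the region names fit
old.par = par(mar = c(5, 8, 4, 2) + 0.1)
# Set up the plot region without points and without the default y axis
plot(irrigation.df$Area, region.codes, xlab = &quot;Area&quot;, ylab = &quot;&quot;,
  main = &quot;Irrigation Area by Region&quot;, type = &quot;n&quot;, yaxt = &quot;n&quot;)
# Label the vertical axis with the region names
axis(2, at = seq_along(levels(irrigation.df$Region)),
  labels = levels(irrigation.df$Region), las = 1)
# Draw all points at once, picking the colour from the year
points(irrigation.df$Area, region.codes,
  col = year.cols[as.numeric(irrigation.df$Year)], pch = 16)
legend(&quot;bottomright&quot;, legend = levels(irrigation.df$Year),
  col = year.cols, pch = 16)
# Restore the previous margins
par(old.par)</pre></div></div>
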
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="-FGU6PMaSRY" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#-FGU6PMaSRY"><img src="http://i.ytimg.com/vi/-FGU6PMaSRY/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>lattice</strong> graphics package has a function <strong>dotplot</strong> that is used to create dot plots. The first argument to the function is a formula describing the variables to use for the vertical and horizontal axes. We also specify the data frame to use for the graph and, with the <strong>groups</strong> argument, the column that determines the symbols and/or colours used to highlight groupings within the plot:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">dotplot(Region ~ Area, data = irrigation.df, groups = Year,
  main = &quot;Irrigation Area by Region&quot;)</pre></div></div>

<p>The lattice variant of the graph is shown here:</p>
<div id="attachment_857" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-lattice-300x300.jpg" alt="Lattice Graphics Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-857" /></a><p class="wp-caption-text">Lattice Graphics Dot Plot</p></div>
<p>The graph is simple and very similar to the one produced using the base graphics with the advantage that the R code is not as complicated.</p>
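<p>The call above does not draw a legend for the years. A sketch of one way to add one is the <strong>auto.key</strong> argument, which builds a key from the grouping variable. The data frame is repeated so that the example runs on its own.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
irrigation.df = data.frame(
  Region = factor(rep(c(&quot;Africa&quot;, &quot;Latin America&quot;, &quot;North America&quot;, &quot;Europe&quot;), 4)),
  Year = factor(rep(c(1980, 1990, 2000, 2007), each = 4)),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)
# auto.key draws a legend for the groups, here placed to the right
p = dotplot(Region ~ Area, data = irrigation.df, groups = Year,
  auto.key = list(space = &quot;right&quot;), main = &quot;Irrigation Area by Region&quot;)
print(p)</pre></div></div>
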
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="y1CsT-jAWZQ" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#y1CsT-jAWZQ"><img src="http://i.ytimg.com/vi/y1CsT-jAWZQ/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>ggplot</strong> function is used to create the dot plot where we first specify the name of the data frame with the information to be displayed and then use the <strong>aes</strong> argument to list the variables to plot on the horizontal and vertical axes. The <strong>colour</strong> aesthetic names the variable, usually categorical, that is used to assign colours to the points.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(irrigation.df, aes(x = Area, y = Region, colour = Year)) +
  geom_point() + ggtitle(&quot;Irrigation Area by Region&quot;)</pre></div></div>

<p>The <strong>ggplot2</strong> version of the dot plot is shown below:</p>
<div id="attachment_858" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-ggplot2-300x300.jpg" alt="ggplot2 Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-858" /></a><p class="wp-caption-text">ggplot2 Dot Plot</p></div>
<p>This graph is very similar to the ones produced using the other graphics packages but has the distinctive background and legend style that is used as the default option in <strong>ggplot2</strong>.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Summarising data using bar charts</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 08:52:33 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[bar chart]]></category>
		<category><![CDATA[barchart]]></category>
		<category><![CDATA[barplot]]></category>
		<category><![CDATA[FAO]]></category>
		<category><![CDATA[geom_bar]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=664</guid>
		<description><![CDATA[A bar graph is a frequently used type of display that compares counts, frequencies, totals or other summary measures for a series of categories, e.g. sales in different market sectors or in quarters in a financial year. The bar graph can be laid out with the categories either on the vertical or horizontal axis of [...]]]></description>
			<content:encoded><![CDATA[<p>A bar graph is a frequently used type of display that compares counts, frequencies, totals or other summary measures for a series of categories, e.g. sales in different market sectors or in quarters in a financial year. The bar graph can be laid out with the categories either on the vertical or horizontal axis of the display, depending on whether a vertical or horizontal comparison makes the graph easier to interpret.<span id="more-664"></span></p>
<p>In <strong>R</strong> there are multiple ways of creating graphs, including the base graphics, lattice graphics and the ggplot2 grammar of graphics approach. To illustrate how we can create a bar chart using these packages we will make use of some data taken from the <a href="http://faostat.fao.org">FAO</a> statistics website for the UK in 2007. The data is the production (in metric tonnes) of the top five food and agricultural commodities, ranked by production.</p>
<p>The first step before creating the graphs is to prepare the data in a format that can be used by the graphing functions. As this dataset is small we can manually create the data object. To make the labels on the graph less cluttered the production is recorded as 1,000s of metric tonnes.</p>
<p>The <strong>R</strong> code to create the data object is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">uk2007 = data.frame(Commodity =
  factor(c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;),
    levels = c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;)),
  Production = c(14023, 13221, 6500, 5635, 5079))</pre></div></div>

<p>The <strong>levels</strong> argument is explicitly defined to make sure that the categories are ordered as required, from largest to smallest production, rather than alphabetically, which is how they would be ordered otherwise.</p>
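<p>The effect of the <strong>levels</strong> argument can be checked directly, as in the short sketch below. The last line shows one alternative, the <strong>reorder</strong> function, which orders the levels by a second variable and so avoids typing the category names twice. The vector names <strong>commodity</strong> and <strong>production</strong> are illustrative.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">commodity = c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;)
production = c(14023, 13221, 6500, 5635, 5079)
# Without the levels argument the categories are sorted alphabetically
levels(factor(commodity))
# With the levels argument the stated order is kept
levels(factor(commodity, levels = commodity))
# reorder sorts the levels by a second variable, largest production first
levels(reorder(factor(commodity), -production))</pre></div></div>
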
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="fVhdPbntKdw" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#fVhdPbntKdw"><img src="http://i.ytimg.com/vi/fVhdPbntKdw/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>base</strong> graphics in R provide a function <strong>barplot</strong> that we can use to create a bar chart. The first argument to the function is the name of the object with the data. The <strong>names</strong> argument is used to provide the labels for the categories in the graph. We also specify the text for the labels for the x-axis, y-axis and title of the graph with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>main</strong> arguments respectively.</p>
<p>The function call is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">barplot(uk2007$Production, names = uk2007$Commodity,
  xlab = &quot;Commodity&quot;, ylab = &quot;Production (1,000 MT)&quot;,
  main = &quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>to produce the following graph:</p>
<div id="attachment_685" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-base1.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-base1-300x299.jpg" alt="Base Graphics Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-685" /></a><p class="wp-caption-text">Base Graphics Bar Chart</p></div>
<p>This graph is visually appealing with sensible space between the bars for the five commodity categories.</p>
<p><strong>Lattice Graphics</strong></p>
<p>In the <strong>lattice</strong> graphics package the <strong>barchart</strong> function is used to create bar charts. The <strong>x</strong> and <strong>y</strong> variables are specified using a formula, which is the standard approach in Trellis graphics. The variable for the vertical axis goes on the left-hand side of the formula and the variable for the horizontal axis goes on the right-hand side, with the two separated by the tilde character.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)

barchart(Production ~ Commodity, data = uk2007, xlab = &quot;Commodity&quot;,
  ylab = &quot;Production (1,000 MT)&quot;,
  main = &quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>This code produces the following graph:</p>
<div id="attachment_686" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-lattice1.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-lattice1-300x299.jpg" alt="Lattice Graphics Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-686" /></a><p class="wp-caption-text">Lattice Graphics Bar Chart</p></div>
<p>The main visual difference compared to the base graphics example is the default colour of the bars, which is much brighter. There is also a large gap between the bars in the display.</p>
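<p>Because the formula decides which variable goes on which axis, swapping the two sides should give a horizontal bar chart. The following is a minimal sketch rather than part of the original example. The object name <strong>horizontal_chart</strong> is hypothetical, and the data frame is recreated so that the code runs on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)

uk2007 = data.frame(Commodity =
  factor(c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;),
    levels = c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;)),
  Production = c(14023, 13221, 6500, 5635, 5079))

# Commodity is now on the left of the tilde, so it is drawn on the
# vertical axis and the bars run horizontally
horizontal_chart = barchart(Commodity ~ Production, data = uk2007,
  xlab = &quot;Production (1,000 MT)&quot;, ylab = &quot;Commodity&quot;)

# Storing the chart in an object does not draw it, so print it explicitly
print(horizontal_chart)</pre></div></div>

<p>The explicit call to <strong>print</strong> is needed here because the chart is assigned to an object. A lattice chart is only drawn when it is printed, which happens automatically when the function is called on its own at the console.</p>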
<p><strong>ggplot2</strong></p>
<p>To create the bar chart in the <strong>ggplot2</strong> package we use the <strong>ggplot</strong> function to specify the data to appear in the graph and then gradually add the other components of the graph.</p>
<p>We specify the data frame where the data is stored and then use the <strong>aes</strong> function to identify the <strong>x</strong> and <strong>y</strong> variables. The <strong>geom_bar</strong> function is used to create a bar chart display with the specified data, and the last three components in the example create the axis labels and the title of the graph.</p>
<p>The graph itself is constructed piece by piece to add the various layers and components on top of the base layer:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)

# stat = &quot;identity&quot; uses the Production values directly as the bar heights
ggplot(uk2007, aes(Commodity, Production)) + geom_bar(stat = &quot;identity&quot;) +
  xlab(&quot;Commodity&quot;) + ylab(&quot;Production (1,000 MT)&quot;) +
  ggtitle(&quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>This code produces the following graph:</p>
<div id="attachment_691" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-ggplot2-300x299.jpg" alt="ggplot2 Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-691" /></a><p class="wp-caption-text">ggplot2 Bar Chart</p></div>
<p>The layout of this graph differs mainly in its background, which by default is grey with white grid lines.</p>
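<p>The piece-by-piece construction can be made more visible by storing the graph in an object and adding one component at a time. This is only a sketch under a few assumptions: the object name <strong>p</strong> is hypothetical, the data frame is recreated so that the code runs on its own, and <strong>stat = &quot;identity&quot;</strong> is included because recent versions of ggplot2 require it when <strong>geom_bar</strong> is given the bar heights directly:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)

uk2007 = data.frame(Commodity =
  factor(c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;),
    levels = c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;)),
  Production = c(14023, 13221, 6500, 5635, 5079))

# Base layer: the data and the mapping of variables to the axes
p = ggplot(uk2007, aes(Commodity, Production))

# Add the bars, using the Production values as the bar heights
p = p + geom_bar(stat = &quot;identity&quot;)

# Add the axis labels
p = p + xlab(&quot;Commodity&quot;) + ylab(&quot;Production (1,000 MT)&quot;)

# Nothing is drawn until the object is printed
print(p)</pre></div></div>

<p>Keeping the graph in an object makes it easy to try out an additional component without retyping the whole expression.</p>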
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

