<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; Base Graphics</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/r-environment/base-graphics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Creating surface plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/#comments</comments>
		<pubDate>Fri, 28 May 2010 15:07:29 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[box]]></category>
		<category><![CDATA[expand.grid]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[loess]]></category>
		<category><![CDATA[persp]]></category>
		<category><![CDATA[predict]]></category>
		<category><![CDATA[surface]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[wireframe]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1132</guid>
		<description><![CDATA[A 3d wireframe plot is a type of graph that is used to display a surface &#8211; geographic data is an example of where this type of graph would be used or it could be used to display a fitted model with more than one explanatory variable. These plots are related to contour plots which [...]]]></description>
			<content:encoded><![CDATA[<p>A 3D wireframe plot is a type of graph used to display a surface. Geographic data is one example of where this type of graph would be used; another is displaying a fitted model with more than one explanatory variable. These plots are related to contour plots, which are the two-dimensional equivalent.<span id="more-1132"></span></p>
<p>To illustrate this type of graph we will consider some surface elevation data that is available in the <strong>geoR</strong> package and was used in the blog <a href="http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/">post</a> on level plots. The data set in this package is called <strong>elevation</strong> and stores the elevation height (as multiples of ten feet) for a grid region of x and y coordinates (recorded as multiples of 50 feet). That earlier post has details of the operations used to prepare the data for graphing.</p>
<p><strong>Base Graphics</strong></p>
<p>The function <strong>persp</strong> is the <strong>base</strong> graphics function for creating wireframe surface plots. It requires vectors of x and y values that define the grid, together with a matrix of heights with one row for each x value and one column for each y value. Here the heights are the object <strong>z</strong> that we saved previously when the local trend surface model was fitted to the data. The x and y vectors must match the dimensions of <strong>z</strong>, so <strong>z</strong> needs to have been evaluated on the same 5-foot grid that is passed to <strong>persp</strong>. The text on the axis labels is specified by the <strong>xlab</strong> and <strong>ylab</strong> function arguments and the <strong>main</strong> argument determines the overall title for the graph.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">persp(seq(10, 300, 5), seq(10, 300, 5), z, phi = 45, theta = 45,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;
)</pre></div></div>

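<p>The function arguments <strong>phi</strong> and <strong>theta</strong> set the viewing direction: <strong>theta</strong> rotates the surface about the vertical axis (the azimuth) and <strong>phi</strong> sets the colatitude, which controls how far above the surface the viewpoint sits. Trial and error is probably the way to go when setting these, as good choices depend entirely on the shape of the surface being displayed.</p>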
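<p>As a rough, self-contained illustration of how these two angles change the view, the sketch below draws one surface from two directions. It uses the <strong>volcano</strong> matrix that ships with R as a stand-in for <strong>z</strong>; the object names <strong>x.demo</strong>, <strong>y.demo</strong> and <strong>z.demo</strong> and the particular angles are assumptions made for this example and are not part of the original analysis.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Stand-in surface: the built-in volcano matrix of heights (an assumption, not the elevation data)
z.demo = volcano
# persp needs one x value per row of z and one y value per column of z
x.demo = seq_len(nrow(z.demo))
y.demo = seq_len(ncol(z.demo))
# Draw the same surface side by side from two viewing directions
old.par = par(mfrow = c(1, 2))
persp(x.demo, y.demo, z.demo, theta = 45, phi = 45,
  main = 'theta = 45, phi = 45')
persp(x.demo, y.demo, z.demo, theta = 135, phi = 20,
  main = 'theta = 135, phi = 20')
# Restore the previous layout settings
par(old.par)</pre></div></div>

<p>Comparing the two panels gives a feel for which features of a surface are hidden or revealed as the viewpoint moves.</p>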
<div id="attachment_1138" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-base-300x300.jpg" alt="Base Graphics Surface Plot" title="Surface Plot Example" width="300" height="300" class="size-medium wp-image-1138" /></a><p class="wp-caption-text">Base Graphics Surface Plot</p></div>
<p>The shape of the surface and the variation in height across the <strong>x</strong> and <strong>y</strong> grid coordinates are clear and easy to see.</p>
<p><strong>Lattice Graphics</strong></p>
<p>The <strong>lattice</strong> graphics package has a function <strong>wireframe</strong> and we use the data in the object <strong>elevation.fit</strong> to create the graph. We use the formula interface to specify first the z axis data (the heights) followed by the two variables specifying the <strong>x</strong> and <strong>y</strong> axis coordinates for the data.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">wireframe(Height ~ x*y, data = elevation.fit,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;,
  drape = TRUE,
  colorkey = TRUE,
  screen = list(z = -60, x = -60)
)</pre></div></div>

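<p>The axis labels and title are specified in the same way as the <strong>base</strong> graphics with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>main</strong> function arguments. Setting <strong>drape</strong> to <strong>TRUE</strong> colours the surface according to height, a colour key is added by setting the <strong>colorkey</strong> function argument to <strong>TRUE</strong>, and the <strong>screen</strong> argument is a list of rotations that sets the viewing angle.</p>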
<div id="attachment_1139" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/surface-lattice-300x300.jpg" alt="Lattice Graphics Surface Plot" title="Surface Plot Example" width="300" height="300" class="size-medium wp-image-1139" /></a><p class="wp-caption-text">Lattice Graphics Surface Plot</p></div>
<p>The surface produced by the <strong>wireframe</strong> function is similar to the one produced by the <strong>persp</strong> function, with the main difference being the colours used on the surface.</p>
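<p>The formula interface needs one row per grid point rather than a matrix of heights. The following is a minimal sketch of that layout, again using the built-in <strong>volcano</strong> matrix as a stand-in for the elevation surface; the names <strong>volcano.df</strong> and <strong>surface.plot</strong> are hypothetical and chosen only for this example.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# One row per grid cell; expand.grid varies x fastest, which matches
# the column-by-column order in which as.numeric unrolls a matrix
volcano.df = expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)))
volcano.df$Height = as.numeric(volcano)
# Height on the left of the formula, the two grid variables on the right
surface.plot = wireframe(Height ~ x*y, data = volcano.df,
  drape = TRUE, colorkey = TRUE,
  screen = list(z = -60, x = -60))
# lattice functions return an object, so print it to draw the graph
print(surface.plot)</pre></div></div>

<p>Inside a script or a function the explicit <strong>print</strong> call is needed; at the interactive prompt the graph is drawn automatically.</p>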
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/creating-surface-plots/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Displaying data using level plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/#comments</comments>
		<pubDate>Mon, 03 May 2010 10:17:08 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[box]]></category>
		<category><![CDATA[expand.grid]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[levelplot]]></category>
		<category><![CDATA[loess]]></category>
		<category><![CDATA[predict]]></category>
		<category><![CDATA[surface]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1008</guid>
		<description><![CDATA[A level plot is a type of graph that is used to display a surface in two rather than three dimensions &#8211; the surface is viewed from above as if we were looking straight down and is an alternative to a contour plot &#8211; geographic data is an example of where this type of graph [...]]]></description>
			<content:encoded><![CDATA[<p>A level plot is a type of graph that is used to display a surface in two rather than three dimensions &#8211; the surface is viewed from above as if we were looking straight down and is an alternative to a contour plot &#8211; geographic data is an example of where this type of graph would be used. A contour plot uses lines to identify regions of different heights and the level plot uses coloured regions to produce a similar effect.<span id="more-1008"></span></p>
<p>To illustrate this type of graph we will consider some surface elevation data that is available in the <strong>geoR</strong> package. The data set in this package is called <strong>elevation</strong> and stores the elevation height (as multiples of ten feet) for a grid region of x and y coordinates (recorded as multiples of 50 feet). To access this data we load the <strong>geoR</strong> package and then use the <strong>data</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(geoR)
data(elevation)</pre></div></div>

<p>For some packages we need the call to the <strong>data</strong> function to make a set of data available for our use. The <strong>elevation</strong> object is not a data frame so our first step is to create our own data frame to be used to create the level plots using the different graphics packages.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.df = data.frame(x = 50 * elevation$coords[,&quot;x&quot;],
  y = 50 * elevation$coords[,&quot;y&quot;], z = 10 * elevation$data)</pre></div></div>

<p>We extract the x and y grid coordinates and the height values, multiplying them by 50 and 10 respectively to convert to feet for the graphs. Rather than trying to plot the individual values we need to create a surface to cover the whole grid region as the points themselves are too sparse. We make use of the <strong>loess</strong> function to fit a local polynomial trend surface (using weighted least squares) to approximate the elevation across the whole region. The function call for a local quadratic surface is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.loess = loess(z ~ x*y, data = elevation.df,
  degree = 2, span = 0.25)</pre></div></div>

<p>The next stage is to extract heights from this fitted surface at regular intervals across the whole grid region of interest &#8211; which runs from 10 to 300 feet in both the x and y directions. The <strong>expand.grid</strong> function creates a data frame containing all combinations of the x and y values that we specify in a list. We choose a value every foot from 10 to 300 feet to create a fine grid:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.fit = expand.grid(list(x = seq(10, 300, 1), y = seq(10, 300, 1)))</pre></div></div>
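<p>To see what <strong>expand.grid</strong> returns, it can help to try it on a much coarser grid first. The sketch below uses a hypothetical object <strong>small.grid</strong> with only three x values and two y values:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Three x values combined with two y values
small.grid = expand.grid(list(x = seq(10, 30, 10), y = seq(10, 20, 10)))
# The result is a data frame with 3 * 2 = 6 rows
nrow(small.grid)
# Printing it shows that x changes fastest as we move down the rows
small.grid</pre></div></div>

<p>The same logic applies to the fine grid above, which has 291 values in each direction.</p>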

<p>The <strong>predict</strong> function is then used to estimate the surface height at all of these combinations of x and y coordinates covering our grid region. This is saved as an object <strong>z</strong> which will be used by the <strong>base</strong> graphics function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">z = predict(elevation.loess, newdata = elevation.fit)</pre></div></div>

<p>The <strong>lattice</strong> and <strong>ggplot2</strong> packages expect the data in a different format so we make use of the <strong>as.numeric</strong> function to convert the table of heights to a single column, which we append to the data frame of all combinations of x and y coordinates:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">elevation.fit$Height = as.numeric(z)</pre></div></div>
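<p>The two formats can be checked on a small synthetic example. The sketch below does not use the elevation data: the data frame <strong>toy.df</strong>, its simulated heights and the coarse grid <strong>toy.grid</strong> are all assumptions made purely for illustration. It shows that <strong>predict</strong> returns a table with one row per x value and one column per y value when the new data comes from <strong>expand.grid</strong>, and that <strong>as.numeric</strong> unrolls that table in the same order as the rows of the grid.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Simulated scattered heights (illustrative only)
set.seed(1)
toy.df = data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))
toy.df$z = sin(toy.df$x) + cos(toy.df$y) + rnorm(200, sd = 0.1)
# Local quadratic surface, fitted as in the post
toy.loess = loess(z ~ x*y, data = toy.df, degree = 2, span = 0.25)
# Coarse prediction grid: 9 x values and 5 y values
toy.grid = expand.grid(list(x = seq(1, 9, 1), y = seq(1, 9, 2)))
toy.z = predict(toy.loess, newdata = toy.grid)
# A table of heights: rows follow x, columns follow y
dim(toy.z)
# One height per row of the grid, for lattice and ggplot2
toy.grid$Height = as.numeric(toy.z)
head(toy.grid)</pre></div></div>

<p>Because the table and the grid share the same ordering, the appended column lines up with the correct x and y coordinates.</p>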

<p>The data is now in a format that can be used to create the level plots in the various packages.</p>
<p><strong>Base Graphics</strong></p>
<p>The <strong>image</strong> function in the <strong>base</strong> graphics package is used to create a level plot. This function requires vectors of x and y values that cover the grid of heights used to create the surface. These heights are specified as a table of values, which in our case was saved as the object <strong>z</strong> during the calculations on the local trend surface.</p>
<p>The text on the axis labels is specified by the <strong>xlab</strong> and <strong>ylab</strong> function arguments and the <strong>main</strong> argument determines the overall title for the graph. The function call below creates the level plot:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">image(seq(10, 300, 1), seq(10, 300, 1), z,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;)
box()</pre></div></div>

<p>After the <strong>image</strong> function is used we call the <strong>box</strong> function mainly for aesthetic purposes to ensure there is a line surrounding the level plot. The graph that is created is shown below:</p>
<div id="attachment_1012" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-base.jpg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-base-300x300.jpg" alt="Base Graphics Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1012" /></a><p class="wp-caption-text">Base Graphics Level Plot</p></div>
<p>The default colour scheme used by the <strong>base</strong> graphics produces an attractive level plot graph where we can easily see the variation in height across the grid region. It is basically a fancy version of a contour plot where the regions between the contour lines are coloured with different shades indicating the height in those regions.</p>
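<p>The link with contour plots can be made visible by drawing contour lines on top of the coloured regions. This is a minimal sketch using the built-in <strong>volcano</strong> matrix rather than the elevation surface; the names <strong>x.demo</strong> and <strong>y.demo</strong> are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Grid positions for the rows and columns of the volcano matrix
x.demo = seq_len(nrow(volcano))
y.demo = seq_len(ncol(volcano))
# Coloured regions first
image(x.demo, y.demo, volcano,
  xlab = 'Row', ylab = 'Column',
  main = 'Level plot with contour lines')
# add = TRUE draws the contour lines on the existing plot
contour(x.demo, y.demo, volcano, add = TRUE)
box()</pre></div></div>

<p>The contour lines follow the boundaries between the coloured bands, which is why the two displays convey much the same information.</p>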
<p><strong>Lattice Graphics</strong></p>
<p>The <strong>lattice</strong> graphics package provides a function <strong>levelplot</strong> for this type of graphical display. We use the data stored in the object <strong>elevation.fit</strong> to create the graph with <strong>lattice</strong> graphics.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">levelplot(Height ~ x*y, data = elevation.fit,
  xlab = &quot;X Coordinate (feet)&quot;, ylab = &quot;Y Coordinate (feet)&quot;,
  main = &quot;Surface elevation data&quot;,
  col.regions = terrain.colors(100)
)</pre></div></div>

<p>The formula specifies which variables to use for the three axes and the <strong>data</strong> argument names the data frame where the values are stored &#8211; as there are three dimensions it is the z axis variable that is specified on the left hand side of the formula. The axis labels and title are specified in the same way as the <strong>base</strong> graphics.</p>
<p>The range of colours used in the <strong>lattice</strong> level plot can be specified as a vector of colours to the <strong>col.regions</strong> argument of the function. We make use of the <strong>terrain.colors</strong> function to create this vector, which holds a range of 100 colours that are less striking than those used above with the <strong>base</strong> graphics. The level plot that we create is shown here:</p>
<div id="attachment_1014" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-lattice-300x300.jpg" alt="Lattice Graphics Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1014" /></a><p class="wp-caption-text">Lattice Graphics Level Plot</p></div>
<p>This is in general similar to the <strong>base</strong> graphics display but the actual plot region is a different shape that makes things look slightly different.</p>
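<p>One setting that affects the shape of the plot region in <strong>lattice</strong> is the <strong>aspect</strong> argument. The sketch below is an illustration on the built-in <strong>volcano</strong> matrix, not the elevation data, and the names <strong>volcano.df</strong> and <strong>level.plot</strong> are hypothetical; it asks for equal scales on both axes with <strong>aspect = 'iso'</strong>.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(lattice)
# Long format: one row per grid cell with its height
volcano.df = expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)))
volcano.df$Height = as.numeric(volcano)
# aspect = 'iso' uses the same physical scale for the x and y axes
level.plot = levelplot(Height ~ x*y, data = volcano.df,
  col.regions = terrain.colors(100),
  aspect = 'iso')
print(level.plot)</pre></div></div>

<p>Whether this is preferable depends on the data: for coordinates measured in the same units, as with the elevation grid, equal scales avoid distorting distances.</p>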
<p><strong>ggplot2</strong></p>
<p>The <strong>ggplot2</strong> package also provides facilities for creating a level plot making use of the tile geom to create the desired graph. The function <strong>ggplot</strong> forms the basis of the graph and various other options are used to customise the graph:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(elevation.fit, aes(x, y, fill = Height)) + geom_tile() +
  xlab(&quot;X Coordinate (feet)&quot;) + ylab(&quot;Y Coordinate (feet)&quot;) +
  ggtitle(&quot;Surface elevation data&quot;) +
  scale_fill_gradient(limits = c(7000, 10000),low = &quot;black&quot;,high = &quot;white&quot;) +
  scale_x_continuous(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0))</pre></div></div>

<p>The options added to the graph change various settings. The colours used for the heights are selected by the <strong>scale_fill_gradient</strong> function, ranging from black to white. The <strong>scale_x_continuous</strong> and <strong>scale_y_continuous</strong> options remove the default padding around the data so that the tiles cover the whole plot region, hiding the default gray background &#8211; this makes the graph more visually appealing. The graph that is produced is shown here:</p>
<div id="attachment_1013" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/05/levelplot-ggplot2-300x300.jpg" alt="ggplot2 Level Plot" title="Level plot Example" width="300" height="300" class="size-medium wp-image-1013" /></a><p class="wp-caption-text">ggplot2 Level Plot</p></div>
<p>The graph from <strong>ggplot2</strong> is visually as impressive as the other graphs. Because a continuous colour gradient was selected, the transitions between colours are smoother, which blurs some of the boundaries that are visible on the other graphs.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/displaying-data-using-level-plots/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Summarising data using box and whisker plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/#comments</comments>
		<pubDate>Sun, 25 Apr 2010 07:37:10 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[Box and Whisker]]></category>
		<category><![CDATA[boxplot]]></category>
		<category><![CDATA[bwplot]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=960</guid>
		<description><![CDATA[A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) [...]]]></description>
			<content:encoded><![CDATA[<p>A box and whisker plot is a type of graphical display that can be used to summarise a set of data based on the five number summary of this data. The summary statistics used to create a box and whisker plot are the median of the data, the lower and upper quartiles (25% and 75%) and the minimum and maximum values.<span id="more-960"></span></p>
<p>The box and whisker plot is an effective way to investigate the distribution of a set of data. For example, skewness can be identified from the box and whisker as the display does not make any assumptions about the underlying distribution of the data. The extreme values at either end of the scale are sometimes included on the display to show how far they extend beyond the majority of the data.</p>
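<p>The five number summary can be computed directly in R. As a small illustration on a hypothetical vector <strong>temps</strong> (made-up values, not the Southampton data), the <strong>fivenum</strong> function returns the minimum, lower hinge, median, upper hinge and maximum, where the hinges are Tukey's versions of the quartiles:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Made-up values for illustration
temps = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# Minimum, lower hinge, median, upper hinge, maximum: 1 3 5 7 9
fivenum(temps)
# The values a box and whisker plot would draw for the same data
boxplot.stats(temps)$stats</pre></div></div>

<p>When the data contain extreme values, the whisker ends reported by <strong>boxplot.stats</strong> can differ from the minimum and maximum, and the extreme points are listed separately in its <strong>out</strong> component.</p>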
<p>To illustrate creating box and whisker plots we consider publicly available meteorological data collected on a monthly basis at Southampton, UK between 1950 and 1999. This data is available from the <a href="http://www.metoffice.gov.uk/">UK Met Office</a> and we will compare the range of maximum temperatures recorded in each month of the year over this period by creating box and whisker plots with the different packages.</p>
<p>The data is assumed to have been imported into <strong>R</strong> and stored in a data frame called <strong>soton.df</strong>. An extract of the data is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">    Year Month Max.Temp Min.Temp Frost  Rain
1   1950   Jan      7.7      2.8     7  20.1
2   1950   Feb     10.3        4     4 127.0
3   1950   Mar     13.0      4.5     2  39.4
4   1950   Apr     13.6      4.7     0  62.0
5   1950   May     17.9      7.8     0  32.2</pre></div></div>

<p><strong>Base Graphics</strong></p>
<p>The <strong>base</strong> graphics approach makes use of the <strong>boxplot</strong> function to create box and whisker plots. In this situation the function can be used with a formula rather than specifying two separate vectors of data &#8211; we can specify a data frame to point towards a source of data to be used in the graph. For the temperature data we use this code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">boxplot(Max.Temp ~ Month, data = soton.df,
  xlab = &quot;Month&quot;, ylab = &quot;Maximum Temperature&quot;,
  main = &quot;Temperature at Southampton Weather Station (1950-1999)&quot;
)</pre></div></div>

<p>The horizontal and vertical axes labels are specified using the <strong>xlab</strong> and <strong>ylab</strong> arguments respectively and the title of the plot is created using the <strong>main</strong> argument. The box and whisker plot is shown here:</p>
<div id="attachment_962" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-base-300x300.jpg" alt="Base Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-962" /></a><p class="wp-caption-text">Base Graphics Box and Whisker Plot</p></div>
<p>The function <strong>boxplot</strong> makes it easy to create a reasonably attractive box and whisker plot. The variation in the distribution of temperatures across the year can be seen from the graph.</p>
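<p>One detail worth checking is how the <strong>Month</strong> column is stored, since the post does not say. If it is text, R sorts the months alphabetically on the axis. The sketch below uses a simulated data frame <strong>toy.df</strong> (an assumption for illustration, not the Southampton data) and the built-in constant <strong>month.abb</strong> to put the boxes in calendar order.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Simulated temperatures for four months (illustrative only)
set.seed(42)
toy.df = data.frame(
  Month = rep(c('Jan', 'Feb', 'Mar', 'Apr'), each = 10),
  Max.Temp = rnorm(40, mean = rep(c(8, 9, 11, 14), each = 10)))
# Default factor levels are alphabetical: Apr, Feb, Jan, Mar
levels(factor(toy.df$Month))
# Set the levels explicitly to get calendar order on the axis
toy.df$Month = factor(toy.df$Month, levels = month.abb[1:4])
boxplot(Max.Temp ~ Month, data = toy.df,
  xlab = 'Month', ylab = 'Maximum Temperature')</pre></div></div>

<p>The same conversion works for the <strong>lattice</strong> and <strong>ggplot2</strong> versions because both follow the order of the factor levels.</p>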
<p><strong>Lattice Graphics</strong></p>
<p>In the <strong>lattice</strong> graphics package there is a function <strong>bwplot</strong> which is used to create box and whisker plots. The function call also uses a formula to specify the <strong>x</strong> and <strong>y</strong> variables to use on the graph. The arguments in the function call are identical to those used with the <strong>boxplot</strong> function in <strong>base</strong> graphics:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">bwplot(Max.Temp ~ Month, data = soton.df,
  xlab = &quot;Month&quot;, ylab = &quot;Maximum Temperature&quot;,
  main = &quot;Temperature at Southampton Weather Station (1950-1999)&quot;
)</pre></div></div>

<p>The variable <strong>Month</strong> is categorical so a separate box and whisker summary is created for each month. The <strong>lattice</strong> version of the graph is shown here:</p>
<div id="attachment_963" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-lattice-300x300.jpg" alt="Lattice Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-963" /></a><p class="wp-caption-text">Lattice Graphics Box and Whisker Plot</p></div>
<p>This is very similar to the box and whisker plot created by <strong>base</strong> graphics with a similar level of effort required. The main difference is the use of a circle rather than a line to identify the location of the median of the data.</p>
<p><strong>ggplot2</strong></p>
<p>In the <strong>ggplot2</strong> package there is a general function <strong>ggplot</strong> that is used to create graphs of any type. We make use of the boxplot geom to create a box and whisker plot following the standard approach. The first step is to specify a data frame to use to create the graph and then map the columns of this data frame, via the <strong>aes</strong> argument, to the different axes or other aesthetics (such as colour or symbol shape). The particular geom is used to specify the type of plot that we want to create. Our final step is to add on the various axis labels and an overall title to the graph.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(soton.df, aes(Month, Max.Temp)) + geom_boxplot() +
  ylab(&quot;Maximum Temperature&quot;) +
  ggtitle(&quot;Temperature at Southampton Weather Station (1950-1999)&quot;)</pre></div></div>

<p>The <strong>ggplot2</strong> version of box and whisker plots is shown here:</p>
<div id="attachment_964" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/boxwhisker-ggplot2-300x300.jpg" alt="ggplot2 Graphics Box and Whisker Plot" title="Box and Whisker plot Example" width="300" height="300" class="size-medium wp-image-964" /></a><p class="wp-caption-text">ggplot2 Graphics Box and Whisker Plot</p></div>
<p>The distinctive gray background used by <strong>ggplot2</strong> is an obvious visual difference compared to the default clear background used in the other two approaches. The boxes themselves have a cleaner look in this graph than the other two methods and the overall look is slick.</p>
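<p>For readers without the Southampton data to hand, the same steps can be tried on the <strong>airquality</strong> data set that ships with R. This is only a sketch under that assumption: the object name <strong>box.plot</strong> is hypothetical, and the labels are set with the <strong>labs</strong> function.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# Month is stored as a number, so convert it to a factor
# to get one box per month
box.plot = ggplot(airquality, aes(factor(Month), Temp)) +
  geom_boxplot() +
  labs(x = 'Month', y = 'Temperature',
    title = 'Daily temperature by month (airquality data)')
# Print the object to draw the graph
print(box.plot)</pre></div></div>

<p>The structure mirrors the call above: a data frame, an aesthetic mapping, a geom and then the labels.</p>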
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-box-and-whisker-plots/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Summarising data using scatter plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#comments</comments>
		<pubDate>Sun, 18 Apr 2010 18:56:06 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[scatter plot]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[xyplot]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=912</guid>
		<description><![CDATA[A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is [...]]]></description>
			<content:encoded><![CDATA[<p>A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is used in many common situations and can convey a lot of useful information.<span id="more-912"></span></p>
<p>To illustrate creating a scatter plot we will use a simple data set for the population of the UK between 1992 and 2009. This data is saved in a data frame <strong>uk.df</strong> using the following command:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)</pre></div></div>

<p>For this example the population is recorded in thousands, which keeps the axis labels easy to read; plotting the exact figures would add no visible detail to the graph.</p>
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="aqXuiQR4bnY" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#aqXuiQR4bnY"><img src="http://i.ytimg.com/vi/aqXuiQR4bnY/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>base</strong> graphics system the general purpose <strong>plot</strong> function can be used to create a scatter plot for the UK population data set that we created. The first two arguments to the <strong>plot</strong> function are the x and y variables respectively. The following code will create a scatter plot, including various labels:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plot(uk.df$Year, uk.df$Population,
  xlab = &quot;Year&quot;, ylab = &quot;Total Population (Thousands)&quot;,
  main = &quot;UK Population (1992-2009)&quot;, pch = 16)</pre></div></div>

<p>The labels for the x and y axes are specified via the <strong>xlab</strong> and <strong>ylab</strong> arguments to the plot function and the <strong>main</strong> argument specifies the title for the plot.</p>
<div id="attachment_919" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-base-300x300.jpg" alt="Base Graphics Scatter Plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-919" /></a><p class="wp-caption-text">Base Graphics Scatter Plot</p></div>
<p>The graph itself is plain and functional, with solid circles indicating the population (in thousands) for each of the years covered by the data.</p>
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="NMTCIViCLOU" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#NMTCIViCLOU"><img src="http://i.ytimg.com/vi/NMTCIViCLOU/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>lattice</strong> graphics package provides a function <strong>xyplot</strong> specifically to create scatter plots and the function is used in a similar way to the <strong>base</strong> graphics approach. The first argument to the function is a formula describing the relationship to be plotted on the graph, with the y variable preceding the x variable, as we are used to when writing a mathematical formula such as y=a+bx. The data frame is specified with the <strong>data</strong> argument to simplify the expression in the formula. The code used is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">xyplot(Population ~ Year, data = uk.df,
  xlab = &quot;Year&quot;, ylab = &quot;Total Population (Thousands)&quot;,
  main = &quot;UK Population (1992-2009)&quot;,
  scales = list(x = list(at = seq(1992, 2009, 2)))
)</pre></div></div>

<p>The axis labels and the overall title for the graph are specified in the same way as in the <strong>base</strong> graphics system. We indulge in some fine tuning of the labels on the x axis via the <strong>scales</strong> argument &#8211; here we indicate that every second year should be labelled, starting in 1992; the sequence stops at 2008, the last value that does not exceed 2009. The <strong>lattice</strong> graph is shown here for comparison with the graphs created using the other two packages:</p>
<div id="attachment_921" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-lattice-300x300.jpg" alt="Lattice Graphics Scatter Plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-921" /></a><p class="wp-caption-text">Lattice Graphics Scatter Plot</p></div>
<p>There are very few visual differences between the <strong>lattice</strong> and <strong>base</strong> graphics. In <strong>lattice</strong> graphics an object is created that can be edited to add or remove components and then printed to the screen. This approach is more flexible than the base graphics where the components are painted on top of each other and the use of themes in <strong>lattice</strong> will make it easier to keep a consistent look to all graphs in a document.</p>
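<p>As a minimal sketch of the object-based approach described above, the plot can be stored, modified with <strong>update</strong> and only then printed. The object names <strong>pop.plot</strong> and <strong>pop.lines</strong> are chosen here for illustration and are not part of the original post:</p>

```r
library(lattice)

# Same data as used earlier in the post
uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)

# Store the plot as an object instead of drawing it straight away
pop.plot = xyplot(Population ~ Year, data = uk.df,
  xlab = "Year", ylab = "Total Population (Thousands)",
  main = "UK Population (1992-2009)")

# update() returns a modified copy; pop.plot itself is left unchanged
pop.lines = update(pop.plot, type = c("p", "l"))

# Nothing is drawn until the object is printed
print(pop.lines)
```

<p>Keeping the original object around means several variants of the same graph can be produced without repeating the full function call.</p>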
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="TagaAeIHKks" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/#TagaAeIHKks"><img src="http://i.ytimg.com/vi/TagaAeIHKks/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>ggplot2</strong> package the <strong>ggplot</strong> function is used to create graphs of all types rather than having a separate function defined for each type of graph. The first argument is a data frame with the data to be plotted and the <strong>aes</strong> argument specifies the aesthetics associated with the graph such as the point symbol, size or colour. In this case the <strong>Year</strong> variable appears on the x axis and the <strong>Population</strong> variable on the y axis. The code to create the scatter plot is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(uk.df, aes(Year, Population)) + geom_point() +
  xlab(&quot;Year&quot;) + ylab(&quot;Total Population (Thousands)&quot;) +
  opts(title = &quot;UK Population (1992-2009)&quot;)</pre></div></div>

<p>The <strong>geom_point</strong> function specifies the type of graph to create, a scatter plot in this situation. This highlights the flexibility of the <strong>ggplot2</strong> package, as changing the geom will create a new type of graph. The labels for the graph are created by adding them to the graph with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>opts</strong> functions. The graph is shown below:</p>
<div id="attachment_920" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/scatterplot-ggplot2-300x300.jpg" alt="ggplot2 Scatter plot" title="Scatter plot Example" width="300" height="300" class="size-medium wp-image-920" /></a><p class="wp-caption-text">ggplot2 Scatter plot</p></div>
<p>This graph is not greatly different to the scatter plot created using the <strong>base</strong> and <strong>lattice</strong> packages. The default theme in the <strong>ggplot2</strong> package has a gray background with white grid lines that allows easy visual recognition of graphs created using this package.</p>
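<p>The <strong>opts</strong> function used in the code above was deprecated in ggplot2 0.9.2 and is not available in later releases. A sketch of the same scatter plot for a current installation, assuming the <strong>ggplot2</strong> package is installed, sets the title and axis labels with <strong>labs</strong> instead (the object name <strong>pop.plot</strong> is illustrative):</p>

```r
library(ggplot2)

# Same data as used earlier in the post
uk.df = data.frame(Year = 1992:2009,
  Population = c(57770, 57933, 58096, 58258, 58418, 58577,
  58743, 58925, 59131, 59363, 59618, 59894, 60186, 60489,
  60804, 61129, 61461, 61796)
)

# labs() sets the axis labels and the title in one call
pop.plot = ggplot(uk.df, aes(Year, Population)) + geom_point() +
  labs(x = "Year", y = "Total Population (Thousands)",
    title = "UK Population (1992-2009)")

print(pop.plot)
```

<p>The <strong>ggtitle</strong> function is an alternative to <strong>labs</strong> when only the title needs to be set.</p>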
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-scatter-plots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summarising data using histograms</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 08:53:16 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[hist]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=870</guid>
		<description><![CDATA[The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of [...]]]></description>
			<content:encoded><![CDATA[<p>The histogram is a standard type of graphic used to summarise univariate data where the range of values in the data set is divided into regions and a bar (usually vertical) is plotted in each of these regions with height proportional to the frequency of observations in that region. In some cases the proportion of data points in each region is shown instead of counts.<span id="more-870"></span></p>
<p>The shape of the histogram is determined by the width and number of regions that divide up the data. A histogram provides an indication of the following features of a set of data: the general shape, symmetry or skewness of data and modality (uni-, bi- or multi-modal). There are some situations where a different type of graph would be preferable but histograms are useful for describing the general features of the distribution of a set of data.</p>
<p>To illustrate creating a histogram we consider data from the AFL sports league in Australia and the total number of points scored by the home team in each fixture. If we assume that the data is in a comma separated text file, called <strong>afl_2003_2007.csv</strong>, then we would import that data using the following command saving the results in a data frame:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">afl.df = read.csv(&quot;afl_2003_2007.csv&quot;)</pre></div></div>

<p>Edit: The data is available as <a href='http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/12/afl_2003_2007.txt'>AFL Data Set</a>. Change the file extension manually to <strong>csv</strong> or change the command to reflect the different file name.</p>
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="4Q9vPuj4w8c" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#4Q9vPuj4w8c"><img src="http://i.ytimg.com/vi/4Q9vPuj4w8c/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In <strong>base</strong> graphics the function <strong>hist</strong> is used to create a histogram with the first argument being the name of the vector that contains the data to be plotted. The <strong>x-axis</strong> is given a label using the <strong>xlab</strong> argument and the <strong>main</strong> argument is used to add a title to the graph. Code to create a histogram of home points is shown below:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">hist(afl.df$Home.Total, xlab = &quot;Home Points&quot;,
  main = &quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>The default option is to display bars representing the frequency of data values in each of the ranges and the overall look of the graph is basic as shown here:</p>
<div id="attachment_877" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-base-300x300.jpg" alt="Base Graphics Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-877" /></a><p class="wp-caption-text">Base Graphics Histogram</p></div>
<p>The default algorithm for selecting the number of bins (Sturges' formula) usually makes a sensible selection, but the bins can be specified via the <strong>breaks</strong> argument if required.</p>
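<p>A minimal sketch of choosing the bins by hand is shown here. Because the AFL file may not be to hand, the vector <strong>home.total</strong> is simulated and is only a hypothetical stand-in for <strong>afl.df$Home.Total</strong>; the 10-point bin width is likewise an arbitrary choice for illustration:</p>

```r
# Hypothetical stand-in for afl.df$Home.Total (simulated, not the AFL data)
set.seed(42)
home.total = round(rnorm(200, mean = 100, sd = 25))

# Bin edges every 10 points, widened so they cover the whole range of the data
bin.edges = seq(floor(min(home.total) / 10) * 10,
  ceiling(max(home.total) / 10) * 10, by = 10)

# The breaks argument accepts the vector of bin edges
home.hist = hist(home.total, breaks = bin.edges, xlab = "Home Points",
  main = "Histogram with 10-point bins")

# The returned object records the bin edges and the count in each bin
home.hist$breaks
home.hist$counts
```

<p>The <strong>breaks</strong> argument also accepts a single number as a suggested number of bins, which R treats as a suggestion rather than an exact count.</p>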
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="hxQmEhzgWks" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#hxQmEhzgWks"><img src="http://i.ytimg.com/vi/hxQmEhzgWks/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>lattice</strong> graphics package there is a function <strong>histogram</strong> and we make use of the formula to specify a single variable for the number of points scored by the home team. The specification for the axis labels and graph title is the same as for the <strong>base</strong> graphics package. The equivalent graph is created using the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">histogram( ~ Home.Total, data = afl.df, xlab = &quot;Home Points&quot;,
  main = &quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>Here the default option is to work with percentages of the total number of data points rather than counts, so the vertical scale differs from the <strong>base</strong> graphics plot; any difference in the shape of the distribution comes from the choice of bins. The <strong>lattice</strong> version is shown below:</p>
<div id="attachment_880" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-lattice-300x300.jpg" alt="Lattice Graphics Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-880" /></a><p class="wp-caption-text">Lattice Graphics Histogram</p></div>
<p>The other main difference is the choice of colour for the bars in the histogram and these can be adjusted by changing the global theme for <strong>lattice</strong>.</p>
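<p>If counts are preferred on the vertical axis, to match the <strong>base</strong> graphics histogram, the <strong>histogram</strong> function accepts a <strong>type</strong> argument. The sketch below uses a simulated data frame <strong>sim.df</strong> as a hypothetical stand-in for <strong>afl.df</strong>:</p>

```r
library(lattice)

# Hypothetical stand-in for afl.df (simulated, not the AFL data)
set.seed(42)
sim.df = data.frame(Home.Total = round(rnorm(200, mean = 100, sd = 25)))

# type = "count" puts counts rather than percentages on the vertical axis
count.hist = histogram( ~ Home.Total, data = sim.df, type = "count",
  xlab = "Home Points", main = "Histogram of Simulated Home Points")

print(count.hist)
```

<p>The other accepted values of <strong>type</strong> are <strong>"percent"</strong> and <strong>"density"</strong>.</p>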
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="47kWynt3b6M" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/#47kWynt3b6M"><img src="http://i.ytimg.com/vi/47kWynt3b6M/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>ggplot2</strong> library uses a general purpose graphics function called <strong>ggplot</strong> to create graphs of all types and the geom specifies the type of display to create, in this case a histogram. Components that make up the graph are added sequentially to build up the whole plot and in the example below we add axis labels and a main title.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(afl.df, aes(Home.Total)) + geom_histogram() +
  xlab(&quot;Home Points&quot;) + ylab(&quot;Frequency&quot;) +
  opts(title = &quot;Histogram of Points Scored at Home\nAFL 2003-2007&quot;)</pre></div></div>

<p>The default theme for <strong>ggplot2</strong> is distinctive and the histogram is shown in the graph below:</p>
<div id="attachment_881" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/04/histogram-ggplot2-300x300.jpg" alt="ggplot 2 Histogram" title="Histogram Example" width="300" height="300" class="size-medium wp-image-881" /></a><p class="wp-caption-text">ggplot 2 Histogram</p></div>
<p>The default number of bins is larger compared to <strong>base</strong> and <strong>lattice</strong> graphics which provides a rough distribution in this particular case. The online <a href="http://had.co.nz/ggplot2/">ggplot2</a> manual is a good source of information about customising graphs created using this approach.</p>
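<p>To control the bins rather than rely on the default, <strong>geom_histogram</strong> takes a <strong>binwidth</strong> argument. The sketch below assumes a current ggplot2 installation, so the title is set with <strong>labs</strong> because <strong>opts</strong> is no longer available. The data frame <strong>sim.df</strong> is simulated as a hypothetical stand-in for <strong>afl.df</strong>, and the bin width of 10 is an arbitrary choice:</p>

```r
library(ggplot2)

# Hypothetical stand-in for afl.df (simulated, not the AFL data)
set.seed(42)
sim.df = data.frame(Home.Total = round(rnorm(200, mean = 100, sd = 25)))

# binwidth is given in the units of the x variable (points scored)
bin.plot = ggplot(sim.df, aes(Home.Total)) + geom_histogram(binwidth = 10) +
  labs(x = "Home Points", y = "Frequency",
    title = "Histogram of Simulated Home Points")

print(bin.plot)
```

<p>A wider bin gives a smoother outline at the cost of detail, so it is worth trying a few values before settling on one.</p>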
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-histograms/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Summarising data using dot plots</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 10:53:00 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[Cleveland]]></category>
		<category><![CDATA[dot plot]]></category>
		<category><![CDATA[dotplot]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[points]]></category>
		<category><![CDATA[trellis]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=847</guid>
		<description><![CDATA[A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The dot plot can be arranged with the categories either on the vertical or horizontal axis of the display to allow comparison between the different categories as well as comparison within categories where [...]]]></description>
			<content:encoded><![CDATA[<p>A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The dot plot can be arranged with the categories either on the vertical or horizontal axis of the display to allow comparison between the different categories as well as comparison within categories where multiple symbols are used to denote, say, different years.<span id="more-847"></span></p>
<p>In this post we will consider creating a dot plot using the <strong>base</strong> graphics, <strong>lattice</strong> graphics and <strong>ggplot2</strong> approaches. To illustrate creating a dot plot we use data from the <a href="http://faostat.fao.org">FAO website</a> on the total irrigation area for Africa, Latin America, North America and Europe. We create a data frame using the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">irrigation.df = data.frame(
  # factor() is needed from R 4.0.0, where data.frame no longer converts strings to factors
  Region = factor(rep(c(&quot;Africa&quot;, &quot;Latin America&quot;, &quot;North America&quot;, &quot;Europe&quot;), 4)),
  Year = factor(c(rep(1980, 4), rep(1990, 4), rep(2000, 4), rep(2007, 4))),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)</pre></div></div>

<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="5izUzQKL1yw" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#5izUzQKL1yw"><img src="http://i.ytimg.com/vi/5izUzQKL1yw/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>base</strong> graphics system we build up the dot plot with a series of commands. The first function call creates the graph region based on the data set, but no data is plotted because we set the <strong>type = &quot;n&quot;</strong> argument. The axis labels for the horizontal and vertical scales are set along with the title in the initial function call:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plot(irrigation.df$Area, irrigation.df$Region, xlab = &quot;Area&quot;,
  ylab = &quot;Region&quot;, main = &quot;Irrigation Area by Region&quot;, type = &quot;n&quot;)</pre></div></div>

<p>To add the points with separate colours for each of the four years we use the <strong>points</strong> function and subset to the particular year by testing a condition on the year. The <strong>col</strong> argument is used with a text string to specify the colour for the symbols for the given year:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">points(irrigation.df$Area[irrigation.df$Year == 1980],
  irrigation.df$Region[irrigation.df$Year == 1980], col = &quot;black&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 1990],
  irrigation.df$Region[irrigation.df$Year == 1990], col = &quot;blue&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 2000],
  irrigation.df$Region[irrigation.df$Year == 2000], col = &quot;red&quot;, pch = 16)
points(irrigation.df$Area[irrigation.df$Year == 2007],
  irrigation.df$Region[irrigation.df$Year == 2007], col = &quot;green&quot;, pch = 16)</pre></div></div>

<p>The code is rather long-winded compared to using the other two graphics packages. We can add a legend to the graph so that the years can be identified:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">legend(10, 4, legend = c(&quot;1980&quot;, &quot;1990&quot;, &quot;2000&quot;, &quot;2007&quot;),
  col = c(&quot;black&quot;, &quot;blue&quot;, &quot;red&quot;, &quot;green&quot;), pch = 16)</pre></div></div>

<p>The placement of the legend uses the <strong>x</strong> and <strong>y</strong> coordinates within the graph to position the box. All the code above produces the following graph:</p>
<div id="attachment_856" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-base.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-base-300x300.jpg" alt="Base Graphics Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-856" /></a><p class="wp-caption-text">Base Graphics Dot Plot</p></div>
<p>The graph is basic but we can consider the changes over time for the four regions. One downside is that the regions have been labelled with numbers rather than text strings, because the region factor is plotted using its integer codes.</p>
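<p>Both the repeated <strong>points</strong> calls and the numeric region labels can be avoided. The following is one possible sketch rather than the approach taken in the video: the colours are looked up from the <strong>Year</strong> factor in a single call and the vertical axis is drawn by hand with <strong>axis</strong>. The names <strong>year.cols</strong> and <strong>region.pos</strong> and the margin settings are assumptions made for this example:</p>

```r
irrigation.df = data.frame(
  Region = factor(rep(c("Africa", "Latin America", "North America", "Europe"), 4)),
  Year = factor(c(rep(1980, 4), rep(1990, 4), rep(2000, 4), rep(2007, 4))),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)

# One colour per level of Year, in the order of the factor levels
year.cols = c("black", "blue", "red", "green")

# Integer position of each region on the vertical axis
region.pos = as.numeric(irrigation.df$Region)

# Widen the left margin so the region names fit
old.par = par(mar = c(5, 8, 4, 2))

# Empty plot region with the default vertical axis suppressed
plot(irrigation.df$Area, region.pos, type = "n", yaxt = "n",
  xlab = "Area", ylab = "", main = "Irrigation Area by Region")

# Label the vertical axis with the region names instead of numbers
axis(2, at = seq_along(levels(irrigation.df$Region)),
  labels = levels(irrigation.df$Region), las = 1)

# Indexing by a factor uses its integer codes, so each year gets its colour
points(irrigation.df$Area, region.pos,
  col = year.cols[irrigation.df$Year], pch = 16)

legend("bottomright", legend = levels(irrigation.df$Year),
  col = year.cols, pch = 16)

par(old.par)
```

<p>Using a keyword such as <strong>"bottomright"</strong> for the legend avoids having to work out x and y coordinates for the box.</p>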
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="-FGU6PMaSRY" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#-FGU6PMaSRY"><img src="http://i.ytimg.com/vi/-FGU6PMaSRY/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>lattice</strong> graphics package has a function <strong>dotplot</strong> that is used to create dot plots. The first argument to the function is a formula describing the variables to use for the horizontal and vertical axes. We also specify the data frame to use for the graph and which column determines the different symbols and/or colours that highlight groupings within the plot:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">dotplot(Region ~ Area, data = irrigation.df, groups = Year,
  main = &quot;Irrigation Area by Region&quot;)</pre></div></div>

<p>The lattice variant of the graph is shown here:</p>
<div id="attachment_857" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-lattice.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-lattice-300x300.jpg" alt="Lattice Graphics Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-857" /></a><p class="wp-caption-text">Lattice Graphics Dot Plot</p></div>
<p>The graph is simple and very similar to the one produced using the base graphics with the advantage that the R code is not as complicated.</p>
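<p>The <strong>dotplot</strong> call above does not ask for a legend, so the years cannot be told apart from the colours alone. A minimal sketch of adding one with the <strong>auto.key</strong> argument follows; the placement and title of the key are choices made for this example:</p>

```r
library(lattice)

irrigation.df = data.frame(
  Region = factor(rep(c("Africa", "Latin America", "North America", "Europe"), 4)),
  Year = factor(c(rep(1980, 4), rep(1990, 4), rep(2000, 4), rep(2007, 4))),
  Area = c(9.3, 12.7, 21.2, 18.8, 11.0, 15.5, 21.6, 25.3,
    13.2, 17.3, 23.3, 26.7, 13.6, 17.3, 23.8, 26.3)
)

# auto.key builds a legend from the levels of the grouping variable
key.plot = dotplot(Region ~ Area, data = irrigation.df, groups = Year,
  main = "Irrigation Area by Region", xlab = "Area",
  auto.key = list(space = "right", title = "Year"))

print(key.plot)
```

<p>Setting <strong>auto.key = TRUE</strong> is enough for a default legend placed above the plot.</p>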
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="y1CsT-jAWZQ" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/#y1CsT-jAWZQ"><img src="http://i.ytimg.com/vi/y1CsT-jAWZQ/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>ggplot</strong> function is used to create the dot plot where we first specify the name of the data frame with the information to be displayed and then use the <strong>aes</strong> argument to list the variables to plot on the horizontal and vertical axes. The <strong>colour</strong> aesthetic names the variable, usually categorical, that determines the colour of each point.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(irrigation.df, aes(x = Area, y = Region, colour = Year)) +
  geom_point() + opts(title = &quot;Irrigation Area by Region&quot;)</pre></div></div>

<p>The <strong>ggplot2</strong> version of the dot plot is shown below:</p>
<div id="attachment_858" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/03/dotplot-ggplot2-300x300.jpg" alt="ggplot2 Dot Plot" title="Dot Plot Example" width="300" height="300" class="size-medium wp-image-858" /></a><p class="wp-caption-text">ggplot2 Dot Plot</p></div>
<p>This graph is very similar to the ones produced using the other graphics packages but has the distinctive background and legend style that is used as the default option in <strong>ggplot2</strong>.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-dot-plots/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Summarising data using bar charts</title>
		<link>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 08:52:33 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[bar chart]]></category>
		<category><![CDATA[barchart]]></category>
		<category><![CDATA[barplot]]></category>
		<category><![CDATA[FAO]]></category>
		<category><![CDATA[geom_bar]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=664</guid>
		<description><![CDATA[A bar graph is a frequently used type of display that compares counts, frequencies, totals or other summary measures for a series of categories, e.g. sales in different market sectors or in quarters in a financial year. The bar graph can be laid out with the categories either on the vertical or horizontal axis of [...]]]></description>
			<content:encoded><![CDATA[<p>A bar graph is a frequently used type of display that compares counts, frequencies, totals or other summary measures for a series of categories, e.g. sales in different market sectors or in quarters in a financial year. The bar graph can be laid out with the categories either on the vertical or horizontal axis of the display &#8211; depending on whether we consider a vertical or a horizontal comparison easier for interpreting the graph.<span id="more-664"></span></p>
<p>In <strong>R</strong> there are multiple ways of creating graphs, including the base graphics, lattice graphics and the ggplot2 grammar of graphics approach. To illustrate how we can create a bar chart using these packages we will make use of some data taken from the <a href="http://faostat.fao.org">FAO</a> statistics website for the UK in 2007. The data is for production (in metric tonnes) of the five food and agricultural commodities with the highest production.</p>
<p>The first step before creating the graphs is to prepare the data in a format that can be used by the graphing functions. As this dataset is small we can manually create the data object. To make the labels on the graph less cluttered the production is recorded as 1,000s of metric tonnes.</p>
<p>The <strong>R</strong> code to create the data object is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">uk2007 = data.frame(Commodity =
  factor(c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;),
    levels = c(&quot;Cow milk&quot;, &quot;Wheat&quot;, &quot;Sugar beet&quot;, &quot;Potatoes&quot;, &quot;Barley&quot;)),
  Production = c(14023, 13221, 6500, 5635, 5079))</pre></div></div>

<p>The <strong>levels</strong> argument is explicitly defined to make sure that the categories are ordered from largest to smallest production; otherwise they would be ordered alphabetically.</p>
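<p>The effect of the <strong>levels</strong> argument can be checked directly. This short sketch compares the default ordering with the explicit one and also shows one way of deriving the order from the production figures instead of typing it out; the vector names <strong>commodity</strong>, <strong>production</strong> and <strong>by.size</strong> are illustrative:</p>

```r
commodity = c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")
production = c(14023, 13221, 6500, 5635, 5079)

# Default: the levels are sorted alphabetically
levels(factor(commodity))

# Explicit levels keep the order in which the commodities are listed
levels(factor(commodity, levels = commodity))

# Alternative: derive the order from the data, largest production first
by.size = factor(commodity,
  levels = commodity[order(production, decreasing = TRUE)])
levels(by.size)
```

<p>Deriving the order from the data means the levels stay correct if the production figures are later updated.</p>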
<p><strong>Base Graphics</strong></p>
<p><!--[Fast Tube]--><span id="fVhdPbntKdw" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#fVhdPbntKdw"><img src="http://i.ytimg.com/vi/fVhdPbntKdw/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>base</strong> graphics in R provide a function <strong>barplot</strong> that we can use to create a bar chart. The first argument to the function is the name of the object with the data. The <strong>names</strong> argument (an abbreviation of <strong>names.arg</strong>) is used to provide the labels for the categories in the graph. We also specify the text for the labels for the x-axis, y-axis and title of the graph with the <strong>xlab</strong>, <strong>ylab</strong> and <strong>main</strong> arguments respectively.</p>
<p>The function call is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">barplot(uk2007$Production, names = uk2007$Commodity,
  xlab = &quot;Commodity&quot;, ylab = &quot;Production (1,000 MT)&quot;,
  main = &quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>to produce the following graph:</p>
<div id="attachment_685" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-base1.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-base1-300x299.jpg" alt="Base Graphics Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-685" /></a><p class="wp-caption-text">Base Graphics Bar Chart</p></div>
<p>This graph is visually appealing with sensible space between the bars for the five commodity categories.</p>
<p><strong>Lattice Graphics</strong></p>
<p><!--[Fast Tube]--><span id="KvQOjlkseBA" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#KvQOjlkseBA"><img src="http://i.ytimg.com/vi/KvQOjlkseBA/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In the <strong>lattice</strong> graphics package the <strong>barchart</strong> function is used to create bar charts. The <strong>x</strong> and <strong>y</strong> variables are specified using a formula, which is the standard approach in Trellis graphics. The variable for the vertical axis is specified on the left hand side of the formula and the variable for the horizontal axis on the right hand side, with the two separated by the tilde character.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">barchart(Production ~ Commodity, data = uk2007, xlab = &quot;Commodity&quot;,
  ylab = &quot;Production (1,000 MT)&quot;,
  main = &quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>This code produces the following graph:</p>
<div id="attachment_686" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-lattice1.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-lattice1-300x299.jpg" alt="Lattice Graphics Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-686" /></a><p class="wp-caption-text">Lattice Graphics Bar Chart</p></div>
<p>The main visual difference compared to the base graphics example is the default colour of the bars, which is much brighter. There is also a large gap between the bars in the display.</p>
<p><strong>ggplot2</strong></p>
<p><!--[Fast Tube]--><span id="4jSfbKFdrTo" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/#4jSfbKFdrTo"><img src="http://i.ytimg.com/vi/4jSfbKFdrTo/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>To create the bar chart in the <strong>ggplot2</strong> package we use the <strong>ggplot</strong> function to specify the data to appear in the graph and then gradually add in the other components of the graph.</p>
<p>We specify the data frame where the data is stored and then use the <strong>aes</strong> argument to identify the <strong>x</strong> and <strong>y</strong> variables. The <strong>geom_bar</strong> function is used to create a bar chart display with the specified data and the last three options in the example create the various labels to be added to the graph.</p>
<p>The graph itself is constructed piece by piece to add the various layers and components on top of the base layer:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># stat = &quot;identity&quot; draws each bar with its height taken from Production
ggplot(uk2007, aes(Commodity, Production)) + geom_bar(stat = &quot;identity&quot;) +
  xlab(&quot;Commodity&quot;) + ylab(&quot;Production (1,000 MT)&quot;) +
  ggtitle(&quot;UK 2007 Top 5 Food and Agricultural Commodities&quot;)</pre></div></div>

<p>This code produces the following graph:</p>
<div id="attachment_691" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-ggplot2.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/barchart-ggplot2-300x299.jpg" alt="ggplot2 Bar Chart" title="Barchart Example" width="300" height="299" class="size-medium wp-image-691" /></a><p class="wp-caption-text">ggplot2 Bar Chart</p></div>
<p>This graph differs from the previous two mainly in its background, which by default is grey with white grid lines.</p>
<p>This blog post is summarised in a pdf leaflet on the <a href="http://www.wekaleamstudios.co.uk/?page_id=282">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/summarising-data-using-bar-charts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Gallery of Graphs produced using R</title>
		<link>http://www.wekaleamstudios.co.uk/posts/a-gallery-of-graphs-produced-using-r/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/a-gallery-of-graphs-produced-using-r/#comments</comments>
		<pubDate>Sat, 12 Sep 2009 11:22:19 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[Trellis Graphics]]></category>
		<category><![CDATA[Websites]]></category>
		<category><![CDATA[base]]></category>
		<category><![CDATA[graphs]]></category>
		<category><![CDATA[lattice]]></category>
		<category><![CDATA[trellis]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=436</guid>
		<description><![CDATA[The R environment for Statistical Analysis has strong facilities for producing high quality graphical output. There are many books and electronic documents that demonstrate examples of effective ways to display data graphically. To get a feel for the range of displays that could be used it is worthwhile visiting the R Graph Gallery. The examples [...]]]></description>
			<content:encoded><![CDATA[<p>The <strong>R</strong> environment for Statistical Analysis has strong facilities for producing high quality graphical output. There are many books and electronic documents that demonstrate examples of effective ways to display data graphically. To get a feel for the range of displays that could be used it is worthwhile visiting the <a href="http://addictedtor.free.fr/graphiques/">R Graph Gallery</a>.<span id="more-436"></span></p>
<p>The examples on this website use both the base and <strong>lattice</strong> graphics functions.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/a-gallery-of-graphs-produced-using-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using Histograms to Summarise Data</title>
		<link>http://www.wekaleamstudios.co.uk/posts/using-histograms-to-summarise-data/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/using-histograms-to-summarise-data/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 20:44:22 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Lattice Graphics]]></category>
		<category><![CDATA[hist]]></category>
		<category><![CDATA[histogram]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=218</guid>
		<description><![CDATA[Tabular displays are not the only way to summarise a data set, and we will often be interested in using a graphical display as this might be a more effective way to visualise our data than statistics such as the mean or standard deviation. The histogram is a commonly used graphical [...]]]></description>
			<content:encoded><![CDATA[<p>Tabular displays are not the only way to summarise a data set, and we will often be interested in using a graphical display as this might be a more effective way to visualise our data than statistics such as the mean or standard deviation.<span id="more-218"></span></p>
<p>The histogram is a commonly used graphical display for summarising univariate data and it provides a visual indication of the location and variation in the data. Histograms are constructed by dividing the data into ranges and counting the number of data points that occur in each range. The height of each bar is based on this count.</p>
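<p>The counting step behind a histogram can be sketched with the <strong>cut</strong> and <strong>table</strong> functions. The simulated data, the object names <strong>tempData</strong> and <strong>tempRanges</strong> and the choice of five ranges are assumptions made only for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Simulated measurements, used here only for illustration
set.seed(1)
tempData = rnorm(100, mean = 10, sd = 2)

# Divide the data into five ranges of equal width
tempRanges = cut(tempData, breaks = 5)

# Count the number of data points that fall in each range
table(tempRanges)</pre></div></div>

<p>Each count in the resulting table corresponds to the height of one bar in a histogram drawn with the same ranges.</p>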
<p>We can create a histogram using either the <strong>base</strong> graphics or <strong>lattice</strong> graphics in <strong>R</strong>. The function <strong>hist</strong> is part of the <strong>base</strong> graphics and the first argument we specify in the function call is the actual data to be used in the histogram. An example of creating a histogram would use the following code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">hist(olive.df$palmitic, xlab = &quot;Palmitic&quot;, main = &quot;Histogram&quot;)</pre></div></div>

<p>In this example we have also specified a label for the x-axis as well as the main title. The resulting graph looks like this:<br />
<div id="attachment_225" class="wp-caption aligncenter" style="width: 310px"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/05/histogram1-300x300.png" alt="Demonstration of using a histogram to summarise data" title="Histogram Example" width="300" height="300" class="size-medium wp-image-225" /><p class="wp-caption-text">Demonstration of using a histogram to summarise data</p></div></p>
<p>We can make use of the <strong>histogram</strong> function in the <strong>lattice</strong> library to create this plot and the syntax that we use is slightly different.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">histogram( ~ palmitic, data = olive.df)</pre></div></div>

<p>The first argument is a model formula that specifies the data to be used for the histogram as the independent variable component of the formula, and the <strong>data</strong> argument is used to specify a data frame in which the function will look for the data. The histogram looks slightly different using this package:<br />
<div id="attachment_229" class="wp-caption aligncenter" style="width: 310px"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/05/histogram2-300x300.png" alt="Demonstration of using a histogram to summarise data" title="Lattice Histogram Example" width="300" height="300" class="size-medium wp-image-229" /><p class="wp-caption-text">Demonstration of using a histogram to summarise data</p></div></p>
<p>There are other types of graph that can be used to summarise univariate data, including the box and whisker plot, density plot, strip plot and dot plot. These will be covered in subsequent posts using either the <strong>base</strong> graphics system or <strong>lattice</strong> graphics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/using-histograms-to-summarise-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plotting Probability Distributions</title>
		<link>http://www.wekaleamstudios.co.uk/posts/plotting-probability-distributions/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/plotting-probability-distributions/#comments</comments>
		<pubDate>Tue, 02 Jun 2009 19:53:26 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Base Graphics]]></category>
		<category><![CDATA[Probability Distributions]]></category>
		<category><![CDATA[abline]]></category>
		<category><![CDATA[expression]]></category>
		<category><![CDATA[main]]></category>
		<category><![CDATA[Mathematical Labels]]></category>
		<category><![CDATA[mean]]></category>
		<category><![CDATA[Normal]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[Probability Distribution]]></category>
		<category><![CDATA[Standard Normal]]></category>
		<category><![CDATA[variance]]></category>
		<category><![CDATA[xlab]]></category>
		<category><![CDATA[ylab]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=196</guid>
		<description><![CDATA[There are many distributions that are available within the base R Statistical System and it is possible to use these functions to visualise the density or cumulative distribution functions for a distribution with a given set of parameters. To illustrate this we could use the standard normal distribution which has zero mean and variance of one [...]]]></description>
			<content:encoded><![CDATA[<p>There are many distributions that are available within the base R Statistical System and it is possible to use these functions to visualise the density or cumulative distribution functions for a distribution with a given set of parameters.<span id="more-196"></span></p>
<p>To illustrate this we could use the standard normal distribution, which has zero mean and a variance of one and whose cumulative distribution function has the familiar S-shape. To plot the distribution on a graph we first create a sequence of values ranging from -4 to +4 and save it to a variable <strong>tempX</strong> so that it can be used in the <strong>plot</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tempX = seq(-4, 4, 0.1)</pre></div></div>

<p>The next step is to call the <strong>plot</strong> function, providing the X and Y values that we want to plot against each other. In this case we have already defined the X values so we use the <strong>pnorm</strong> function to calculate the cumulative probability at each of the X values that we have specified. We also set the text for the title and the two axes using the arguments <strong>main</strong>, <strong>xlab</strong> and <strong>ylab</strong>. We use the <strong>expression</strong> function to create a text string with mathematical characters in it. The <strong>mu</strong> and <strong>sigma</strong> are converted to the corresponding Greek letters. Lastly the option <strong>type = &#8220;l&#8221;</strong> is used to get the <strong>plot</strong> function to draw lines rather than symbols. Our final function call is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plot(tempX, pnorm(tempX, mean=0, sd=1), xlab=&quot;X Values&quot;,
  ylab=&quot;Cumulative Probability&quot;, 
  main = expression(paste(&quot;Normal Distribution: &quot;, mu, &quot; = 0, &quot;,
    sigma, &quot; = 1&quot;)), type=&quot;l&quot;)</pre></div></div>

<p>We add a horizontal grey line at the bottom of the graph using the <strong>abline</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">abline(h=0, col=&quot;gray&quot;)</pre></div></div>

<p>The graph that is produced looks like this:<br />
<div id="attachment_205" class="wp-caption aligncenter" style="width: 310px"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/05/normal-distribution-300x300.png" alt="Plot of the Cumulative Standard Normal Distribution" title="Cumulative Normal Distribution" width="300" height="300" class="size-medium wp-image-205" /><p class="wp-caption-text">Plot of the Cumulative Standard Normal Distribution</p></div></p>
<p>We can use this approach to visualise the density or cumulative distribution functions of any distribution that is available in <strong>R</strong>.</p>
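<p>As a sketch of how the approach carries over, the following code plots the density of the standard normal distribution with <strong>dnorm</strong> and the cumulative probabilities of a t distribution with <strong>pt</strong>. The choice of a t distribution with 5 degrees of freedom and the titles are assumptions made only for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># The same sequence of X values as before
tempX = seq(-4, 4, 0.1)

# Density of the standard normal distribution, using dnorm in place of pnorm
plot(tempX, dnorm(tempX, mean = 0, sd = 1), xlab = &quot;X Values&quot;,
  ylab = &quot;Density&quot;, main = &quot;Standard Normal Density&quot;, type = &quot;l&quot;)

# Cumulative probabilities for a t distribution with 5 degrees of freedom
plot(tempX, pt(tempX, df = 5), xlab = &quot;X Values&quot;,
  ylab = &quot;Cumulative Probability&quot;, main = &quot;t Distribution: 5 df&quot;,
  type = &quot;l&quot;)
abline(h = 0, col = &quot;gray&quot;)</pre></div></div>

<p>Functions for the other distributions follow the same naming pattern, with a <strong>d</strong> prefix for the density and a <strong>p</strong> prefix for the cumulative probabilities.</p>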
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/plotting-probability-distributions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

