<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; Statistical Analysis</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/statistical-analysis/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Generalized Linear Models &#8211; Poisson Regression</title>
		<link>http://www.wekaleamstudios.co.uk/posts/generalized-linear-models-poisson-regression/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/generalized-linear-models-poisson-regression/#comments</comments>
		<pubDate>Sun, 26 Jun 2011 09:28:50 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Linear Models]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[Generalized Linear Model]]></category>
		<category><![CDATA[glm]]></category>
		<category><![CDATA[Poisson]]></category>
		<category><![CDATA[update]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1547</guid>
		<description><![CDATA[The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is one of the assumptions underlying ordinary linear regression. When the data are counts of events (or items), a discrete distribution is usually more appropriate than approximating with a continuous [...]]]></description>
			<content:encoded><![CDATA[<p>The Generalized Linear Model (GLM) allows us to model responses with distributions other than the Normal distribution, which is one of the assumptions underlying ordinary linear regression. When the data are counts of events (or items), a discrete distribution is usually more appropriate than approximating with a continuous distribution, especially as counts are bounded below at zero: negative counts do not make sense.<span id="more-1547"></span></p>
<p><!--[Fast Tube]--><span id="Z1qE9-Vqw50" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/generalized-linear-models-poisson-regression/#Z1qE9-Vqw50"><img src="http://i.ytimg.com/vi/Z1qE9-Vqw50/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>To investigate Poisson regression within the GLM framework, consider a small data set on failure modes (<a href="http://www.sci.usq.edu.au/staff/dunn/Datasets/tech-glms.html">here</a>).</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; failure.df = read.table(&quot;twomodes.dat&quot;, header = TRUE)
&gt; failure.df
  Mode1 Mode2 Failures
1  33.3  25.3       15
2  52.2  14.4        9
3  64.7  32.5       14
4 137.0  20.5       24
5 125.9  97.6       27
6 116.3  53.6       27
7 131.7  56.6       23
8  85.0  87.3       18
9  91.9  47.8       22</pre></div></div>

<p>The machinery is run in two modes and the objective of the analysis is to determine whether the number of failures depends on how long the machine is run in mode 1 or mode 2, and whether there is an interaction between the times spent in each mode that increases or decreases the number of failures.</p>
<p>The response for this set of data is the number of failures (count) so a Poisson regression model is considered.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; fmod1 = glm(Failures ~ Mode1 * Mode2, data = failure.df, family = poisson)
&gt; summary(fmod1)
&nbsp;
Call:
glm(formula = Failures ~ Mode1 * Mode2, family = poisson, data = failure.df)
&nbsp;
Deviance Residuals: 
       1         2         3         4         5         6         7         8         9  
 0.91003  -1.15601  -0.28328  -0.10398   0.03526   0.84825  -0.49211  -0.57298   0.64821  
&nbsp;
Coefficients:
              Estimate Std. Error z value Pr(&gt;|z|)    
(Intercept)  2.105e+00  4.481e-01   4.698 2.63e-06 ***
Mode1        7.687e-03  4.285e-03   1.794   0.0729 .  
Mode2        4.703e-03  1.163e-02   0.405   0.6858    
Mode1:Mode2 -1.978e-05  1.037e-04  -0.191   0.8487    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for poisson family taken to be 1)
&nbsp;
    Null deviance: 16.996  on 8  degrees of freedom
Residual deviance:  3.967  on 5  degrees of freedom
AIC: 55.024
&nbsp;
Number of Fisher Scoring iterations: 4</pre></div></div>

<p>The model output does not provide any support for an interaction between the amounts of time spent in the two different modes of operation (the p-value for the Mode1:Mode2 term is 0.8487). If we remove the interaction term and re-fit the model, using the update function, we get:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; fmod2 = update(fmod1, . ~ . - Mode1:Mode2)
&gt; summary(fmod2)
&nbsp;
Call:
glm(formula = Failures ~ Mode1 + Mode2, family = poisson, data = failure.df)
&nbsp;
Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.21984  -0.44735  -0.05893   0.68351   0.87510  
&nbsp;
Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)    
(Intercept) 2.175168   0.255456   8.515  &lt; 2e-16 ***
Mode1       0.007015   0.002429   2.888  0.00387 ** 
Mode2       0.002549   0.002835   0.899  0.36852    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for poisson family taken to be 1)
&nbsp;
    Null deviance: 16.9964  on 8  degrees of freedom
Residual deviance:  4.0033  on 6  degrees of freedom
AIC: 53.06
&nbsp;
Number of Fisher Scoring iterations: 4</pre></div></div>

<p>This output suggests that the time of operation in mode 1 is important for determining the number of failures (p-value 0.00387), but there is no evidence that the time of operation in mode 2 matters (p-value 0.36852). Removing the Mode2 term as a last step gives us:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; fmod3 = update(fmod2, . ~ . - Mode2)
&gt; summary(fmod3)
&nbsp;
Call:
glm(formula = Failures ~ Mode1, family = poisson, data = failure.df)
&nbsp;
Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.43194  -0.56958  -0.00745   0.66742   0.82231  
&nbsp;
Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)    
(Intercept) 2.237196   0.243053   9.205  &lt; 2e-16 ***
Mode1       0.007705   0.002264   3.403 0.000667 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for poisson family taken to be 1)
&nbsp;
    Null deviance: 16.9964  on 8  degrees of freedom
Residual deviance:  4.8078  on 7  degrees of freedom
AIC: 51.865
&nbsp;
Number of Fisher Scoring iterations: 4</pre></div></div>
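
<p>The fits above can also be compared directly. The following is a minimal sketch rather than part of the original analysis: it re-enters the nine rows of data printed earlier (assumed to match twomodes.dat) so that it runs without the file, compares the smallest and largest models with a deviance test via <strong>anova</strong>, and back-transforms the coefficients of the final model from the log scale used by the Poisson family.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Data re-entered from the listing above (assumed to match twomodes.dat)
failure.df = data.frame(
  Mode1 = c(33.3, 52.2, 64.7, 137.0, 125.9, 116.3, 131.7, 85.0, 91.9),
  Mode2 = c(25.3, 14.4, 32.5, 20.5, 97.6, 53.6, 56.6, 87.3, 47.8),
  Failures = c(15, 9, 14, 24, 27, 27, 23, 18, 22))
# Largest and smallest models considered in the post
fmod1 = glm(Failures ~ Mode1 * Mode2, data = failure.df, family = poisson)
fmod3 = glm(Failures ~ Mode1, data = failure.df, family = poisson)
# Deviance test of the two dropped terms (Mode2 and Mode1:Mode2) together
anova(fmod3, fmod1, test = &quot;Chisq&quot;)
# With the log link, exp(coefficient) is a multiplicative effect on the expected count
exp(coef(fmod3))</pre></div></div>

<p>With the estimate of 0.007705 for Mode1 reported above, exp(0.007705) is roughly 1.008, so each additional unit of time in mode 1 is associated with an increase of about 0.8% in the expected number of failures.</p>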

<p>The diagnostic plots, shown below, do not indicate any major problems with the final model, especially given the small number of data points.</p>
<div id="attachment_1644" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2011/06/Poisson-Regression.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2011/06/Poisson-Regression-300x300.png" alt="Residual Plots for Poisson Regression model" title="Residual Plots for Poisson Regression model" width="300" height="300" class="size-medium wp-image-1644" /></a><p class="wp-caption-text">Four diagnostic plots for a Poisson regression model based on total failures</p></div>
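<p>The command used for the figure is not shown in the post. A plausible sketch, assuming the four panels are the standard ones drawn by the <strong>plot</strong> method for a fitted model, and re-entering the data so that the example runs on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Data re-entered from the listing above (assumed to match twomodes.dat)
failure.df = data.frame(
  Mode1 = c(33.3, 52.2, 64.7, 137.0, 125.9, 116.3, 131.7, 85.0, 91.9),
  Failures = c(15, 9, 14, 24, 27, 27, 23, 18, 22))
fmod3 = glm(Failures ~ Mode1, data = failure.df, family = poisson)
# Split the device into a 2 x 2 grid, draw the four diagnostics, then restore the settings
op = par(mfrow = c(2, 2))
plot(fmod3)
par(op)</pre></div></div>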
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/generalized-linear-models-poisson-regression/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fractional Factorial Designs using FrF2</title>
		<link>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/#comments</comments>
		<pubDate>Wed, 18 May 2011 18:17:18 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[fractional factorial]]></category>
		<category><![CDATA[FrF2]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1625</guid>
		<description><![CDATA[The FrF2 package for R can be used to create regular and non-regular Fractional Factorial 2-level designs. It is reasonably straightforward to use. The first step is to install the package and then make it available for use in the current session: require(FrF2) A basic call to the main function FrF2 specifies the number of runs in [...]]]></description>
			<content:encoded><![CDATA[<p>The <strong>FrF2</strong> package for <strong>R</strong> can be used to <em>create regular and non-regular Fractional Factorial 2-level designs</em>. It is reasonably straightforward to use.<span id="more-1625"></span></p>
<p>The first step is to install the package and then make it available for use in the current session:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(FrF2)</pre></div></div>
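
<p>The installation step itself is not shown above. Assuming a working CRAN mirror, a one-off install followed by loading the package might look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># One-off download and installation from CRAN (needs an internet connection)
install.packages(&quot;FrF2&quot;)
# Attach the package for the current session; library() stops with an error if it is missing
library(FrF2)</pre></div></div>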

<p>A basic call to the main function <strong>FrF2</strong> specifies the number of runs in the fractional factorial design (which needs to be a power of 2) and the number of factors. For example, a three factor design would have a total of eight runs if it were a full factorial, but if we wanted to go with four runs then we could generate the design like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, 3)
   A  B  C
1  1 -1 -1
2 -1  1 -1
3 -1 -1  1
4  1  1  1
class=design, type= FrF2</pre></div></div>

<p>The default output labels the factors A, B, C and so on and the factor levels are -1 and +1 for the two levels of each factor. We can change the level names to low and high using the <strong>default.levels</strong> function argument:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, 3, default.levels = c(&quot;low&quot;, &quot;high&quot;))
     A    B    C
1 high high high
2  low high  low
3 high  low  low
4  low  low high
class=design, type= FrF2</pre></div></div>

<p>The factors can be specified as a vector of names, rather than as a number of factors, via the <strong>factor.names</strong> argument:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, factor.names = c(&quot;One&quot;, &quot;Two&quot;, &quot;Three&quot;),
  default.levels = c(&quot;low&quot;, &quot;high&quot;))
   One  Two Three
1  low high   low
2 high high  high
3  low  low  high
4 high  low   low
class=design, type= FrF2</pre></div></div>

<p>These are the basics and there are other features for greater control over the confounding between factors and their interactions that is introduced by using a fractional factorial design.</p>
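<p>To make the confounding concrete, the sketch below re-enters the first four-run design printed above and checks, using base R only, whether the column for C is the elementwise product of the columns for A and B. The data frame name is chosen for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Design copied from the first FrF2(4, 3) listing above
d = data.frame(A = c(1, -1, -1, 1), B = c(-1, 1, -1, 1), C = c(-1, -1, 1, 1))
# If C equals A * B in every run, the main effect of C is aliased with the A:B interaction
all(d$C == d$A * d$B)</pre></div></div>

<p>The check returns TRUE, so in this half fraction the main effect of C cannot be separated from the interaction between A and B.</p>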
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Mining with WEKA</title>
		<link>http://www.wekaleamstudios.co.uk/posts/data-mining-with-weka/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/data-mining-with-weka/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 17:53:31 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[WEKA]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1583</guid>
		<description><![CDATA[There are a number of good open source projects for statistics and data mining, for example the software WEKA developed at the University of Waikato. The description on their website states that: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or [...]]]></description>
			<content:encoded><![CDATA[<p>There are a number of good open source projects for statistics and data mining, for example the software <a href="http://www.cs.waikato.ac.nz/ml/weka/">WEKA</a> developed at the University of Waikato.<span id="more-1583"></span></p>
<p>The description on their website states that:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Weka is a collection of machine learning algorithms for data mining tasks.
The algorithms can either be applied directly to a dataset or called from
your own Java code. Weka contains tools for data pre-processing,
classification, regression, clustering, association rules, and visualization.
It is also well-suited for developing new machine learning schemes.</pre></div></div>

<p>The software is written in Java and available under the GNU General Public Licence. The website also provides access to data sets from the <a href="http://archive.ics.uci.edu/ml/">UCI Machine Learning</a> website for use with WEKA.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/data-mining-with-weka/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Gapminder</title>
		<link>http://www.wekaleamstudios.co.uk/posts/gapminder/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/gapminder/#comments</comments>
		<pubDate>Thu, 06 Jan 2011 18:53:41 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[gapminder]]></category>
		<category><![CDATA[Hans]]></category>
		<category><![CDATA[Rosling]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1579</guid>
		<description><![CDATA[As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. Among the great things about Hans Rosling are his presentations and the interactive graphics that he uses to make his points. The gapminder software [...]]]></description>
			<content:encoded><![CDATA[<p>As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. Among the great things about Hans Rosling are his presentations and the interactive graphics that he uses to make his points.<span id="more-1579"></span></p>
<p><!--[Fast Tube]--><span id="hVimVzgtD6w" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/gapminder/#hVimVzgtD6w"><img src="http://i.ytimg.com/vi/hVimVzgtD6w/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <a href="http://www.gapminder.org/">gapminder</a> software used in his presentations is available for experimentation and it is interesting to have a play with it as it can help with thinking about ways to effectively present data. The website also has information about the data sources.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/gapminder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plotting Time Series data using ggplot2</title>
		<link>http://www.wekaleamstudios.co.uk/posts/plotting-time-series-data-using-ggplot2/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/plotting-time-series-data-using-ggplot2/#comments</comments>
		<pubDate>Thu, 30 Sep 2010 21:05:18 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[aes]]></category>
		<category><![CDATA[date]]></category>
		<category><![CDATA[geom_line]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[line]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[scale_x_date]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[xlab]]></category>
		<category><![CDATA[ylab]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1404</guid>
		<description><![CDATA[There are various ways to plot data that is represented by a time series in R. The ggplot2 package has scales that can handle dates reasonably easily. As an example, consider a data set on the number of views of the YouTube channel ramstatvid. A short snippet of the data [...]]]></description>
			<content:encoded><![CDATA[<p>There are various ways to plot data that is represented by a time series in <strong>R</strong>. The <strong><a href="http://had.co.nz/ggplot2/">ggplot2</a></strong> package has scales that can handle dates reasonably easily.<span id="more-1404"></span></p>
<p><!--[Fast Tube]--><span id="irtSRkhGbXg" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/plotting-time-series-data-using-ggplot2/#irtSRkhGbXg"><img src="http://i.ytimg.com/vi/irtSRkhGbXg/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>As an example, consider a data set on the number of views of the YouTube channel <a href="http://www.youtube.com/user/ramstatvid?feature=mhum">ramstatvid</a>. A short snippet of the data is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; head(yt.views)
        Date Views
1 2010-05-17    13
2 2010-05-18    11
3 2010-05-19     4
4 2010-05-20     2
5 2010-05-21    23
6 2010-05-22    26</pre></div></div>

<p>The <strong>ggplot</strong> function is called with a data frame, and the <strong>aes</strong> function maps <strong>Date</strong> to the x-axis and the number of <strong>Views</strong> to the y-axis.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(yt.views, aes(Date, Views)) + geom_line() +
  scale_x_date(format = &quot;%b-%Y&quot;) + xlab(&quot;&quot;) + ylab(&quot;Daily Views&quot;)</pre></div></div>

<p>The axis labels for the <strong>Date</strong> variable are created with the <strong>scale_x_date</strong> function where the format is specified as a Month/Year combination with the <strong>%b</strong> and <strong>%Y</strong> formatting strings. The graph that is produced is shown here:</p>
<div id="attachment_1403" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/09/ts-example1.jpg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/09/ts-example1-300x300.jpg" alt="Time Series Example" title="Time Series Example" width="300" height="300" class="size-medium wp-image-1403" /></a><p class="wp-caption-text">Time Series Plot Example with ggplot2 package</p></div>
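<p>The <strong>format</strong> argument used above reflects the version of ggplot2 available when this was written. In more recent releases of ggplot2 the same labels are requested through the <strong>date_labels</strong> argument of <strong>scale_x_date</strong>. A minimal sketch, using only the six rows printed above as stand-in data and assuming the Date column has to be converted with <strong>as.Date</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# Stand-in data: the six rows shown by head(yt.views) above
yt.views = data.frame(
  Date = as.Date(c(&quot;2010-05-17&quot;, &quot;2010-05-18&quot;, &quot;2010-05-19&quot;,
                   &quot;2010-05-20&quot;, &quot;2010-05-21&quot;, &quot;2010-05-22&quot;)),
  Views = c(13, 11, 4, 2, 23, 26))
# date_labels takes the same strftime-style codes (%b, %Y) as the older format argument
p = ggplot(yt.views, aes(Date, Views)) + geom_line() +
  scale_x_date(date_labels = &quot;%b-%Y&quot;) + xlab(&quot;&quot;) + ylab(&quot;Daily Views&quot;)
print(p)</pre></div></div>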
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/plotting-time-series-data-using-ggplot2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Classification Trees using the rpart function</title>
		<link>http://www.wekaleamstudios.co.uk/posts/classification-trees-using-the-rpart-function/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/classification-trees-using-the-rpart-function/#comments</comments>
		<pubDate>Tue, 21 Sep 2010 19:22:50 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[CART]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[misclassification]]></category>
		<category><![CDATA[plotcp]]></category>
		<category><![CDATA[printcp]]></category>
		<category><![CDATA[prune]]></category>
		<category><![CDATA[rpart]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1399</guid>
		<description><![CDATA[In a previous post on classification trees we considered using the tree package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function rpart, from the rpart package that ships with the standard R distribution. A classification tree can be fitted using the [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous <a href="http://www.wekaleamstudios.co.uk/posts/classification-trees/">post</a> on classification trees we considered using the <strong>tree</strong> package to fit a classification tree to data divided into known classes. In this post we will look at the alternative function <strong>rpart</strong>, from the <strong>rpart</strong> package that ships with the standard <strong>R</strong> distribution as a recommended package.<span id="more-1399"></span></p>
<p><!--[Fast Tube]--><span id="m3mLNpeke0I" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/classification-trees-using-the-rpart-function/#m3mLNpeke0I"><img src="http://i.ytimg.com/vi/m3mLNpeke0I/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>A classification tree can be fitted with the <strong>rpart</strong> function using a syntax similar to that of the <strong>tree</strong> function. For the ecoli data set discussed in the previous <a href="http://www.wekaleamstudios.co.uk/posts/classification-trees/">post</a> we would use:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; require(rpart)
&gt; ecoli.df = read.csv(&quot;ecoli.txt&quot;)</pre></div></div>

<p>followed by</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; ecoli.rpart1 = rpart(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, 
  data = ecoli.df)</pre></div></div>

<p>We would then consider whether the tree could be simplified by pruning and make use of the <strong>plotcp</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; plotcp(ecoli.rpart1)</pre></div></div>

<p>The amount of pruning can be determined from this graph or by looking at the output from the <strong>printcp</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; printcp(ecoli.rpart1)
&nbsp;
Classification tree:
rpart(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + 
    alm2, data = ecoli.df)
&nbsp;
Variables actually used in tree construction:
[1] aac  alm1 gvh  mcv 
&nbsp;
Root node error: 193/336 = 0.5744
&nbsp;
n= 336 
&nbsp;
        CP nsplit rel error  xerror     xstd
1 0.388601      0   1.00000 1.00000 0.046959
2 0.207254      1   0.61140 0.61658 0.045423
3 0.062176      2   0.40415 0.45596 0.041758
4 0.051813      3   0.34197 0.38342 0.039359
5 0.031088      4   0.29016 0.36269 0.038571
6 0.015544      5   0.25907 0.30570 0.036136
7 0.010000      6   0.24352 0.31088 0.036375</pre></div></div>

<p>The <strong>prune</strong> function is used to simplify the tree based on a <em>cp</em> threshold identified from the graph or the printed output.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; ecoli.rpart2 = prune(ecoli.rpart1, cp = 0.02)</pre></div></div>
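
<p>The value of <em>cp</em> can also be read from the fitted object rather than by eye. The sketch below shows one common convention, picking the row of the complexity table with the smallest cross-validated error; it is not necessarily how the value 0.02 above was chosen. To stay self-contained it uses the <strong>kyphosis</strong> data shipped with the rpart package instead of the ecoli file, and the seed is arbitrary.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(rpart)
set.seed(1)  # arbitrary seed so the internal cross-validation is repeatable
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# cptable holds the same columns that printcp displays
cp.tab = fit$cptable
# CP value of the row with the smallest cross-validated error (xerror)
best.cp = cp.tab[which.min(cp.tab[, &quot;xerror&quot;]), &quot;CP&quot;]
pruned = prune(fit, cp = best.cp)
printcp(pruned)</pre></div></div>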

<p>The classification tree can be visualised with the plot function and then the text function adds labels to the graph:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; plot(ecoli.rpart2, uniform = TRUE)
&gt; text(ecoli.rpart2, use.n = TRUE, cex = 0.75)</pre></div></div>
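
<p>The training misclassification of a fitted tree can also be tabulated by class. A minimal sketch, again using the <strong>kyphosis</strong> data from the rpart package as a stand-in for the ecoli data:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(rpart)
fit = rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# Predicted class for each case in the training data
pred = predict(fit, type = &quot;class&quot;)
# Observed versus predicted classes
conf = table(observed = kyphosis$Kyphosis, predicted = pred)
conf
# Proportion of training cases that are misclassified
1 - sum(diag(conf)) / sum(conf)</pre></div></div>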

<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/classification-trees-using-the-rpart-function/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Classification Trees</title>
		<link>http://www.wekaleamstudios.co.uk/posts/classification-trees/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/classification-trees/#comments</comments>
		<pubDate>Sat, 18 Sep 2010 09:23:21 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[CART]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[cross-validation]]></category>
		<category><![CDATA[misclassification]]></category>
		<category><![CDATA[rpart]]></category>
		<category><![CDATA[tree]]></category>
		<category><![CDATA[xtabs]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1387</guid>
		<description><![CDATA[Classification trees are applied to situations where the data are divided into groups, rather than to investigating a numerical response and its relationship to a set of descriptor variables. There are various implementations of classification trees in R and some commonly used functions are rpart and tree. To illustrate the use of [...]]]></description>
			<content:encoded><![CDATA[<p>Classification trees are applied to situations where the data are divided into groups, rather than to investigating a numerical response and its relationship to a set of descriptor variables. There are various implementations of classification trees in <strong>R</strong> and some commonly used functions are <strong>rpart</strong> and <strong>tree</strong>.<span id="more-1387"></span></p>
<p><!--[Fast Tube]--><span id="9XNhqO1bu0A" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/classification-trees/#9XNhqO1bu0A"><img src="http://i.ytimg.com/vi/9XNhqO1bu0A/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>To illustrate the use of the <strong>tree</strong> function we will use a set of data from the UCI <a href="http://archive.ics.uci.edu/ml/">Machine Learning Repository</a> where the objective of the study using this data was to <em>predict the cellular localization sites of proteins</em>.</p>
<p>The data provided on the website is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; ecoli.df = read.csv(&quot;ecoli.txt&quot;)
&gt; head(ecoli.df)
    Sequence  mcv  gvh  lip chg  aac alm1 alm2 class
1  AAT_ECOLI 0.49 0.29 0.48 0.5 0.56 0.24 0.35    cp
2 ACEA_ECOLI 0.07 0.40 0.48 0.5 0.54 0.35 0.44    cp
3 ACEK_ECOLI 0.56 0.40 0.48 0.5 0.49 0.37 0.46    cp
4 ACKA_ECOLI 0.59 0.49 0.48 0.5 0.52 0.45 0.36    cp
5  ADI_ECOLI 0.23 0.32 0.48 0.5 0.55 0.25 0.35    cp
6 ALKH_ECOLI 0.67 0.39 0.48 0.5 0.36 0.38 0.46    cp</pre></div></div>

<p>We can use the <strong>xtabs</strong> function to summarise the number of cases in each class.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; xtabs( ~ class, data = ecoli.df)
class
 cp  im imL imS imU  om omL  pp 
143  77   2   2  35  20   5  52</pre></div></div>

<p>As noted in the comments, the package that I used was the <strong>tree</strong> package:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; require(tree)</pre></div></div>

<p>The complete classification tree using all variables is fitted to the data initially and then we will try to <em>prune</em> the tree to make it smaller.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; ecoli.tree1 = tree(class ~ mcv + gvh + lip + chg + aac + alm1 + alm2,
  data = ecoli.df)
&gt; summary(ecoli.tree1)
&nbsp;
Classification tree:
tree(formula = class ~ mcv + gvh + lip + chg + aac + alm1 + alm2, 
    data = ecoli.df)
Variables actually used in tree construction:
[1] &quot;alm1&quot; &quot;mcv&quot;  &quot;gvh&quot;  &quot;aac&quot;  &quot;alm2&quot;
Number of terminal nodes:  10 
Residual mean deviance:  0.7547 = 246 / 326 
Misclassification error rate: 0.122 = 41 / 336</pre></div></div>

<p>The <strong>tree</strong> function is used in a similar way to other modelling functions in <strong>R</strong>. The misclassification rate is shown as part of the summary of the tree. This tree can be plotted and annotated with these commands:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; plot(ecoli.tree1)
&gt; text(ecoli.tree1, all = TRUE)</pre></div></div>

<p>To prune the tree we use cross-validation to identify the point to <em>prune</em>.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; cv.tree(ecoli.tree1)
$size
 [1] 10  9  8  7  6  5  4  3  2  1
&nbsp;
$dev
 [1]  463.6820  457.4463  447.9824  441.8617  455.8318  478.9234  533.5856  586.2820  713.2992 1040.3878
&nbsp;
$k
 [1]      -Inf  12.16500  15.60004  19.21572  34.29868  41.10627  50.57044  64.05494 180.78800 355.67747
&nbsp;
$method
[1] &quot;deviance&quot;
&nbsp;
attr(,&quot;class&quot;)
[1] &quot;prune&quot;         &quot;tree.sequence&quot;</pre></div></div>

<p>This suggests a tree size of 6 and we can re-fit the tree:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; ecoli.tree2 = prune.misclass(ecoli.tree1, best = 6)
&gt; summary(ecoli.tree2)
&nbsp;
Classification tree:
snip.tree(tree = ecoli.tree1, nodes = c(4, 20, 7))
Variables actually used in tree construction:
[1] &quot;alm1&quot; &quot;mcv&quot;  &quot;aac&quot;  &quot;gvh&quot; 
Number of terminal nodes:  6 
Residual mean deviance:  0.9918 = 327.3 / 330 
Misclassification error rate: 0.1548 = 52 / 336</pre></div></div>

<p>The misclassification rate has increased with the <em>pruning</em> of the tree, from 0.122 (41 / 336) to 0.1548 (52 / 336), but not substantially.</p>
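<p>The call to <strong>cv.tree</strong> above uses its default criterion, the deviance, while the pruning is done with <strong>prune.misclass</strong>. One option, sketched below on the built-in <strong>iris</strong> data rather than the ecoli data, is to pass <strong>FUN = prune.misclass</strong> so that the cross-validation is based on misclassification as well. The seed and the rule of taking the smallest tree with the lowest cross-validated error are assumptions made for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(tree)
set.seed(1)  # arbitrary seed; the cross-validation folds are random
iris.tree = tree(Species ~ ., data = iris)
# Cross-validate using misclassification instead of the default deviance
iris.cv = cv.tree(iris.tree, FUN = prune.misclass)
# Smallest tree size among those with the lowest cross-validated error
best.size = min(iris.cv$size[iris.cv$dev == min(iris.cv$dev)])
iris.tree2 = prune.misclass(iris.tree, best = best.size)
summary(iris.tree2)</pre></div></div>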
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
<p>Data used in this post: <a href='http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/09/ecoli.txt'>Ecoli Data Set</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/classification-trees/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Charting the performance of cricket all-rounders &#8211; IT Botham</title>
		<link>http://www.wekaleamstudios.co.uk/posts/charting-the-performance-of-cricket-all-rounders-it-botham/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/charting-the-performance-of-cricket-all-rounders-it-botham/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 19:59:54 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[all-rounder]]></category>
		<category><![CDATA[botham]]></category>
		<category><![CDATA[catches]]></category>
		<category><![CDATA[cricket]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[runs]]></category>
		<category><![CDATA[wickets]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1321</guid>
		<description><![CDATA[Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats. As [...]]]></description>
			<content:encoded><![CDATA[<p>Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The <a href="http://www.cricinfo.com/">cricinfo</a> website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats.<span id="more-1321"></span></p>
<p>As an initial example we will consider the English legend Sir Ian Botham, who played 102 test matches for England between his debut in 1977 and his final game in 1992.</p>
<p>The first obvious breakdown is to consider how Botham performed against each of the six countries that he faced during his test career. A summary of his statistics is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"> Opposition Matches Bat Inns Runs NO Bowl Inns Wicket Catch
  Australia      36       49 1673  2       66     148    57
      India      14       16 1201  0       23      59    14
New Zealand      15       22  846  2       28      64    14
   Pakistan      14       20  647  1       18      40    14
  Sri Lanka       3        3   41  0        6      11     2
West Indies      20       37  792  1       27      61    19</pre></div></div>
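
<p>One way to hold this table in <strong>R</strong> is sketched below. The values are typed in from the table above; the column names <strong>BatInns</strong> and <strong>BowlInns</strong> are our own shortening of the printed headers, so adjust them if your data frame is read in from a file instead:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">itb.opp = data.frame(
  Opposition = c('Australia', 'India', 'New Zealand',
                 'Pakistan', 'Sri Lanka', 'West Indies'),
  Matches  = c(36, 14, 15, 14, 3, 20),
  BatInns  = c(49, 16, 22, 20, 3, 37),
  Runs     = c(1673, 1201, 846, 647, 41, 792),
  NO       = c(2, 0, 2, 1, 0, 1),
  BowlInns = c(66, 23, 28, 18, 6, 27),
  Wicket   = c(148, 59, 64, 40, 11, 61),
  Catch    = c(57, 14, 14, 14, 2, 19)
)
# career totals as a quick sanity check on the typed values
colSums(itb.opp[, c('Matches', 'Runs', 'Wicket', 'Catch')])</pre></div></div>

<p>The totals come to 102 matches and 120 catches, which agrees with the career figures quoted elsewhere in this post.</p>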

<p>Botham only played three matches against Sri Lanka so it is difficult to properly assess his performance against them. If the above table is stored in a data frame <strong>itb.opp</strong> then we can create a bar chart of the total runs (or wickets) by opposition country:</p>

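<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># stat = &quot;identity&quot; draws bars at the Runs values rather than at row counts
ggplot(itb.opp, aes(Opposition, Runs)) + geom_bar(stat = &quot;identity&quot;) +
  xlab(&quot;Country&quot;) + ylab(&quot;Total Runs&quot;)</pre></div></div>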

<p>This code produces the following graph:</p>
<div id="attachment_1355" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Total-Runs.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Total-Runs-300x300.png" alt="IT Botham Total Runs by Opposition" title="IT Botham Total Runs" width="300" height="300" class="size-medium wp-image-1355" /></a><p class="wp-caption-text">IT Botham Total Runs by Opposition</p></div>
<p>The total wickets graph is produced by the following code:</p>

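<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># stat = &quot;identity&quot; draws bars at the Wicket values rather than at row counts
ggplot(itb.opp, aes(Opposition, Wicket)) + geom_bar(stat = &quot;identity&quot;) +
  xlab(&quot;Country&quot;) + ylab(&quot;Total Wickets&quot;)</pre></div></div>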

<div id="attachment_1356" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Total-Wickets.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Total-Wickets-300x300.png" alt="IT Botham Total Wickets by Opposition" title="IT Botham Total Wickets" width="300" height="300" class="size-medium wp-image-1356" /></a><p class="wp-caption-text">IT Botham Total Wickets by Opposition</p></div>
<p>We may now want to delve deeper into the performance against different nations to take into account the number of games or innings where Botham batted or bowled. The traditional way to assess performance is to calculate batting and bowling averages and we can do this by opposition which provides the following data frame:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; itb.opp.sum
 Opposition Discipline  Average
  Australia    Batting 29.35088
      India    Batting 70.64706
New Zealand    Batting 42.30000
   Pakistan    Batting 32.35000
  Sri Lanka    Batting 13.66667
West Indies    Batting 21.40541
  Australia    Bowling 27.65541
      India    Bowling 26.40678
New Zealand    Bowling 23.43750
   Pakistan    Bowling 31.77500
  Sri Lanka    Bowling 28.18182
West Indies    Bowling 35.18033</pre></div></div>

<p>This can be converted into a dot plot so we can see whether Botham had a higher batting average than bowling average, which is often taken to be one of the signs of an all-rounder.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(itb.opp.sum, aes(Average, Opposition, colour = Discipline)) +
  geom_point()+ xlab(&quot;Average&quot;) + ylab(&quot;&quot;)</pre></div></div>

<p>The graph is shown here:</p>
<div id="attachment_1362" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Averages-Country.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Averages-Country-300x300.png" alt="IT Botham Batting and Bowling Averages by Opposition" title="IT Botham Batting and Bowling Averages" width="300" height="300" class="size-medium wp-image-1362" /></a><p class="wp-caption-text">IT Botham Batting and Bowling Averages by Opposition</p></div>
<p>We can see the differences in performance based on the opposition. Botham&#8217;s performance against the West Indies, by far the strongest team during most of his international career, was worse than against the other countries. However, his averages were far from embarrassing when compared to other players at the time. The graph also shows that Botham enjoyed batting and bowling against India.</p>
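<p>The all-rounder comparison can also be made numerically. The following is a minimal sketch that re-enters the averages printed in <strong>itb.opp.sum</strong> and subtracts the bowling average from the batting average for each country; the object names (<strong>batting</strong>, <strong>bowling</strong>, <strong>itb.diff</strong>) are ours and purely illustrative:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># averages copied from the itb.opp.sum output shown above
opposition = c('Australia', 'India', 'New Zealand',
               'Pakistan', 'Sri Lanka', 'West Indies')
batting = c(29.35088, 70.64706, 42.30000, 32.35000, 13.66667, 21.40541)
bowling = c(27.65541, 26.40678, 23.43750, 31.77500, 28.18182, 35.18033)
# positive values mean the batting average exceeds the bowling average
itb.diff = data.frame(Opposition = opposition,
                      Difference = batting - bowling)
itb.diff[order(itb.diff$Difference, decreasing = TRUE), ]</pre></div></div>

<p>The difference is positive against four of the six countries and negative against Sri Lanka and the West Indies.</p>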
<p>We can divide this data further based on whether the matches were played in England or outside of England and this data is shown here:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; itb.opp.ha.sum
  Opposition Venue Discipline  Average
   Australia  Away    Batting 30.22581
       India  Away    Batting 61.55556
 New Zealand  Away    Batting 50.44444
    Pakistan  Away    Batting 16.00000
   Sri Lanka  Away    Batting 13.00000
 West Indies  Away    Batting 14.17647
   Australia  Home    Batting 28.30769
       India  Home    Batting 80.87500
 New Zealand  Home    Batting 35.63636
    Pakistan  Home    Batting 34.16667
   Sri Lanka  Home    Batting 14.00000
 West Indies  Home    Batting 27.55000
   Australia  Away    Bowling 28.44928
       India  Away    Bowling 25.53333
 New Zealand  Away    Bowling 27.44444
    Pakistan  Away    Bowling 45.00000
   Sri Lanka  Away    Bowling 21.66667
 West Indies  Away    Bowling 39.50000
   Australia  Home    Bowling 26.96203
       India  Home    Bowling 27.31034
 New Zealand  Home    Bowling 20.51351
    Pakistan  Home    Bowling 31.07895
   Sri Lanka  Home    Bowling 30.62500
 West Indies  Home    Bowling 31.97143</pre></div></div>

<p>A dot plot is created from this data with a separate panel for each of the six opposition countries and the averages divided into batting and bowling performances. The coloured dots in the graph indicate whether the average is for matches at home or away.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(itb.opp.ha.sum, aes(Average, Discipline, colour = Venue)) +
  geom_point() + facet_wrap( ~ Opposition) +
  xlab(&quot;Average&quot;) + ylab(&quot;&quot;)</pre></div></div>

<p>This graph is shown below:</p>
<div id="attachment_1366" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Averages-Country-HomeAway.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/08/ITB-Averages-Country-HomeAway-300x300.png" alt="IT Botham Batting and Bowling Averages by Country and Home/Away" title="IT Botham Batting and Bowling Averages" width="300" height="300" class="size-medium wp-image-1366" /></a><p class="wp-caption-text">IT Botham Batting and Bowling Averages by Country and Home/Away</p></div>
<p>We can see that the difference between home and away performance is, in general, not very large for bowling averages but in some cases there is a noticeable difference in batting averages. When looking at Botham&#8217;s performances against the West Indies, his statistics at home are much better than his away statistics, suggesting that his main struggles against the strong West Indies team were in the Caribbean. This might be due to his swing bowling being more suited to English conditions than to pitches in the West Indies.</p>
<p>To round off this brief look at the career of IT Botham let us consider some other important statistics, in particular games where he performed with both bat and ball.</p>
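<p>To put numbers on the home and away comparison, the sketch below re-enters the averages printed in <strong>itb.opp.ha.sum</strong> and computes home minus away for each country and discipline. The vector and data frame names are ours and only serve this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># averages copied from the itb.opp.ha.sum output shown above
opposition = c('Australia', 'India', 'New Zealand',
               'Pakistan', 'Sri Lanka', 'West Indies')
bat.away  = c(30.22581, 61.55556, 50.44444, 16.00000, 13.00000, 14.17647)
bat.home  = c(28.30769, 80.87500, 35.63636, 34.16667, 14.00000, 27.55000)
bowl.away = c(28.44928, 25.53333, 27.44444, 45.00000, 21.66667, 39.50000)
bowl.home = c(26.96203, 27.31034, 20.51351, 31.07895, 30.62500, 31.97143)
# home minus away: a positive batting gap favours home matches,
# a negative bowling gap favours home matches (lower is better)
itb.gap = data.frame(Opposition = opposition,
                     Batting = bat.home - bat.away,
                     Bowling = bowl.home - bowl.away)
itb.gap</pre></div></div>

<p>For the West Indies the batting gap is about 13.4 runs and the bowling gap about -7.5 runs, both in favour of home matches.</p>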
<ul>
<li>Overall Botham scored 14 hundreds and 22 fifties in 161 innings, so he passed fifty roughly once every four to five innings.</li>
<li>He also took 27 five-wicket hauls and 17 four-wicket hauls, so he took four or more wickets roughly once every four innings.</li>
<li>He took 120 catches.</li>
</ul>
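<p>The two rates quoted in the list can be checked with a little arithmetic. In this sketch the number of bowling innings is taken as the sum of the <strong>Bowl Inns</strong> column of the opposition table earlier in the post, which is our assumption about the denominator behind the bowling figure:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># batting: innings per score of fifty or more
bat.innings = 161
fifty.plus = 14 + 22
bat.innings / fifty.plus
# bowling: innings per haul of four or more wickets,
# with bowling innings summed from the opposition table
bowl.innings = sum(c(66, 23, 28, 18, 6, 27))
four.plus = 27 + 17
bowl.innings / four.plus</pre></div></div>

<p>These come out at roughly 4.5 innings per fifty-plus score and roughly 3.8 innings per haul of four or more wickets.</p>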
<p>Individual matches of excellence include five games with a century and at least five wickets:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Year  Opposition       Ground Venue Runs Wicket
1978 New Zealand Christchurch  Away  133      8
1978    Pakistan       Lord's  Home  108      8
1980       India       Mumbai  Away  114     13
1981   Australia        Leeds  Home  199      7
1984 New Zealand   Wellington  Away  138      6</pre></div></div>

<p>These performances and others show why Botham was considered such a great player as he produced some sustained periods of excellent all-round cricket rather than having one discipline more dominant for a long period of time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/charting-the-performance-of-cricket-all-rounders-it-botham/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Generating Balanced Incomplete Block Designs (BIBD)</title>
		<link>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 12:04:50 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Balanced Incomplete Block Design]]></category>
		<category><![CDATA[BIBD]]></category>
		<category><![CDATA[crossdes]]></category>
		<category><![CDATA[find.BIB]]></category>
		<category><![CDATA[isGYD]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1296</guid>
		<description><![CDATA[The Balanced Incomplete Block Design (BIBD) is a well-studied experimental design that has various desirable features from a statistical perspective. The crossdes package in R provides a way to generate a block design for some given parameters and test whether this design satisfies the BIBD conditions. For a BIBD there are v treatments repeated [...]]]></description>
			<content:encoded><![CDATA[<p>The Balanced Incomplete Block Design (BIBD) is a well-studied experimental design that has various desirable features from a statistical perspective. The <strong>crossdes</strong> package in <strong>R</strong> provides a way to generate a block design for some given parameters and test whether this design satisfies the BIBD conditions.<span id="more-1296"></span></p>
<p>For a BIBD there are <strong>v</strong> treatments repeated <strong>r</strong> times in <strong>b</strong> blocks of <strong>k</strong> observations. There is a fifth parameter <strong>lambda</strong> that records the number of blocks in which each pair of treatments occurs together.</p>
<p>We first load the crossdes package in our session:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(crossdes)</pre></div></div>

<p>The function <strong>find.BIB</strong> is used to generate a block design with a specific number of treatments, blocks (rows of the design) and elements per block (columns of the design).</p>
<p>Consider an example with five treatments in four blocks of three elements. We can create a block design via:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; find.BIB(5, 4, 3)
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    4    5
[3,]    2    3    5
[4,]    1    2    5</pre></div></div>

<p>This design is not a BIBD because the treatments are not all repeated the same number of times in the design and we can check this with the <strong>isGYD</strong> function. For this example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; isGYD(find.BIB(5, 4, 3))
&nbsp;
[1] The design is neither balanced w.r.t. rows nor w.r.t. columns.</pre></div></div>

<p>This confirms what we can see from the design.</p>
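<p>There is also a quick arithmetic reason why no BIBD can exist for these parameters. Counting the observations in two ways gives b * k = v * r, and counting the pairs that involve any one treatment gives lambda * (v - 1) = r * (k - 1), so both <strong>r</strong> and <strong>lambda</strong> must be whole numbers. The helper below is a hypothetical function of our own (it is not part of crossdes) that checks these necessary, but not sufficient, conditions:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># necessary (not sufficient) conditions for a BIBD with
# v treatments in b blocks of k elements
bibd.check = function(v, b, k) {
  r = b * k / v                    # replications per treatment
  lambda = r * (k - 1) / (v - 1)   # blocks shared by each pair
  whole = function(x) isTRUE(all.equal(x, round(x)))
  list(r = r, lambda = lambda,
       possible = all(whole(r), whole(lambda)))
}
bibd.check(5, 4, 3)
bibd.check(7, 7, 3)</pre></div></div>

<p>For five treatments in four blocks of three the implied value of <strong>r</strong> is 2.4, which is not a whole number, so the conditions fail. For seven treatments in seven blocks of three they hold with r = 3 and lambda = 1.</p>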
<p>Let us instead consider a design with seven treatments and seven blocks of three elements to see whether we can create a BIBD with these parameters:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; my.design = find.BIB(7, 7, 3)
&gt; my.design
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    3    4    5
[3,]    1    3    6
[4,]    2    3    7
[5,]    2    4    6
[6,]    1    4    7
[7,]    5    6    7
&gt; isGYD(my.design)
&nbsp;
[1] The design is a balanced incomplete block design w.r.t. rows.</pre></div></div>

<p>In this situation we are able to generate a valid BIBD experiment with the specified parameters.</p>
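<p>The balance of this design can also be confirmed by hand in base <strong>R</strong>, without relying on <strong>isGYD</strong>. The sketch below types in the design printed above, builds a treatment-by-block incidence matrix and multiplies it by its transpose: the diagonal then holds the number of replications of each treatment and the off-diagonal entries hold the number of blocks shared by each pair. The object names are ours:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># the design printed above, one block per row
my.design = matrix(c(1, 2, 5,
                     3, 4, 5,
                     1, 3, 6,
                     2, 3, 7,
                     2, 4, 6,
                     1, 4, 7,
                     5, 6, 7), ncol = 3, byrow = TRUE)
v = max(my.design)
# incidence matrix: rows are treatments, columns are blocks
incidence = matrix(0, nrow = v, ncol = nrow(my.design))
for (j in seq_len(nrow(my.design))) incidence[my.design[j, ], j] = 1
concurrence = incidence %*% t(incidence)
# replications of each treatment (r)
replications = diag(concurrence)
# number of blocks shared by each pair of treatments (lambda)
pair.counts = concurrence[upper.tri(concurrence)]
table(replications)
table(pair.counts)</pre></div></div>

<p>Every treatment appears three times and every one of the 21 pairs appears together in exactly one block, so r = 3 and lambda = 1 as required for a BIBD.</p>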
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>R Commander &#8211; two-way analysis of variance</title>
		<link>http://www.wekaleamstudios.co.uk/posts/r-commander-two-way-analysis-of-variance/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/r-commander-two-way-analysis-of-variance/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 09:42:16 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[R Commander]]></category>
		<category><![CDATA[Statistical Analysis]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[Rcmdr]]></category>
		<category><![CDATA[Two-way Analysis of Variance]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1224</guid>
		<description><![CDATA[Two-way analysis of variance models can be fitted to data using the R Commander GUI. The general approach is similar to fitting the other types of model in R Commander described in previous posts. The &#8220;Statistics&#8221; menu provides access to some analysis of variance models via the &#8220;Means&#8221; sub-menu: Multi-way [...]]]></description>
			<content:encoded><![CDATA[<p>Two-way analysis of variance models can be fitted to data using the <a href="http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/">R Commander</a> GUI. The general approach is similar to fitting the other types of model in <strong>R Commander</strong> described in previous posts.<span id="more-1224"></span></p>
<p><!--[Fast Tube]--><span id="uSI1CIHEZcc" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/r-commander-two-way-analysis-of-variance/#uSI1CIHEZcc"><img src="http://i.ytimg.com/vi/uSI1CIHEZcc/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The <strong>&#8220;Statistics&#8221;</strong> menu provides access to some analysis of variance models via the <strong>&#8220;Means&#8221;</strong> sub-menu:</p>
<ul>
<li>Multi-way ANOVA &#8211; the simplest two-way analysis of variance model that can be applied to a set of data.</li>
</ul>
<p>The <strong>&#8220;Models&#8221;</strong> menu provides access to various diagnostics for analysis of variance models via the <strong>&#8220;Graphs&#8221;</strong> sub-menu including:</p>
<ul>
<li>Basic diagnostic plots &#8211; four commonly used residual diagnostics including residuals versus fitted values and a normal probability plot for the residuals.</li>
</ul>
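<p>For readers who prefer the command line, roughly the same analysis can be sketched in base <strong>R</strong>. This is not the code that R Commander itself generates; it is an illustrative equivalent using <strong>aov</strong> and the <strong>ToothGrowth</strong> data set that ships with R, which is our choice of example data:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># ToothGrowth ships with R; dose is numeric, so convert it to a factor
tooth = ToothGrowth
tooth$dose = factor(tooth$dose)
# two-way model with both main effects and their interaction
tooth.aov = aov(len ~ supp * dose, data = tooth)
summary(tooth.aov)
# the four standard residual diagnostic plots on one page
par(mfrow = c(2, 2))
plot(tooth.aov)</pre></div></div>

<p>The summary table has one row for each main effect, one for the interaction and one for the residuals, and the plots include residuals versus fitted values and a normal probability plot of the residuals.</p>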
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/r-commander-two-way-analysis-of-variance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

