<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; Design of Experiments</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/statistical-analysis/design-of-experiments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Fractional Factorial Designs using FrF2</title>
		<link>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/#comments</comments>
		<pubDate>Wed, 18 May 2011 18:17:18 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[fractional factorial]]></category>
		<category><![CDATA[FrF2]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1625</guid>
		<description><![CDATA[The FrF2 package for R can be used to create regular and non-regular Fractional Factorial 2-level designs. It is reasonably straightforward to use. First step is to install the package then make it available for use in the current session: require(FrF2) A basic call to the main function FrF2 specifies the number of runs in [...]]]></description>
			<content:encoded><![CDATA[<p>The <strong>FrF2</strong> package for <strong>R</strong> can be used to <em>create regular and non-regular Fractional Factorial 2-level designs</em>. It is reasonably straightforward to use.<span id="more-1625"></span></p>
<p>The first step is to install the package, if necessary, and then load it into the current session:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(FrF2)</pre></div></div>

<p>A basic call to the main function <strong>FrF2</strong> specifies the number of runs in the fractional factorial design (which needs to be a power of 2) and the number of factors. For example, a three factor design would have a total of eight runs as a full factorial, but if we wanted to use only four runs we could generate the design like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, 3)
   A  B  C
1  1 -1 -1
2 -1  1 -1
3 -1 -1  1
4  1  1  1
class=design, type= FrF2</pre></div></div>
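
<p>To see where the four runs come from, the sketch below builds the full eight-run factorial with base <strong>R</strong> and keeps the half in which C equals the product of A and B. Choosing this particular half is an assumption made for illustration; it gives the same four runs as the output above, although the run order will generally differ.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Full factorial: all eight combinations of three two-level factors
full = expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
nrow(full)
# Half fraction: keep only the runs where C = A * B, leaving four runs
half = full[full$C == full$A * full$B, ]
half</pre></div></div>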

<p>The default output labels the factors A, B, C and so on and the factor levels are -1 and +1 for the two levels of each factor. We can change the level names to low and high using the <strong>default.levels</strong> function argument:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, 3, default.levels = c(&quot;low&quot;, &quot;high&quot;))
     A    B    C
1 high high high
2  low high  low
3 high  low  low
4  low  low high
class=design, type= FrF2</pre></div></div>

<p>The factors can be specified as a vector of names rather than as a number of factors via the <strong>factor.names</strong> argument:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; FrF2(4, factor.names = c(&quot;One&quot;, &quot;Two&quot;, &quot;Three&quot;),
  default.levels = c(&quot;low&quot;, &quot;high&quot;))
   One  Two Three
1  low high   low
2 high high  high
3  low  low  high
4 high  low   low
class=design, type= FrF2</pre></div></div>

<p>These are the basics. The package has other features that give greater control over the confounding between factors and their interactions that a fractional factorial design introduces.</p>
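<p>As a rough illustration of this confounding, the sketch below uses base <strong>R</strong> only and hard-codes the four runs from the first example. In that design the column for C is identical to the elementwise product of the A and B columns, so the main effect of C cannot be separated from the A:B interaction. The object names <strong>design</strong> and <strong>AB</strong> are chosen here for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Four-run design for three factors, copied from the first example
design = data.frame(A = c(1, -1, -1, 1),
                    B = c(-1, 1, -1, 1),
                    C = c(-1, -1, 1, 1))
# The contrast for the A:B interaction is the product of the A and B columns
AB = design$A * design$B
# Compare the A:B contrast with the C column, run by run
all(AB == design$C)</pre></div></div>

<p>If the comparison returns TRUE, the two contrasts coincide in every run and C is said to be aliased with A:B.</p>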
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/fractional-factorial-designs-using-frf2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Generating Balanced Incomplete Block Designs (BIBD)</title>
		<link>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 12:04:50 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Balanced Incomplete Block Design]]></category>
		<category><![CDATA[BIBD]]></category>
		<category><![CDATA[crossdes]]></category>
		<category><![CDATA[find.BIB]]></category>
		<category><![CDATA[isGYD]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1296</guid>
		<description><![CDATA[The Balanced Incomplete Block Design (BIBD) is a well studied experimental design that has various desirable features from a statistical perspective. The crossdes package in R provides a way to generate a block design for some given parameters and test whether this design satisfies the BIBD conditions. For a BIBD there are v treatments repeated [...]]]></description>
			<content:encoded><![CDATA[<p>The Balanced Incomplete Block Design (BIBD) is a well-studied experimental design that has various desirable features from a statistical perspective. The <strong>crossdes</strong> package in <strong>R</strong> provides a way to generate a block design for some given parameters and test whether this design satisfies the BIBD conditions.<span id="more-1296"></span></p>
<p>For a BIBD there are <strong>v</strong> treatments repeated <strong>r</strong> times in <strong>b</strong> blocks of <strong>k</strong> observations. There is a fifth parameter <strong>lambda</strong> that records the number of blocks in which each pair of treatments occurs together.</p>
<p>We first load the crossdes package in our session:</p>
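<p>The five parameters are linked by two standard counting identities, b * k = v * r and lambda * (v - 1) = r * (k - 1), so r and lambda follow once v, b and k are chosen. The sketch below works through this arithmetic for two illustrative parameter sets; the helper name <strong>bibd.params</strong> is hypothetical and is not part of <strong>crossdes</strong>. Whole-number values of r and lambda are necessary for a BIBD to exist but do not guarantee that one does.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Hypothetical helper: replication r and pair concurrence lambda implied by v, b and k
bibd.params = function(v, b, k) {
  r = b * k / v                   # from b * k = v * r
  lambda = r * (k - 1) / (v - 1)  # from lambda * (v - 1) = r * (k - 1)
  c(r = r, lambda = lambda)
}
bibd.params(7, 7, 3)  # whole numbers, so a BIBD is not ruled out
bibd.params(5, 4, 3)  # fractional values, so no BIBD can exist</pre></div></div>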

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(crossdes)</pre></div></div>

<p>The function <strong>find.BIB</strong> is used to generate a block design with a specified number of treatments, blocks (rows of the design) and elements per block (columns of the design).</p>
<p>Consider an example with five treatments in four blocks of three elements. We can create a block design via:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; find.BIB(5, 4, 3)
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    4    5
[3,]    2    3    5
[4,]    1    2    5</pre></div></div>

<p>This design is not a BIBD because the treatments are not all repeated the same number of times in the design and we can check this with the <strong>isGYD</strong> function. For this example:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; isGYD(find.BIB(5, 4, 3))
&nbsp;
[1] The design is neither balanced w.r.t. rows nor w.r.t. columns.</pre></div></div>

<p>This confirms what we can see from the design.</p>
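<p>The unequal replication can also be counted directly. The sketch below hard-codes the design matrix printed above (a fresh call to <strong>find.BIB</strong> may return a different arrangement) and tabulates how often each treatment appears.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Design copied from the find.BIB(5, 4, 3) output shown above
design = matrix(c(1, 3, 4,
                  2, 4, 5,
                  2, 3, 5,
                  1, 2, 5), ncol = 3, byrow = TRUE)
# Number of times each treatment appears anywhere in the design
reps = table(design)
reps
# A BIBD needs a single common replication count
length(unique(as.vector(reps))) == 1</pre></div></div>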
<p>Let us instead consider a design with seven treatments and seven blocks of three elements to see whether we can create a BIBD with these parameters:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; my.design = find.BIB(7, 7, 3)
&gt; my.design
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    3    4    5
[3,]    1    3    6
[4,]    2    3    7
[5,]    2    4    6
[6,]    1    4    7
[7,]    5    6    7
&gt; isGYD(my.design)
&nbsp;
[1] The design is a balanced incomplete block design w.r.t. rows.</pre></div></div>

<p>In this situation we are able to generate a valid BIBD experiment with the specified parameters.</p>
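<p>The BIBD conditions for this design can be checked by hand as well. The sketch below hard-codes the design printed above, builds the treatment by block incidence matrix and multiplies it by its transpose: the diagonal of the result holds the replication of each treatment and the off-diagonal entries hold the number of blocks shared by each pair of treatments. The object names are chosen for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Design copied from the find.BIB(7, 7, 3) output shown above
my.design = matrix(c(1, 2, 5,
                     3, 4, 5,
                     1, 3, 6,
                     2, 3, 7,
                     2, 4, 6,
                     1, 4, 7,
                     5, 6, 7), ncol = 3, byrow = TRUE)
# Incidence matrix: rows are treatments, columns are blocks
incidence = sapply(1:nrow(my.design),
  function(i) as.integer(1:7 %in% my.design[i, ]))
concurrence = incidence %*% t(incidence)
# Replication of each treatment (r)
diag(concurrence)
# Distinct pair concurrences; a single value is lambda
unique(concurrence[upper.tri(concurrence)])</pre></div></div>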
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/generating-balanced-incomplete-block-designs-bibd/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Design of Experiments &#8211; Block Designs</title>
		<link>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-block-designs/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-block-designs/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 20:05:26 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[block]]></category>
		<category><![CDATA[Block Designs]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=632</guid>
		<description><![CDATA[In many experiments where the investigator is comparing a set of treatments there is the possibility of one or more sources of variability in the experimental measurements that can be accounted for during the design stage of the experimentation. For example we might be investigating four different pieces of machinery using say two different operators, [...]]]></description>
			<content:encoded><![CDATA[<p>In many experiments where the investigator is comparing a set of <strong>treatments</strong> there is the possibility of one or more sources of variability in the experimental measurements that can be accounted for during the design stage of the experimentation. For example we might be investigating four different pieces of machinery using say two different operators, who would be expected to display different degrees of competence with the equipment. Or we might not be able to run all of the experimental combinations in one session so we would want to take into account systematic differences that are due to experiments in the various sessions.<span id="more-632"></span></p>
<p>The least complicated scenario is where we would have a single (nuisance) factor that we want to control for in the experiment. The statistical model used to describe the data collected in such an experiment could be written in the form:</p>
<p style="text-align: center;"><img class="size-full wp-image-728  aligncenter" title="General Block Design Model" src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/BlockModel1.gif" alt="General Block Design Model Line 1" width="184" height="20" /></p>
<p style="text-align: center;"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2009/12/BlockModel2.gif" alt="General Block Design Model Line 2" title="General Block Design Model" width="323" height="21" class="alignnone size-full wp-image-732" /></p>
<p>where there are <strong>v</strong> treatments in <strong>b</strong> blocks and the number of units in each block does not have to be the same and is denoted using the <strong>k</strong> subscript.</p>
<p>In a complete block design all treatments occur the same number of times in every block, usually one replicate of all treatments per block. There will be situations where the number of treatments is too large for all of them to be included in every block of the design. In these situations an incomplete block design would be used for running an experiment.</p>
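<p>A minimal sketch of laying out a complete block design with base <strong>R</strong> is shown below, with one replicate of every treatment per block and a random run order within each block. The machine and operator labels follow the example at the start of this post, but the seed and the object names are assumptions made for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">set.seed(42)  # arbitrary seed so that the randomisation can be reproduced
treatments = c(&quot;Machine 1&quot;, &quot;Machine 2&quot;, &quot;Machine 3&quot;, &quot;Machine 4&quot;)
blocks = c(&quot;Operator 1&quot;, &quot;Operator 2&quot;)
# Within each block, run every treatment once in a random order
plan = do.call(rbind, lapply(blocks, function(b) {
  data.frame(Block = b, Run = seq_along(treatments),
    Treatment = sample(treatments))
}))
plan
# Every block and treatment combination should appear exactly once
table(plan$Block, plan$Treatment)</pre></div></div>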
<p>A special type of design is the balanced incomplete block design (BIBD), where the <strong>v</strong> treatments are investigated by allocating them to <strong>b</strong> blocks of equal size <strong>k</strong>. We have that <strong>k</strong> is less than <strong>v</strong>, and <strong>b</strong> and <strong>k</strong> are chosen so that <strong>b</strong> * <strong>k</strong> is a multiple of <strong>v</strong>. All of the treatments occur exactly <strong>r</strong> times in the design and every pair of treatments occurs together in <strong>lambda</strong> of the <strong>b</strong> blocks.</p>
<p>Two-way analysis of variance (ANOVA) is used to analyse data collected from an experiment using a block design, as discussed elsewhere in this <a href="http://www.wekaleamstudios.co.uk/?p=660">post</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-block-designs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two-way Analysis of Variance (ANOVA)</title>
		<link>http://www.wekaleamstudios.co.uk/posts/two-way-analysis-of-variance-anova/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/two-way-analysis-of-variance-anova/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 21:45:02 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Analysis of Variance]]></category>
		<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[analysis of variance]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[aov]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[summary]]></category>
		<category><![CDATA[Tukey HSD]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[two way]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=660</guid>
		<description><![CDATA[The analysis of variance (ANOVA) model can be extended from making a comparison between multiple groups to take into account additional factors in an experiment. The simplest extension is from one-way to two-way ANOVA where a second factor is included in the model as well as a potential interaction between the two factors. As an [...]]]></description>
			<content:encoded><![CDATA[<p>The analysis of variance (<strong>ANOVA</strong>) model can be extended from making a comparison between multiple groups to take into account additional factors in an experiment. The simplest extension is from one-way to two-way <strong>ANOVA</strong> where a second factor is included in the model as well as a potential interaction between the two factors.<span id="more-660"></span></p>
<p>As an example consider a company that regularly has to ship parcels between its various (five for this example) sub-offices and has the option of using three competing parcel delivery services, all of which charge roughly similar amounts for each delivery. To determine which service to use, the company decides to run an experiment shipping three packages with each service from its head office to each of the five sub-offices. The delivery time for each package is recorded and the data are loaded into <strong>R</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">delivery.df = data.frame(
  Service = c(rep(&quot;Carrier 1&quot;, 15), rep(&quot;Carrier 2&quot;, 15),
    rep(&quot;Carrier 3&quot;, 15)),
  Destination = c(rep(c(&quot;Office 1&quot;, &quot;Office 2&quot;, &quot;Office 3&quot;,
    &quot;Office 4&quot;, &quot;Office 5&quot;), 9)),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05,
  15.48, 14.13, 14.46, 15.62, 14.23, 15.19, 14.67, 14.48, 15.34, 14.22,
  16.66, 16.27, 16.35, 16.93, 15.05, 16.98, 16.43, 15.95, 16.73, 15.62,
  16.53, 16.26, 15.69, 16.97, 15.37, 17.12, 16.65, 15.73, 17.77, 15.52,
  16.15, 16.86, 15.18, 17.96, 15.26, 16.36, 16.44, 14.82, 17.62, 15.04)
)</pre></div></div>
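
<p>Before plotting, it can be worth confirming that the layout is balanced. The sketch below rebuilds only the two factor columns from the data frame above and cross-tabulates them; each of the fifteen service and destination combinations should contain three packages.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Factor columns built in the same way as in delivery.df
Service = c(rep(&quot;Carrier 1&quot;, 15), rep(&quot;Carrier 2&quot;, 15),
  rep(&quot;Carrier 3&quot;, 15))
Destination = rep(c(&quot;Office 1&quot;, &quot;Office 2&quot;, &quot;Office 3&quot;,
  &quot;Office 4&quot;, &quot;Office 5&quot;), 9)
# Number of packages for every Service by Destination combination
layout.counts = table(Service, Destination)
layout.counts</pre></div></div>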

<p>The data is then displayed using a dot plot for an initial visual investigation of any trends in delivery time between the three services and across the five sub-offices. The colour aesthetic is used to distinguish between the three services in the plot.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)  # provides ggplot() and the geoms used in this post
ggplot(delivery.df, aes(Time, Destination, colour = Service)) + geom_point()</pre></div></div>

<p>This code produces the following graph:</p>
<div id="attachment_792" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-data.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-data-300x300.png" alt="Service Delivery Time by Destination" title="Delivery Time" width="300" height="300" class="size-medium wp-image-792" /></a><p class="wp-caption-text">Graph of the delivery time for different services and destinations</p></div>
<p>The graph shows a general pattern of service carrier 1 having shorter delivery times than the other two services. There is also an indication that the differences between the services vary across the five sub-offices, so we might expect the interaction term to be significant in the two-way <strong>ANOVA</strong> model. To fit the two-way <strong>ANOVA</strong> model we use this code:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">delivery.mod1 = aov(Time ~ Destination*Service, data = delivery.df)</pre></div></div>
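
<p>The formula in this call is shorthand for a longer one. The sketch below, which needs no data, compares the terms generated by the compact formula with those of the written-out version; the object names <strong>f1</strong> and <strong>f2</strong> are chosen for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Compact formula as used in the aov() call
f1 = Time ~ Destination * Service
# Written-out version: two main effects plus their interaction
f2 = Time ~ Destination + Service + Destination:Service
# Model terms generated by the compact formula
attr(terms(f1), &quot;term.labels&quot;)
# Check that both formulas generate the same terms
identical(attr(terms(f1), &quot;term.labels&quot;), attr(terms(f2), &quot;term.labels&quot;))</pre></div></div>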

<p>The <strong>*</strong> symbol instructs <strong>R</strong> to create a formula that includes main effects for both Destination and Service as well as the two-way interaction between these two factors. We save the fitted model to an object which we can summarise as follows to test for importance of the various model terms:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; summary(delivery.mod1)
                    Df  Sum Sq Mean Sq  F value    Pr(&gt;F)    
Destination          4 17.5415  4.3854  61.1553 5.408e-14 ***
Service              2 23.1706 11.5853 161.5599 &lt; 2.2e-16 ***
Destination:Service  8  4.1888  0.5236   7.3018 2.360e-05 ***
Residuals           30  2.1513  0.0717                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</pre></div></div>
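
<p>The columns of this table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, each F value is a mean square divided by the residual mean square, and each p-value is an upper tail probability of the corresponding F distribution. The sketch below recomputes these from the rounded figures printed above, so the results should agree with the table only up to rounding.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Sums of squares and degrees of freedom copied from the table above
ss = c(Destination = 17.5415, Service = 23.1706, Interaction = 4.1888)
df = c(4, 2, 8)
# Residual mean square: residual sum of squares over residual degrees of freedom
ms.resid = 2.1513 / 30
# Mean squares, F values and p-values for the three model terms
ms = ss / df
f = ms / ms.resid
p = pf(f, df, 30, lower.tail = FALSE)
round(cbind(MeanSq = ms, F = f), 4)
p</pre></div></div>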

<p>We have strong evidence here that there are differences between the three delivery services, between the five sub-office destinations and that there is an interaction between destination and service in line with what we saw in the original plot of the data. Now that we have fitted the model and identified the important factors we need to investigate the model diagnostics to ensure that the various assumptions are broadly valid.</p>
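<p>The interaction can also be seen numerically in the table of mean delivery times for each destination and service. The sketch below rebuilds the data frame from earlier in the post and uses <strong>tapply</strong> to compute the cell means; if there were no interaction, the gap between two carriers would be roughly the same at every destination.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Data as entered earlier in the post
delivery.df = data.frame(
  Service = c(rep(&quot;Carrier 1&quot;, 15), rep(&quot;Carrier 2&quot;, 15),
    rep(&quot;Carrier 3&quot;, 15)),
  Destination = c(rep(c(&quot;Office 1&quot;, &quot;Office 2&quot;, &quot;Office 3&quot;,
    &quot;Office 4&quot;, &quot;Office 5&quot;), 9)),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05,
  15.48, 14.13, 14.46, 15.62, 14.23, 15.19, 14.67, 14.48, 15.34, 14.22,
  16.66, 16.27, 16.35, 16.93, 15.05, 16.98, 16.43, 15.95, 16.73, 15.62,
  16.53, 16.26, 15.69, 16.97, 15.37, 17.12, 16.65, 15.73, 17.77, 15.52,
  16.15, 16.86, 15.18, 17.96, 15.26, 16.36, 16.44, 14.82, 17.62, 15.04)
)
# Mean delivery time for each destination (rows) and service (columns)
cell.means = with(delivery.df, tapply(Time, list(Destination, Service), mean))
round(cell.means, 2)
# Gap between Carrier 2 and Carrier 1 at each destination
round(cell.means[, &quot;Carrier 2&quot;] - cell.means[, &quot;Carrier 1&quot;], 2)</pre></div></div>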
<p>We can plot the model residuals against fitted values to look for obvious trends that are not consistent with the model assumptions about independence and common variance. The first step is to create a data frame with the fitted values and residuals from the above model:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">delivery.res = delivery.df
delivery.res$M1.Fit = fitted(delivery.mod1)
delivery.res$M1.Resid = resid(delivery.mod1)</pre></div></div>

<p>Then a scatter plot is used to display the fitted values and residuals where the colour aesthetic highlights which points correspond to the three competing delivery services:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(delivery.res, aes(M1.Fit, M1.Resid, colour = Service)) + geom_point() +
  xlab(&quot;Fitted Values&quot;) + ylab(&quot;Residuals&quot;)</pre></div></div>

<p>The <strong>xlab()</strong> and <strong>ylab()</strong> functions are used to change the text on the axis labels. The residual diagnostic plot is:</p>
<div id="attachment_798" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid1.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid1-300x300.png" alt="Model Residual Plot" title="Model Residual Plot" width="300" height="300" class="size-medium wp-image-798" /></a><p class="wp-caption-text">Diagnostic Residual Plot for Delivery Time Model</p></div>
<p>There are no obvious patterns in this plot that suggest problems with the two-way <strong>ANOVA</strong> model that we fitted to the data.</p>
<p>As an alternative display we could separate the residuals into destination sub-offices, where the <strong>facet_wrap()</strong> function instructs <strong>ggplot</strong> to create a separate display (panel) for each of the destinations.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(delivery.res, aes(M1.Fit, M1.Resid, colour = Service)) +
  geom_point() + xlab(&quot;Fitted Values&quot;) + ylab(&quot;Residuals&quot;) +
  facet_wrap( ~ Destination)</pre></div></div>

<p>This produces the following alternative residual plot:</p>
<div id="attachment_799" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid2.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid2-300x300.png" alt="Model Residual Plot" title="Model Residual Plot" width="300" height="300" class="size-medium wp-image-799" /></a><p class="wp-caption-text">Diagnostic Residual Plot for Delivery Time Model by Destination</p></div>
<p>There are no obvious problems in this diagnostic plot.</p>
<p>We could also consider dividing the data by delivery service to get a different view of the residuals:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(delivery.res, aes(M1.Fit, M1.Resid, colour = Destination)) +
  geom_point() + xlab(&quot;Fitted Values&quot;) + ylab(&quot;Residuals&quot;) +
  facet_wrap( ~ Service)</pre></div></div>

<p>This creates the following graph:</p>
<div id="attachment_800" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid3.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-resid3-300x300.png" alt="Model Residual Plot" title="Model Residual Plot" width="300" height="300" class="size-medium wp-image-800" /></a><p class="wp-caption-text">Diagnostic Residual Plot for Delivery Time Model by Service</p></div>
<p>Again there is nothing substantial here to lead us to consider an alternative analysis.</p>
<p>Lastly we consider the normal probability plot of the model residuals, using the <strong>stat_qq()</strong> option:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(delivery.res, aes(sample = M1.Resid)) + stat_qq()</pre></div></div>

<p>The quantile plot is:</p>
<div id="attachment_806" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-qq.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-qq-300x300.png" alt="Quantile Plot" title="Quantile Plot" width="300" height="300" class="size-medium wp-image-806" /></a><p class="wp-caption-text">Normal Probability Plot for Delivery Time Model</p></div>
<p>The points lie very close to the straight line we would expect to observe if the residuals were approximately normally distributed. To round off the analysis we look at the Tukey HSD multiple comparisons to confirm that the differences are between delivery service 1 and the other two competing services:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; TukeyHSD(delivery.mod1, which = &quot;Service&quot;)
  Tukey multiple comparisons of means
    95% family-wise confidence level
&nbsp;
Fit: aov(formula = Time ~ Destination * Service, data = delivery.df)
&nbsp;
$Service
                        diff        lwr       upr     p adj
Carrier 2-Carrier 1 1.498667  1.2576092 1.7397241 0.0000000
Carrier 3-Carrier 1 1.544667  1.3036092 1.7857241 0.0000000
Carrier 3-Carrier 2 0.046000 -0.1950575 0.2870575 0.8856246</pre></div></div>

<p>Even with the multiple comparison post-hoc adjustment there is very strong evidence for the differences that we have consistently observed throughout the analysis.</p>
<p>We can use <strong>ggplot</strong> to visualise the difference in mean delivery time for the services and the 95% confidence intervals on these differences. We create a data frame from the <strong>TukeyHSD</strong> output by extracting the component relating to the delivery service comparison and add the text labels by extracting the row names from the data frame.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">delivery.hsd = data.frame(TukeyHSD(delivery.mod1, which = &quot;Service&quot;)$Service)
delivery.hsd$Comparison = row.names(delivery.hsd)</pre></div></div>

<p>We then use <strong>geom_pointrange()</strong> to draw the lower, middle and upper values for the three pairwise comparisons of interest.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(delivery.hsd, aes(Comparison, y = diff, ymin = lwr, ymax = upr)) +
  geom_pointrange() + ylab(&quot;Difference in Mean Delivery Time by Service&quot;) +
  coord_flip()</pre></div></div>

<p>The <strong>coord_flip()</strong> function is used to make the confidence intervals horizontal rather than vertical on the graph. This can be confusing when creating the axis labels, because a label is specified for the axis on which it would appear before the coordinates are flipped. In the example above we add text to the y axis but this now appears on the x axis in the final graph:</p>
<div id="attachment_811" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-tukeyHSD.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-twoway-tukeyHSD-300x300.png" alt="Tukey HSD" title="Tukey HSD" width="300" height="300" class="size-medium wp-image-811" /></a><p class="wp-caption-text">Plot of Confidence Intervals for Mean Differences using Tukey HSD</p></div>
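<p>One way to keep track of the labels is to set both of them by name with <strong>labs()</strong>, as in the sketch below. It hard-codes the three comparisons from the <strong>TukeyHSD</strong> output above so that it runs on its own; the dashed reference line at zero is an addition made here for illustration and is not part of the original graph.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(ggplot2)
# Differences and 95% limits copied from the TukeyHSD output above
hsd = data.frame(
  Comparison = c(&quot;Carrier 2-Carrier 1&quot;, &quot;Carrier 3-Carrier 1&quot;,
    &quot;Carrier 3-Carrier 2&quot;),
  diff = c(1.498667, 1.544667, 0.046000),
  lwr = c(1.2576092, 1.3036092, -0.1950575),
  upr = c(1.7397241, 1.7857241, 0.2870575)
)
# Labels are given for the axes as they are before coord_flip() swaps them
p = ggplot(hsd, aes(Comparison, y = diff, ymin = lwr, ymax = upr)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = &quot;dashed&quot;) +
  labs(x = &quot;Comparison&quot;, y = &quot;Difference in Mean Delivery Time by Service&quot;) +
  coord_flip()
p</pre></div></div>

<p>An interval that crosses the dashed line corresponds to a pair of services whose mean delivery times cannot be distinguished at this confidence level.</p>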
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/two-way-analysis-of-variance-anova/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>One-way ANOVA (cont.)</title>
		<link>http://www.wekaleamstudios.co.uk/posts/one-way-anova-cont/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/one-way-anova-cont/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 13:45:34 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Analysis of Variance]]></category>
		<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[aov]]></category>
		<category><![CDATA[lm]]></category>
		<category><![CDATA[multiple comparisons]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=767</guid>
		<description><![CDATA[In a previous post we considered using R to fit one-way ANOVA models to data. In this post we consider a few additional ways that we can look at the analysis. In the analysis we made use of the linear model function lm and the analysis could [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous <a href="http://www.wekaleamstudios.co.uk/?p=658">post</a> we considered using <strong>R</strong> to fit one-way ANOVA models to data. In this post we consider a few additional ways that we can look at the analysis.<span id="more-767"></span></p>
<p><!--[Fast Tube]--><span id="PBE-llEkiHk" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/one-way-anova-cont/#PBE-llEkiHk"><img src="http://i.ytimg.com/vi/PBE-llEkiHk/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p><!--[Fast Tube]--><span id="r_uSH0Xaau8" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/one-way-anova-cont/#r_uSH0Xaau8"><img src="http://i.ytimg.com/vi/r_uSH0Xaau8/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In that analysis we made use of the linear model function <strong>lm</strong>; the same analysis could instead be conducted using the <strong>aov</strong> function. The code used to fit the model is very similar:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> plant.<span style="">mod2</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">aov</span><span style="color: #080;">&#40;</span>weight ~ group, <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> plant.<span style="">df</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">summary</span><span style="color: #080;">&#40;</span>plant.<span style="">mod2</span><span style="color: #080;">&#41;</span>
            Df  Sum Sq Mean Sq <span style="color: #0000FF; font-weight: bold;">F</span> value  Pr<span style="color: #080;">&#40;</span><span style="color: #080;">&gt;</span>F<span style="color: #080;">&#41;</span>  
group        <span style="color: #ff0000;">2</span>  <span style="color: #ff0000;">3.7663</span>  <span style="color: #ff0000;">1.8832</span>  <span style="color: #ff0000;">4.8461</span> <span style="color: #ff0000;">0.01591</span> <span style="color: #080;">*</span>
Residuals   <span style="color: #ff0000;">27</span> <span style="color: #ff0000;">10.4921</span>  <span style="color: #ff0000;">0.3886</span>                  
<span style="color: #080;">---</span>
Signif. <span style="color: #0000FF; font-weight: bold;">codes</span><span style="color: #080;">:</span>  <span style="color: #ff0000;">0</span> ‘<span style="color: #080;">***</span>’ <span style="color: #ff0000;">0.001</span> ‘<span style="color: #080;">**</span>’ <span style="color: #ff0000;">0.01</span> ‘<span style="color: #080;">*</span>’ <span style="color: #ff0000;">0.05</span> ‘.’ <span style="color: #ff0000;">0.1</span> ‘ ’ <span style="color: #ff0000;">1</span></pre></div></div>
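
<p>The agreement between the two fitting functions can be checked directly. The data frame <strong>plant.df</strong> is not constructed in this post, so the sketch below assumes it has the same structure as the <strong>PlantGrowth</strong> data set that ships with <strong>R</strong> (a numeric weight and a three-level group) and uses that data set as a stand-in.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Assumption: PlantGrowth stands in for plant.df
plant.lm = lm(weight ~ group, data = PlantGrowth)
plant.aov = aov(weight ~ group, data = PlantGrowth)
# Analysis of variance table from each fitted object
anova(plant.lm)
summary(plant.aov)
# Compare the F statistic for group from the two routes
all.equal(anova(plant.lm)[[&quot;F value&quot;]][1],
  summary(plant.aov)[[1]][[&quot;F value&quot;]][1])</pre></div></div>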

<p>The output from using the <strong>summary</strong> function of the fitted model object shows the analysis of variance table with the p-value showing evidence of differences between the three groups. In <strong>R</strong> we can investigate the particular groups where there are differences using Tukey&#8217;s multiple comparisons:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">TukeyHSD</span><span style="color: #080;">&#40;</span>plant.<span style="">mod2</span><span style="color: #080;">&#41;</span>
  Tukey multiple comparisons of means
    <span style="color: #ff0000;">95</span><span style="color: #080;">%</span> family<span style="color: #080;">-</span>wise confidence level
&nbsp;
Fit<span style="color: #080;">:</span> <span style="color: #0000FF; font-weight: bold;">aov</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">formula</span> <span style="color: #080;">=</span> weight ~ group, <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> plant.<span style="">df</span><span style="color: #080;">&#41;</span>
&nbsp;
$group
                          <span style="color: #0000FF; font-weight: bold;">diff</span>        lwr       upr     p adj
Treatment <span style="color: #ff0000;">1</span><span style="color: #080;">-</span>Control     <span style="color: #080;">-</span><span style="color: #ff0000;">0.371</span> <span style="color: #080;">-</span><span style="color: #ff0000;">1.0622161</span> <span style="color: #ff0000;">0.3202161</span> <span style="color: #ff0000;">0.3908711</span>
Treatment <span style="color: #ff0000;">2</span><span style="color: #080;">-</span>Control      <span style="color: #ff0000;">0.494</span> <span style="color: #080;">-</span><span style="color: #ff0000;">0.1972161</span> <span style="color: #ff0000;">1.1852161</span> <span style="color: #ff0000;">0.1979960</span>
Treatment <span style="color: #ff0000;">2</span><span style="color: #080;">-</span>Treatment <span style="color: #ff0000;">1</span>  <span style="color: #ff0000;">0.865</span>  <span style="color: #ff0000;">0.1737839</span> <span style="color: #ff0000;">1.5562161</span> <span style="color: #ff0000;">0.0120064</span></pre></div></div>

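<p>The multiple comparison tests indicate that the only pairwise difference that is significant at the 5% level is between treatments 1 and 2 (adjusted p-value of 0.012); the intervals for the two comparisons with the control group both include zero. The 95% confidence intervals for the differences shown above can be plotted:</p>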

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">plot</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">TukeyHSD</span><span style="color: #080;">&#40;</span>plant.<span style="">mod2</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></div></div>

<p>which gives</p>
<p><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-oneway-tukeyHSD.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-oneway-tukeyHSD-300x300.png" alt="Tukey HSD Plot" title="Plant Growth Tukey Multiple Comparison" width="300" height="300" class="aligncenter size-medium wp-image-771" /></a></p>
<p>The post-hoc adjustments are recommended as we are testing after looking at the data rather than undertaking a pre-planned analysis.</p>
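<p>As a rough sketch of how the significant comparisons could be picked out programmatically, the matrix stored in the <strong>group</strong> component of the <strong>TukeyHSD</strong> result can be subset on its adjusted p-value column. The object name <strong>tukey.tab</strong> and the 5% cut-off are choices made for this illustration, and the model is refitted on the original <strong>PlantGrowth</strong> data (so the row names use the original level names) to let the snippet run on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Refit the one-way model so that the example is self-contained
plant.df = PlantGrowth
plant.mod2 = aov(weight ~ group, data = plant.df)
&nbsp;
# TukeyHSD returns a list with one matrix per factor in the model
tukey.tab = TukeyHSD(plant.mod2)$group
&nbsp;
# Keep only the comparisons with an adjusted p-value below 0.05
tukey.tab[tukey.tab[, &quot;p adj&quot;] &lt; 0.05, , drop = FALSE]</pre></div></div>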
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/one-way-anova-cont/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>One-way Analysis of Variance (ANOVA)</title>
		<link>http://www.wekaleamstudios.co.uk/posts/one-way-analysis-of-variance-anova/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/one-way-analysis-of-variance-anova/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 21:01:24 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Analysis of Variance]]></category>
		<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[analysis of variance]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[factor]]></category>
		<category><![CDATA[fitted values]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[lm]]></category>
		<category><![CDATA[one way]]></category>
		<category><![CDATA[residuals]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=658</guid>
		<description><![CDATA[Analysis of Variance (ANOVA) is a commonly used statistical technique for investigating data by comparing the means of subsets of the data. The base case is the one-way ANOVA which is an extension of two-sample t test for independent groups covering situations where there are more than two groups being compared. Fast Tube by Casper [...]]]></description>
			<content:encoded><![CDATA[<p>Analysis of Variance (<strong>ANOVA</strong>) is a commonly used statistical technique for investigating data by comparing the means of subsets of the data. The base case is the one-way <strong>ANOVA</strong>, which is an extension of the two-sample t test for independent groups covering situations where there are more than two groups being compared.<span id="more-658"></span></p>
<p><!--[Fast Tube]--><span id="PBE-llEkiHk" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/one-way-analysis-of-variance-anova/#PBE-llEkiHk"><img src="http://i.ytimg.com/vi/PBE-llEkiHk/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p><!--[Fast Tube]--><span id="r_uSH0Xaau8" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/one-way-analysis-of-variance-anova/#r_uSH0Xaau8"><img src="http://i.ytimg.com/vi/r_uSH0Xaau8/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>In one-way <strong>ANOVA</strong> the data is sub-divided into groups based on a single classification factor and the standard terminology used to describe the set of factor levels is <strong>treatment</strong> even though this might not always have meaning for the particular application. There is variation in the measurements taken on the individual components of the data set and ANOVA investigates whether this variation can be explained by the grouping introduced by the classification factor.</p>
<p>As an example we consider one of the data sets available with R relating to an experiment into plant growth. The purpose of the experiment was to compare the yields on the plants for a control group and two treatments of interest. The response variable was a measurement taken on the dried weight of the plants.</p>
<p>The first step in the investigation is to take a copy of the data frame so that we can make adjustments as necessary while leaving the original data alone. We use the <strong>factor</strong> function to re-define the labels of the <strong>group</strong> variable that will appear in the output and graphs:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plant.df = PlantGrowth
plant.df$group = factor(plant.df$group,
  labels = c(&quot;Control&quot;, &quot;Treatment 1&quot;, &quot;Treatment 2&quot;))</pre></div></div>

<p>The <strong>labels</strong> argument is a character vector of names corresponding, in order, to the levels of the <strong>group</strong> factor variable.</p>
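<p>A quick check of the relabelling, sketched here with base R only, is to compare the levels before and after the call to <strong>factor</strong>. The new labels are matched to the existing levels by position, so their order matters:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Levels as supplied with the PlantGrowth data set
levels(PlantGrowth$group)
&nbsp;
# Levels after relabelling, matched by position to the original ones
plant.df = PlantGrowth
plant.df$group = factor(plant.df$group,
  labels = c(&quot;Control&quot;, &quot;Treatment 1&quot;, &quot;Treatment 2&quot;))
levels(plant.df$group)
&nbsp;
# Number of plants in each group
table(plant.df$group)</pre></div></div>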
<p>A boxplot of the distributions of the dried weights for the three competing groups is created using the <strong>ggplot2</strong> package:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">require(ggplot2)
&nbsp;
ggplot(plant.df, aes(x = group, y = weight)) +
  geom_boxplot(fill = &quot;grey80&quot;, colour = &quot;blue&quot;) +
  scale_x_discrete() + xlab(&quot;Treatment Group&quot;) +
  ylab(&quot;Dried weight of plants&quot;)</pre></div></div>

<p>The <strong>fill</strong> and <strong>colour</strong> arguments to <strong>geom_boxplot()</strong> specify the background and outline colours for the boxes. The axis labels are created with the <strong>xlab()</strong> and <strong>ylab()</strong> functions. The plot that is produced looks like this:</p>
<p><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/01/anova-oneway-data.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/01/anova-oneway-data-300x300.png" alt="Boxplot of Plant Growth by Treatment Group" title="Plant Growth Data Summary" width="300" height="300" class="aligncenter size-medium wp-image-754" /></a></p>
<p>Initial inspection of the data suggests that there are differences in the dried weight for the two treatments, but it is less clear whether either treatment differs from the control group. To investigate these differences we fit the one-way ANOVA model using the <strong>lm</strong> function and look at the parameter estimates and standard errors for the treatment effects. The function call is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plant.mod1 = lm(weight ~ group, data = plant.df)</pre></div></div>

<p>We save the model fitted to the data in an object so that we can undertake various actions to study the goodness of the fit to the data and other model assumptions. The standard summary of an <strong>lm</strong> object is used to produce the following output:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; summary(plant.mod1)
&nbsp;
Call:
lm(formula = weight ~ group, data = plant.df)
&nbsp;
Residuals:
    Min      1Q  Median      3Q     Max 
-1.0710 -0.4180 -0.0060  0.2627  1.3690 
&nbsp;
Coefficients:
                 Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)        5.0320     0.1971  25.527   &lt;2e-16 ***
groupTreatment 1  -0.3710     0.2788  -1.331   0.1944    
groupTreatment 2   0.4940     0.2788   1.772   0.0877 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
Residual standard error: 0.6234 on 27 degrees of freedom
Multiple R-squared: 0.2641,     Adjusted R-squared: 0.2096 
F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591</pre></div></div>

<p>The model output gives only weak evidence (p-value of 0.088) of a difference in the average growth for the 2nd treatment compared to the control group, and no evidence of a difference for the 1st treatment. An analysis of variance table for this model can be produced via the <strong>anova</strong> command:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; anova(plant.mod1)
Analysis of Variance Table
&nbsp;
Response: weight
          Df  Sum Sq Mean Sq F value  Pr(&gt;F)  
group      2  3.7663  1.8832  4.8461 0.01591 *
Residuals 27 10.4921  0.3886                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</pre></div></div>

<p>This table confirms that there are differences between the groups, in agreement with the overall F-test reported at the bottom of the model summary. The function <strong>confint</strong> is used to calculate confidence intervals for the model parameters, by default 95% confidence intervals:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; confint(plant.mod1)
                       2.5 %    97.5 %
(Intercept)       4.62752600 5.4364740
groupTreatment 1 -0.94301261 0.2010126
groupTreatment 2 -0.07801261 1.0660126</pre></div></div>
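
<p>To see how the parameter estimates relate to the raw data, the sketch below computes the group means directly. With the default treatment contrasts in R the intercept is the mean of the first factor level (the control group) and each remaining coefficient is the difference between a treatment mean and the control mean. The name <strong>group.means</strong> is introduced for this illustration, and the original <strong>PlantGrowth</strong> level names are used so that the snippet runs on its own:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Mean dried weight in each group (the control group is the first level)
group.means = tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group.means
&nbsp;
# Differences from the control mean, comparable to the two
# treatment coefficients reported by summary(plant.mod1)
group.means[-1] - group.means[1]</pre></div></div>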

<p>The model residuals can be plotted against the fitted values to investigate the model assumptions. First we create a data frame with the fitted values, residuals and treatment identifiers:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">plant.mod = data.frame(Fitted = fitted(plant.mod1),
  Residuals = resid(plant.mod1), Treatment = plant.df$group)</pre></div></div>

<p>and then produce the plot:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">ggplot(plant.mod, aes(Fitted, Residuals, colour = Treatment)) + geom_point()</pre></div></div>

<p>which produces this graph:<br />
<a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-oneway-residualplot.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2010/02/anova-oneway-residualplot-300x300.png" alt="Residual diagnostic plot" title="Plant Growth Residual Plot" width="300" height="300" class="aligncenter size-medium wp-image-762" /></a><br />
We can see that there is no major problem with the diagnostic plot but some evidence of different variabilities in the spread of the residuals for the three treatment groups.</p>
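<p>One way to follow up on the apparent difference in spread, offered here as a sketch rather than as part of the original analysis, is to compare the group standard deviations and run Bartlett&#8217;s test of equal variances with <strong>bartlett.test</strong> from the <strong>stats</strong> package. Bartlett&#8217;s test is sensitive to departures from normality, so its result should be read alongside the residual plot rather than instead of it:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Standard deviation of the dried weights within each group
group.sd = tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group.sd
&nbsp;
# Bartlett test of the null hypothesis that the group variances are equal
bartlett.test(weight ~ group, data = PlantGrowth)</pre></div></div>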
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/one-way-analysis-of-variance-anova/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Design of Experiments &#8211; Blocking and Full Factorial Experimental Design Plans</title>
		<link>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-blocking-and-full-factorial-experimental-design-plans/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-blocking-and-full-factorial-experimental-design-plans/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 15:37:35 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Blocking]]></category>
		<category><![CDATA[conf.design]]></category>
		<category><![CDATA[Confounding]]></category>
		<category><![CDATA[interaction]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=630</guid>
		<description><![CDATA[When considering using a full factorial experimental design there may be constraints on the number of experiments that can be run during a particular session, or there may be other practical constraints that introduce systematic differences into an experiment that can be handled during the design and analysis of the data collected during the experiment. [...]]]></description>
			<content:encoded><![CDATA[<p>When considering using a full factorial experimental design there may be constraints on the number of experiments that can be run during a particular session, or there may be other practical constraints that introduce systematic differences into an experiment that can be handled during the design and analysis of the data collected during the experiment.<span id="more-630"></span></p>
<p>Blocking is a technique used in design of experiments methodology to deal with these systematic differences so that all the factors of interest and the interactions between them can still be assessed. When blocking is used, one or more of the interactions is likely to be confounded with the block effects, but a good choice of blocking should ensure that the confounded effect is a higher order interaction that would be challenging to interpret or is not expected to be important.</p>
<p>The conf.design package in R is described by its author as <em>a small library contains a series of simple tools for constructing and manipulating confounded and fractional factorial designs</em>. The function <strong>conf.design</strong> can be used to  <em>construct symmetric confounded factorial designs</em>.</p>
<p>A very simple example would be a three factor experiment where each factor has low and high settings (levels). If we wanted to divide the experiment into two blocks of four experimental units then we could confound the block effect with the three way interaction between the factors. The following code would create the required design plan:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(conf.design)
&nbsp;
conf.design(rbind(c(1,1,1)), p=2, treatment.names = c(&quot;F1&quot;,&quot;F2&quot;,&quot;F3&quot;))</pre></div></div>

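<p>The first argument is a matrix, with a single row in this case as there are only two blocks, in which each row defines an effect to be confounded with the blocks. A non-zero entry marks a factor that takes part in the effect, so <strong>c(1,1,1)</strong> selects the three way interaction. The argument <strong>p</strong> is the number of levels of each factor. The output from this function call is:</p>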

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">  Blocks F1 F2 F3
1      0  0  0  0
2      0  1  1  0
3      0  1  0  1
4      0  0  1  1
5      1  1  0  0
6      1  0  1  0
7      1  0  0  1
8      1  1  1  1</pre></div></div>

<p>This shows two blocks, labelled 0 and 1, and the settings of the experiments to run in each block. In the first block the four factor combinations would be:</p>
<ul>
<li>F1 low, F2 low, and F3 low.</li>
<li>F1 high, F2 high, and F3 low.</li>
<li>F1 high, F2 low, and F3 high.</li>
<li>F1 low, F2 high, and F3 high.</li>
</ul>
<p>The remaining four combinations are used in the second block of experiments.</p>
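<p>The block allocation can be reproduced by hand, which helps to show what confounding with the three way interaction means. In the sketch below, which uses base R only and names chosen for this illustration, each run is assigned to a block according to whether the sum of its three factor levels is even or odd:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># All eight combinations of three two-level factors, coded 0 (low) and 1 (high)
design = expand.grid(F1 = 0:1, F2 = 0:1, F3 = 0:1)
&nbsp;
# Block label is the sum of the factor levels modulo 2
design$Blocks = (design$F1 + design$F2 + design$F3) %% 2
&nbsp;
# List the runs block by block
design[order(design$Blocks), ]</pre></div></div>

<p>The four runs with an even sum are the four combinations listed above for the first block.</p>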
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-blocking-and-full-factorial-experimental-design-plans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Design of Experiments &#8211; Full Factorial Designs</title>
		<link>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-full-factorial/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-full-factorial/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 19:27:25 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[Full Factorial Design]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=617</guid>
		<description><![CDATA[In designs where there are multiple factors, all with a discrete group of level settings, the full enumeration of all combinations of factor levels is referred to as a full factorial design. As the number of factors increases, potentially along with the settings for the factors, the total number of experimental units increases rapidly. In [...]]]></description>
			<content:encoded><![CDATA[<p>In designs where there are multiple factors, all with a discrete group of level settings, the full enumeration of all combinations of factor levels is referred to as a <strong>full factorial design</strong>. As the number of factors increases, potentially along with the settings for the factors, the total number of experimental units increases rapidly.<span id="more-617"></span></p>
<p>When each factor takes only two levels, often referred to as the low and high levels, the design is known as a 2^k experiment, where k is the number of factors. Given a three factor setup where each factor takes two levels we can create the full factorial design using the <strong>expand.grid</strong> function:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">expand.grid(Factor1 = c(&quot;Low&quot;, &quot;High&quot;), Factor2 = c(&quot;Low&quot;, &quot;High&quot;),
  Factor3 = c(&quot;Low&quot;, &quot;High&quot;))</pre></div></div>

<p>which creates the following design:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">  Factor1 Factor2 Factor3
1     Low     Low     Low
2    High     Low     Low
3     Low    High     Low
4    High    High     Low
5     Low     Low    High
6    High     Low    High
7     Low    High    High
8    High    High    High</pre></div></div>

<p>We could also make use of the <strong>gen.factorial</strong> function from the <strong>AlgDesign</strong> package. In this function we use a vector to specify the number of levels for each of the variables, the number of variables and possibly the names of the variables.</p>
<p>To create the full factorial design for an experiment with three factors with 3, 2, and 3 levels respectively the following code would be used:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(AlgDesign)
&nbsp;
gen.factorial(c(3,2,3), 3, center=TRUE,
  varNames=c(&quot;F1&quot;, &quot;F2&quot;, &quot;F3&quot;))</pre></div></div>

<p>The <strong>center</strong> option makes the level settings symmetric which is a common way of representing the design. The full design is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">   F1 F2 F3
1  -1 -1 -1
2   0 -1 -1
3   1 -1 -1
4  -1  1 -1
5   0  1 -1
6   1  1 -1
7  -1 -1  0
8   0 -1  0
9   1 -1  0
10 -1  1  0
11  0  1  0
12  1  1  0
13 -1 -1  1
14  0 -1  1
15  1 -1  1
16 -1  1  1
17  0  1  1
18  1  1  1</pre></div></div>
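
<p>The number of runs in a full factorial design is the product of the numbers of levels of the factors, which is why the design size grows so quickly. A small base R sketch of this arithmetic for the 3, 2 and 3 level example, using names chosen for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Number of levels for each of the three factors
levels.per.factor = c(3, 2, 3)
&nbsp;
# Number of runs in the full factorial design: 3 x 2 x 3
prod(levels.per.factor)
&nbsp;
# The same count obtained by enumerating the combinations
full.design = expand.grid(F1 = 1:3, F2 = 1:2, F3 = 1:3)
nrow(full.design)</pre></div></div>

<p>Both calculations give 18, matching the 18 rows of the design shown above.</p>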

]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-full-factorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Design of Experiments &#8211; Optimal Designs</title>
		<link>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-optimal-designs/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-optimal-designs/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 22:28:03 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[AlgDesign]]></category>
		<category><![CDATA[Candidate List]]></category>
		<category><![CDATA[D Optimality]]></category>
		<category><![CDATA[expand.grid]]></category>
		<category><![CDATA[factor]]></category>
		<category><![CDATA[Federov]]></category>
		<category><![CDATA[optFederov]]></category>
		<category><![CDATA[Optimal Designs]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=610</guid>
		<description><![CDATA[When designing an experiment it is not always possible to generate a regular, balanced design such as a full or fractional factorial design plan. There are usually restrictions of the total number of experiments that can be undertaken or constraints on the factor settings both individually or in combination with each other. In these scenarios [...]]]></description>
			<content:encoded><![CDATA[<p>When designing an experiment it is not always possible to generate a regular, balanced design such as a full or fractional factorial design plan. There are usually restrictions on the total number of experiments that can be undertaken or constraints on the factor settings, either individually or in combination with each other.<span id="more-610"></span></p>
<p>In these scenarios a computer generated design, optimal for a given size, can be identified from a list of candidate factor combinations. The <strong>AlgDesign</strong> package in <strong>R</strong> has facilities for optimal design searches based on the Federov exchange algorithm. An optimality criterion has to be selected by the investigator, currently <strong>D</strong>, <strong>A</strong> or <strong>I</strong>, and the algorithm searches the candidate design list for the subset of a given size that optimises this criterion.</p>
<p>Given the total number of treatment runs for an experiment and a specified model, the computer algorithm chooses the optimal set of design runs from a candidate set of possible design treatment runs. This candidate set of treatment runs usually consists of all possible combinations of various factor levels that one wishes to use in the experiment.</p>
<p>The first stage, as always, is to make the package available for use:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">library(AlgDesign)</pre></div></div>

<p>For illustrative purposes consider a four factor experiment, where the factors have 4, 3, 2, and 2 levels respectively. Using the <strong>expand.grid</strong> function we can create a data frame of all possible combinations of the factor settings:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">cand.list = expand.grid(Factor1 = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;),
  Factor2 = c(&quot;I&quot;, &quot;II&quot;, &quot;III&quot;),
  Factor3 = c(&quot;Low&quot;, &quot;High&quot;),
  Factor4 = c(&quot;Yes&quot;, &quot;No&quot;))</pre></div></div>

<p>The random number seed is set so that the result of the search, which involves random selection, can be reproduced:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">set.seed(69)</pre></div></div>

<p>The function <strong>optFederov</strong> <em>calculates an exact or approximate algorithmic design for one of three criteria, using Federov&#8217;s exchange algorithm</em>. The first argument to the function is a formula for the intended model for the data and the <strong>data</strong> argument specifies the list of candidate points:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">optFederov( ~ ., data = cand.list, nTrials = 13)</pre></div></div>

<p>In this example all of the factors in the candidate list appear in the model as main effects. Interaction terms, or quadratic and cubic terms for numeric variables, can also be included in this formula. The argument <strong>nTrials</strong> specifies the number of design points to select from the candidate list. The output from this function is:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">$D
[1] 0.226687
&nbsp;
$A
[1] 7.022811
&nbsp;
$Ge
[1] 0.718
&nbsp;
$Dea
[1] 0.676
&nbsp;
$design
   Factor1 Factor2 Factor3 Factor4
3        C       I     Low     Yes
6        B      II     Low     Yes
12       D     III     Low     Yes
16       D       I    High     Yes
19       C      II    High     Yes
21       A     III    High     Yes
25       A       I     Low      No
26       B       I     Low      No
29       A      II     Low      No
35       C     III     Low      No
39       C       I    High      No
44       D      II    High      No
46       B     III    High      No
&nbsp;
$rows
 [1]  3  6 12 16 19 21 25 26 29 35 39 44 46</pre></div></div>

<p>This provides details of the values of the optimality criteria for the design points selected from the candidate list, the row numbers and the levels for the factors for the chosen design points.</p>
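<p>When choosing a value for <strong>nTrials</strong> it can help to know how many candidate points there are and how many parameters the intended model contains, since the design needs at least as many runs as parameters for them all to be estimated. The base R sketch below counts both for the main effects model used above, with <strong>n.params</strong> being a name chosen for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Candidate list of all factor level combinations, as created above
cand.list = expand.grid(Factor1 = c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;),
  Factor2 = c(&quot;I&quot;, &quot;II&quot;, &quot;III&quot;),
  Factor3 = c(&quot;Low&quot;, &quot;High&quot;),
  Factor4 = c(&quot;Yes&quot;, &quot;No&quot;))
&nbsp;
# Number of candidate points: 4 x 3 x 2 x 2
nrow(cand.list)
&nbsp;
# Number of parameters in the main effects model: the intercept
# plus one column for each non-baseline level of each factor
n.params = ncol(model.matrix(~ ., data = cand.list))
n.params</pre></div></div>

<p>There are 48 candidate points and 8 model parameters, so the 13 runs requested leave some degrees of freedom for estimating error.</p>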
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-optimal-designs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Design of Experiments &#8211; Power Calculations</title>
		<link>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-power-calculations/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-power-calculations/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 21:36:00 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Design of Experiments]]></category>
		<category><![CDATA[alpha]]></category>
		<category><![CDATA[beta]]></category>
		<category><![CDATA[delta]]></category>
		<category><![CDATA[one sample]]></category>
		<category><![CDATA[power]]></category>
		<category><![CDATA[power.t.test]]></category>
		<category><![CDATA[sample size]]></category>
		<category><![CDATA[significance]]></category>
		<category><![CDATA[t-test]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=591</guid>
		<description><![CDATA[Prior to conducting an experiment researchers will often undertake power calculations to determine the sample size required in their work to detect a meaningful scientific effect with sufficient power. In R there are functions to calculate either a minimum sample size for a specific power for a test or the power of a test for [...]]]></description>
			<content:encoded><![CDATA[<p>Prior to conducting an experiment researchers will often undertake power calculations to determine the sample size required in their work to detect a meaningful scientific effect with sufficient power. In <strong>R</strong> there are functions to calculate either a minimum sample size for a specific power for a test or the power of a test for a fixed sample size.<span id="more-591"></span></p>
<p>When undertaking sample size or power calculations for a prospective trial or experiment we need to consider various factors. There are two main probabilities of interest that are tied up with calculating a minimum sample size or the power of a specific test, and these are:</p>
<ul>
<li>Type I Error: The probability that the test rejects the null hypothesis, H_0, given that the null hypothesis is actually true. This quantity is often referred to as alpha.</li>
<li>Type II Error: The probability that the test fails to reject the null hypothesis, H_0, given that the null hypothesis is not true. This quantity is often referred to as beta.</li>
</ul>
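<p>Taking a Type I error to be the rejection of a null hypothesis that is actually true, its rate can be checked by simulation. The sketch below, with a sample size, seed and number of simulations chosen purely for illustration, repeatedly draws data for which the null hypothesis holds and records how often a one-sample t test rejects it at the 5% level:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">set.seed(42)
n.sim = 2000
&nbsp;
# Each simulated sample has true mean 0, so the null hypothesis mu = 0 holds
p.values = replicate(n.sim, t.test(rnorm(10), mu = 0)$p.value)
&nbsp;
# Proportion of simulated tests that reject at the 5% level
mean(p.values &lt; 0.05)</pre></div></div>

<p>The proportion should be close to the nominal alpha of 0.05, up to simulation error.</p>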
<p>A decision needs to be made about what difference between the two groups being compared should be considered as corresponding to a meaningful difference. This difference is usually denoted by delta.</p>
<p>The <strong>stats</strong> package, which is loaded by default in R, has functions for calculating power or sample sizes, including <strong>power.t.test</strong>, <strong>power.prop.test</strong> and <strong>power.anova.test</strong> for various common scenarios.</p>
<p>Consider a scenario where we might be buying batteries for a GPS device and the average battery life that we want to have is 400 minutes. If we decided that the performance is not acceptable if the average is more than 10 minutes (delta) lower than this (390 minutes) then we can calculate the number of batteries to test:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">power.t.test(delta = 10, sd = 6, power = 0.95, type = &quot;one.sample&quot;,
  alternative = &quot;one.sided&quot;)</pre></div></div>

<p>For this example we have assumed a standard deviation of 6 minutes for the batteries (this would either be assumed or estimated from previous data) and that we want a power of 95% in the test. Power is defined as 1 &#8211; beta, where beta is the Type II error probability. The default for this function is a significance level (alpha, the Type I error probability) of 5%. The test involves only one group, so we are considering a one-sample t test, and only a one sided alternative is relevant as we do not mind if the batteries perform better than required.</p>
<p>The output from this function call is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">     One-sample t test power calculation 
&nbsp;
              n = 5.584552
          delta = 10
             sd = 6
      sig.level = 0.05
          power = 0.95
    alternative = one.sided</pre></div></div>

<p>So we would need to test at least 6 batteries to obtain the required power in the test based on the other parameters that have been used.</p>
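<p>Because the calculated sample size is not a whole number it has to be rounded up. As a sketch of how the rounding could be checked, the same function can be called with <strong>n</strong> supplied and <strong>power</strong> left out, in which case it returns the power achieved. The object names below are chosen for this illustration:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Power achieved when testing 5 batteries
power.n5 = power.t.test(n = 5, delta = 10, sd = 6,
  type = &quot;one.sample&quot;, alternative = &quot;one.sided&quot;)$power
&nbsp;
# Power achieved when testing 6 batteries
power.n6 = power.t.test(n = 6, delta = 10, sd = 6,
  type = &quot;one.sample&quot;, alternative = &quot;one.sided&quot;)$power
&nbsp;
c(n5 = power.n5, n6 = power.n6)</pre></div></div>

<p>Testing 5 batteries falls short of the 95% target while testing 6 exceeds it, which is consistent with rounding 5.58 up to 6.</p>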
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/design-of-experiments-power-calculations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

