<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling &#187; Hypothesis Testing</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/topics/statistical-analysis/hypothesis-testing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Wed, 01 Feb 2012 19:44:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Contingency Tables &#8211; Fisher&#8217;s Exact Test</title>
		<link>http://www.wekaleamstudios.co.uk/posts/contingency-tables-fishers-exact-test/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/contingency-tables-fishers-exact-test/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 11:46:15 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Hypothesis Testing]]></category>
		<category><![CDATA[contingency table]]></category>
		<category><![CDATA[Fisher's Exact Test]]></category>
		<category><![CDATA[fisher.test]]></category>
		<category><![CDATA[p-value]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=825</guid>
		<description><![CDATA[A contingency table is used in statistics to provide a tabular summary of categorical data; each cell in the table is the number of occasions on which a particular combination of variable levels occurs together in a set of data. The relationship between the variables in a contingency table is often investigated using Chi-squared tests. The simplest [...]]]></description>
			<content:encoded><![CDATA[<p>A contingency table is used in statistics to provide a tabular summary of categorical data; each cell in the table is the number of occasions on which a particular combination of variable levels occurs together in a set of data. The relationship between the variables in a contingency table is often investigated using Chi-squared tests.<span id="more-825"></span></p>
<p>The simplest contingency table with two variables has two levels for each of the variables. Consider a trial comparing the performance of two challengers. Each of the challengers undertook the trial eight times and the number of successful trials was recorded. The hypothesis under investigation in this experiment is that the performance of the two challengers is similar. If the first challenger was successful on only one trial and the second challenger was successful on four of the eight trials, can we discriminate between their performance?</p>
<p>The function <strong>fisher.test</strong> is used to perform Fisher&#8217;s exact test when the sample size is small, which avoids using an approximation that is known to be unreliable for small samples. The data is set up in a matrix, with the successes for the two challengers in the first column and the failures in the second:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">challenge.df = matrix(c(1,4,7,4), nrow = 2)</pre></div></div>
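<p>As a minimal sketch, the same table can be given row and column labels so that its layout is easier to read. The object name <strong>challenge.tab</strong> and the label text are illustrative assumptions rather than part of the original analysis:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Same counts as above, filled column by column:
# successes (1, 4) first, then failures (7, 4).
# The labels are hypothetical and only aid reading.
challenge.tab = matrix(c(1, 4, 7, 4), nrow = 2,
    dimnames = list(Challenger = c("First", "Second"),
                    Outcome = c("Success", "Failure")))
challenge.tab

# Each challenger undertook eight trials, so both row totals are 8
rowSums(challenge.tab)</pre></div></div>

<p>The labels do not change the counts, so the test gives the same result for the labelled and unlabelled versions of the table.</p>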

<p>The function is then called using this data to produce the test summary information:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; fisher.test(challenge.df)
&nbsp;
        Fisher's Exact Test for Count Data
&nbsp;
data:  challenge.df 
p-value = 0.2821
alternative hypothesis: true odds ratio is not equal to 1 
95 percent confidence interval:
 0.002553456 2.416009239 
sample estimates:
odds ratio 
 0.1624254</pre></div></div>

<p>The p-value of 0.2821 calculated for the test is well above the conventional 0.05 level, so it does not provide evidence against the assumption of independence between challenger and outcome. In this example this means that we cannot confidently claim any difference in performance between the two challengers.</p>
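<p>To see where this p-value comes from, the following sketch recomputes it directly from the hypergeometric distribution using <strong>dhyper</strong>. The variable names are illustrative assumptions, and the small tolerance used when comparing probabilities is a precaution against floating point ties rather than something stated in the text above:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">challenge.df = matrix(c(1, 4, 7, 4), nrow = 2)

successes = sum(challenge.df[, 1])   # 5 successes in total
failures  = sum(challenge.df[, 2])   # 11 failures in total
trials    = sum(challenge.df[1, ])   # 8 trials by the first challenger
observed  = challenge.df[1, 1]       # 1 success for the first challenger

# With the margins fixed, the first challenger's success count
# follows a hypergeometric distribution.
possible = max(0, trials - failures):min(successes, trials)
probs = dhyper(possible, successes, failures, trials)
p.obs = dhyper(observed, successes, failures, trials)

# Two-sided p-value: total probability of all tables that are
# no more likely than the observed one.
p.value = sum(probs[p.obs * (1 + 1e-7) >= probs])
p.value</pre></div></div>

<p>This gives 0.2820513, which agrees with the rounded value of 0.2821 reported by <strong>fisher.test</strong>.</p>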
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/contingency-tables-fishers-exact-test/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>One and Two Sample Hypothesis Testing</title>
		<link>http://www.wekaleamstudios.co.uk/posts/one-and-two-sample-hypothesis-testing/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/one-and-two-sample-hypothesis-testing/#comments</comments>
		<pubDate>Fri, 26 Jun 2009 17:45:09 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Classical Univariate Statistics]]></category>
		<category><![CDATA[Hypothesis Testing]]></category>
		<category><![CDATA[alternative hypothesis]]></category>
		<category><![CDATA[confidence interval]]></category>
		<category><![CDATA[null hypothesis]]></category>
		<category><![CDATA[one sample]]></category>
		<category><![CDATA[t-test]]></category>
		<category><![CDATA[t.test]]></category>
		<category><![CDATA[two samples]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=244</guid>
		<description><![CDATA[The t-test is regularly used in Classical Statistics to investigate one or two samples of data and to test a particular hypothesis. There are variants on the t-test that are all handled by the same function, t.test, in R. The simplest case is where there is a set of data and we are interested in [...]]]></description>
			<content:encoded><![CDATA[<p>The <strong>t-test</strong> is regularly used in Classical Statistics to investigate one or two samples of data and to test a particular hypothesis. There are variants on the <strong>t-test</strong> that are all handled by the same function, <strong>t.test</strong>, in <strong>R</strong>.<span id="more-244"></span></p>
<p>The simplest case is where there is a set of data and we are interested in testing whether the mean value of the data is equal to a particular value. The three possible alternative hypotheses of not equal, greater than or less than are all available via this function by setting the <strong>alternative</strong> argument. Consider the <strong>rock</strong> dataset from the base <strong>R</strong> system, which is a series of 48 measurements on rock samples taken from a petroleum reservoir. If we wanted to test whether the mean perimeter is 2,500 pixels then we would perform a one sample <strong>t-test</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">t.test(rock$peri, mu = 2500)</pre></div></div>

<p>This produces the following output to the console:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&nbsp;
        One Sample t-test
&nbsp;
data:  rock$peri 
t = 0.8818, df = 47, p-value = 0.3824
alternative hypothesis: true mean is not equal to 2500 
95 percent confidence interval:
 2266.501 3097.923 
sample estimates:
mean of x 
 2682.212</pre></div></div>

<p>The <strong>t</strong> statistic, 0.8818, can be seen in the output and the p-value of 0.3824 is not small, so there is no evidence of a departure from 2500 pixels for the mean perimeter. By default a 95% confidence interval on the mean value of the data is also shown in the output.</p>
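<p>As a rough sketch of what the function is doing, the statistic and two-sided p-value can be reproduced by hand from the sample mean and standard deviation, and a one-sided alternative can be requested through the <strong>alternative</strong> argument. The variable names are illustrative assumptions:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">x = rock$peri
n = length(x)

# t statistic: distance of the sample mean from the hypothesised
# mean, measured in standard errors
t.stat = (mean(x) - 2500) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p.two.sided = 2 * pt(-abs(t.stat), df = n - 1)
c(t = t.stat, df = n - 1, p.value = p.two.sided)

# One-sided test of whether the mean perimeter is greater than 2500
p.greater = t.test(rock$peri, mu = 2500, alternative = "greater")$p.value
p.greater</pre></div></div>

<p>Because the observed <strong>t</strong> statistic is positive, the one-sided p-value for the greater than alternative is half of the two-sided value.</p>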
<p>If we have two samples of independent data, for example considering the commonly demonstrated olive oil fatty acid data, then a two sample <strong>t-test</strong> can be performed in <strong>R</strong>. The data can be divided by Area and we can test for significant differences in a given fatty acid between two Areas in the data set. To perform the test we would first check that the data is approximately Normally distributed, which is one of the assumptions underlying the test, and then consider whether the variances of the two groups of data are similar. The function <strong>var.test</strong> can be used to compare the variances:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; var(olive.df$oleic[olive.df$Area == &quot;East-Liguria&quot;])
[1] 24697.96
&gt; var(olive.df$oleic[olive.df$Area == &quot;North-Apulia&quot;])
[1] 24568.5
&gt; var.test(olive.df$oleic[olive.df$Area == &quot;East-Liguria&quot;], olive.df$oleic[olive.df$Area == &quot;North-Apulia&quot;])
&nbsp;
        F test to compare two variances
&nbsp;
data:  olive.df$oleic[olive.df$Area == &quot;East-Liguria&quot;] and olive.df$oleic[olive.df$Area == &quot;North-Apulia&quot;] 
F = 1.0053, num df = 49, denom df = 24, p-value = 0.9796
alternative hypothesis: true ratio of variances is not equal to 1 
95 percent confidence interval:
 0.4764333 1.9476684 
sample estimates:
ratio of variances 
          1.005269</pre></div></div>

<p>We first calculated the variances for the East Liguria and North Apulia Areas and then ran the formal test for equal variances. The 95% confidence interval on the ratio of variances includes the value one, and the p-value of 0.9796 is large, so in this case we proceed under the assumption that the variances are equal for these Areas. To run the test:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&gt; t.test(olive.df$oleic[olive.df$Area == &quot;East-Liguria&quot;], olive.df$oleic[olive.df$Area == &quot;North-Apulia&quot;],
+ var.equal = TRUE)
&nbsp;
        Two Sample t-test
&nbsp;
data:  olive.df$oleic[olive.df$Area == &quot;East-Liguria&quot;] and olive.df$oleic[olive.df$Area == &quot;North-Apulia&quot;] 
t = -1.9344, df = 73, p-value = 0.05694
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -151.054614    2.254614 
sample estimates:
mean of x mean of y 
   7746.0    7820.4</pre></div></div>

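<p>The argument <strong>var.equal</strong> is used to specify whether the variances for the two samples of data are treated as equal. Its default value is FALSE, in which case the Welch approximation to the degrees of freedom is used instead of the pooled variance.</p>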
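<p>The olive oil data is not part of base <strong>R</strong>, so the following sketch uses the built-in <strong>ToothGrowth</strong> data purely as a stand-in (an assumption for illustration, not the data analysed above). It shows how the pooled variance enters the equal-variance <strong>t</strong> statistic and how the degrees of freedom differ from the Welch version:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Two independent groups from a built-in dataset (stand-in data)
oj = ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc = ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 = length(oj)
n2 = length(vc)

# Pooled variance: the two sample variances weighted by their
# degrees of freedom
pooled.var = ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)
t.pooled = (mean(oj) - mean(vc)) / sqrt(pooled.var * (1 / n1 + 1 / n2))

pooled = t.test(oj, vc, var.equal = TRUE)   # pooled variance test
welch  = t.test(oj, vc)                     # default: Welch test

# The hand calculation matches the statistic from t.test
c(by.hand = t.pooled, t.test = unname(pooled$statistic))

# Pooled test uses n1 + n2 - 2 degrees of freedom; Welch uses fewer
c(pooled.df = unname(pooled$parameter), welch.df = unname(welch$parameter))</pre></div></div>

<p>When the two sample variances are close, as they were for the two olive oil Areas, the two versions of the test tend to give very similar answers.</p>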
<p>This function is flexible and can be used for paired data via the <strong>paired</strong> argument.</p>
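<p>As a small illustration of the <strong>paired</strong> argument, the sketch below uses the built-in <strong>sleep</strong> data, in which the same ten subjects were measured under two drugs. The choice of dataset and the variable names are assumptions made for illustration. A paired test is equivalent to a one sample test on the within-subject differences:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># Extra hours of sleep for the same ten subjects under each drug
g1 = sleep$extra[sleep$group == "1"]
g2 = sleep$extra[sleep$group == "2"]

# Paired t-test on the two sets of measurements
paired.result = t.test(g1, g2, paired = TRUE)

# Equivalent one sample t-test on the differences
diff.result = t.test(g1 - g2, mu = 0)

c(paired = paired.result$p.value, differences = diff.result$p.value)</pre></div></div>

<p>Both calls report the same p-value, with degrees of freedom equal to the number of pairs minus one.</p>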
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/one-and-two-sample-hypothesis-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

