<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software for Exploratory Data Analysis and Statistical Modelling</title>
	<atom:link href="http://www.wekaleamstudios.co.uk/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wekaleamstudios.co.uk</link>
	<description>Statistical Modelling with R</description>
	<lastBuildDate>Sun, 13 Jan 2013 10:03:02 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Seasonal Trend Decomposition in R</title>
		<link>http://www.wekaleamstudios.co.uk/posts/seasonal-trend-decomposition-in-r/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/seasonal-trend-decomposition-in-r/#comments</comments>
		<pubDate>Fri, 11 Jan 2013 07:40:29 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Summary]]></category>
		<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Graphs]]></category>
		<category><![CDATA[decomposition]]></category>
		<category><![CDATA[plot]]></category>
		<category><![CDATA[s.window]]></category>
		<category><![CDATA[seasonal]]></category>
		<category><![CDATA[stl]]></category>
		<category><![CDATA[time series]]></category>
		<category><![CDATA[trend]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1902</guid>
		<description><![CDATA[The Seasonal Trend Decomposition using Loess (STL) is an algorithm that was developed to help to divide up a time series into three components namely: the trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. The STL is [...]]]></description>
			<content:encoded><![CDATA[<p>Seasonal Trend Decomposition using Loess (STL) is an algorithm that divides a time series into three components: trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. STL is available within R via the <strong>stl</strong> function.<span id="more-1902"></span></p>
<p>The use of the <strong>stl</strong> function can be demonstrated using one of the data sets available within the base R installation. The well-used <em>nottem</em> data set (Average Monthly Temperatures at Nottingham, 1920-1939) is a good starting point. The data itself is presented here:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; nottem
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8</pre></td></tr></table></div>

<p>We can try to run <strong>stl</strong> by supplying only the time series object, but <strong>R</strong> returns an error message:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; stl(nottem)
Error in stl(nottem) : argument &quot;s.window&quot; is missing, with no default</pre></td></tr></table></div>

<p>Looking at the help pages we see the following description of the <em>s.window</em> argument: <em>either the character string &#8220;periodic&#8221; or the span (in lags) of the loess window for seasonal extraction, which should be odd.</em> If we use the <em>periodic</em> option, R runs without complaint:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; nottem.stl = stl(nottem, s.window=&quot;periodic&quot;)</pre></td></tr></table></div>

<p>Now that we have the STL decomposition, we can use the plot method provided for objects created by a call to <strong>stl</strong>.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; plot(nottem.stl)</pre></td></tr></table></div>

<p>The graph looks like this:</p>
<div id="attachment_1956" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2013/01/nottem-stl1.jpeg"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2013/01/nottem-stl1-300x300.jpeg" alt="STL Decomposition of Nottingham Temperature Time Series" title="STL Decomposition of Nottingham Temperature Time Series" width="300" height="300" class="size-medium wp-image-1956" /></a><p class="wp-caption-text">STL Decomposition of Nottingham Temperature Time Series</p></div>
<p>The four panels show the original data, the seasonal component, the trend component and the remainder. Together they show the periodic seasonal pattern extracted from the original data and a trend that moves between roughly 47 and 51 degrees Fahrenheit. The bar at the right-hand side of each panel allows a relative comparison of the magnitudes of the components. For this data the change in trend is smaller than the variation due to the seasonal (monthly) pattern.</p>
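<p>The components can also be inspected numerically. The sketch below assumes the same <em>periodic</em> fit as above and uses a hypothetical object name, <em>components</em>. It pulls out the matrix of seasonal, trend and remainder values, looks at the range of the trend, and checks that the three components add back up to the observed series:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">nottem.stl = stl(nottem, s.window = &quot;periodic&quot;)
&nbsp;
# Matrix with one column per component
components = nottem.stl$time.series
colnames(components)
&nbsp;
# Range of the trend and of the seasonal component, for comparison
range(components[, &quot;trend&quot;])
range(components[, &quot;seasonal&quot;])
&nbsp;
# seasonal + trend + remainder reproduces the original data
all.equal(as.numeric(rowSums(components)), as.numeric(nottem))</pre></td></tr></table></div>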
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/seasonal-trend-decomposition-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graph Types: Pie Charts</title>
		<link>http://www.wekaleamstudios.co.uk/posts/graph-types-pie-charts/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/graph-types-pie-charts/#comments</comments>
		<pubDate>Sat, 13 Oct 2012 12:52:01 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Graphs]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[pie]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1884</guid>
		<description><![CDATA[The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph is based on comparing a single metric for each category, the display is two dimensional and sometimes even appears in three dimensions. Strengths: Other than familiarity with this type of display it [...]]]></description>
			<content:encoded><![CDATA[<p>The pie chart is a frequently seen graph that uses area to compare percentages for a set of categories. Although this type of graph is based on comparing a single metric for each category, the display is two dimensional and sometimes even appears in three dimensions.<span id="more-1884"></span></p>
<div id="attachment_1886" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/07/piechart.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/07/piechart-300x239.png" alt="Pie Chart Example" title="Pie Chart Example" width="300" height="239" class="size-medium wp-image-1886" /></a><p class="wp-caption-text">An illustration of a typical pie chart style that is frequently seen in reports or presentations.</p></div>
<p><em>Strengths:</em></p>
<p>Other than familiarity with this type of display it is difficult to identify a situation where a pie chart would be a good way to display data. Pie Charts are also available in a wide range of software packages, although that isn&#8217;t really a good reason to recommend them!</p>
<p><em>Weaknesses:</em></p>
<p>The help files in R for the pie function provide the following advice:</p>
<blockquote><p>Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.</p></blockquote>
<p>The main question is: why would we use two dimensions (relative area, further confused by the angle of the wedges) to make a one-dimensional comparison?</p>
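<p>To make the comparison concrete, the following minimal sketch draws the same values as a pie chart, a bar chart and a dot chart using base R graphics. The <em>share</em> vector is invented data, used only for illustration:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;"># Hypothetical percentages for four categories (invented for illustration)
share = c(A = 30, B = 27, C = 23, D = 20)
&nbsp;
# Three panels side by side
par(mfrow = c(1, 3))
pie(share, main = &quot;Pie chart&quot;)
barplot(share, main = &quot;Bar chart&quot;, ylab = &quot;Share (%)&quot;)
dotchart(share, main = &quot;Dot chart&quot;, xlab = &quot;Share (%)&quot;)</pre></td></tr></table></div>

<p>Categories B and C differ by only four percentage points, a gap that is read off a common linear scale in the bar and dot charts but has to be judged from wedge areas in the pie chart.</p>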
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/graph-types-pie-charts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data from the ONS</title>
		<link>http://www.wekaleamstudios.co.uk/posts/data-from-the-ons/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/data-from-the-ons/#comments</comments>
		<pubDate>Sat, 15 Sep 2012 13:28:18 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[ONS]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1963</guid>
		<description><![CDATA[The Office for National Statistics (ONS) in the UK makes various data sets publicly available on their website. This includes a number of time series that could be useful for learning about different statistical models for time series data.]]></description>
			<content:encoded><![CDATA[<p>The Office for National Statistics (ONS) in the UK makes various data sets publicly available on their <a href="http://www.ons.gov.uk/ons/datasets-and-tables/index.html">website</a>. This includes a number of time series that could be useful for learning about different statistical models for time series data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/data-from-the-ons/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linear Algebra Video Tutorials</title>
		<link>http://www.wekaleamstudios.co.uk/posts/linear-algebra-video-tutorials/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/linear-algebra-video-tutorials/#comments</comments>
		<pubDate>Sun, 19 Aug 2012 18:26:45 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[Linear Algebra]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1918</guid>
		<description><![CDATA[The following website has some interesting introductory course videos on linear algebra from Gilbert Strang at MIT.]]></description>
			<content:encoded><![CDATA[<p>The following <a href="http://www.academicearth.org/courses/linear-algebra">website</a> has some interesting introductory course videos on linear algebra from Gilbert Strang at MIT.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/linear-algebra-video-tutorials/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R Graphical Manual</title>
		<link>http://www.wekaleamstudios.co.uk/posts/r-graphical-manual/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/r-graphical-manual/#comments</comments>
		<pubDate>Thu, 05 Jul 2012 08:15:42 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[graph]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1909</guid>
		<description><![CDATA[The R Graphical Manual is worth checking out for ideas about graphing data.]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://rgm2.lab.nig.ac.jp/RGM2/images.php?show=all&#038;pageID=1">R Graphical Manual</a> is worth checking out for ideas about graphing data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/r-graphical-manual/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graph Design Principles</title>
		<link>http://www.wekaleamstudios.co.uk/posts/graph-design-principles/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/graph-design-principles/#comments</comments>
		<pubDate>Mon, 25 Jun 2012 07:46:42 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Exploratory Data Analysis]]></category>
		<category><![CDATA[Graphs]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[principles]]></category>
		<category><![CDATA[Tufte]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1875</guid>
		<description><![CDATA[There is a set of basic principles that hold true for the design of many graphs and various authors have their own preferences. One author who is prominent due to his good work in the area of data visualisation and presentation of evidence to support decision making is Edward Tufte. Edward Tufte has proposed a [...]]]></description>
			<content:encoded><![CDATA[<p>There is a set of basic principles that hold true for the design of many graphs and various authors have their own preferences. One author who is prominent due to his good work in the area of data visualisation and presentation of evidence to support decision making is Edward Tufte.<span id="more-1875"></span></p>
<p>Edward Tufte has proposed a set of design principles that should be kept in mind when evaluating the effectiveness of a visualisation method:</p>
<ul>
<li>Comparison: a graph should show comparisons and contrasts and highlight where differences occur.</li>
<li>Causality: the data presentation should help the thought process in identifying reasonable explanations of cause and effect.</li>
<li>Multivariate analysis: most data has more than one or two dimensions and these can be captured on a page or computer display.</li>
<li>Integration of evidence: words, numbers, images and diagrams should be integrated into a single whole.</li>
<li>Documentation: acknowledgement of data sources.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/graph-design-principles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Logistic Regression and Bias Reduction</title>
		<link>http://www.wekaleamstudios.co.uk/posts/logistic-regression-and-bias-reduction/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/logistic-regression-and-bias-reduction/#comments</comments>
		<pubDate>Tue, 22 May 2012 17:43:05 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Statistical Modelling]]></category>
		<category><![CDATA[bias]]></category>
		<category><![CDATA[brglm]]></category>
		<category><![CDATA[Firth]]></category>
		<category><![CDATA[glm]]></category>
		<category><![CDATA[reduction]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1849</guid>
		<description><![CDATA[David Firth published a paper in 1993 on maximum likelihood estimation and the reduction of bias when using this approach. The research in this area appears to provide benefit for logistic regression in small data sets where there is complete or quasi separation. This approach has been implemented for Generalized Linear Models in the brglm [...]]]></description>
			<content:encoded><![CDATA[<p>David Firth published a paper in 1993 on maximum likelihood estimation and the reduction of bias when using this approach. The research in this area appears to provide benefit for logistic regression in small data sets where there is complete or quasi separation. This approach has been implemented for Generalized Linear Models in the <strong>brglm</strong> package.<span id="more-1849"></span></p>
<p><!--[Fast Tube]--><span id="JoKc61fTYPQ" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/logistic-regression-and-bias-reduction/#JoKc61fTYPQ"><img src="http://i.ytimg.com/vi/JoKc61fTYPQ/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a></span><!--[/Fast Tube]--></p>
<p>The help file for the <strong>brglm</strong> package gives the following justification for the methodology:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">For estimation in binomial-response GLMs, the bias-reduction method is
an improvement over traditional maximum likelihood because:
• the bias-reduced estimator is second-order unbiased and has smaller
variance than the maximum likelihood estimator and
• the resultant estimates and their corresponding standard errors are
always finite while the maximum likelihood estimates can be infinite
(in situations where complete or quasi separation occurs).</pre></td></tr></table></div>

<p>The original reference for this work is <em>Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38</em>.</p>
<p>We will consider two data sets to compare the parameter estimates for a simple logistic regression model with one explanatory variable. The first example is one where we would expect a similar answer and the second is based on separation and illustrates the differences between the parameter estimates for the <strong>glm</strong> and <strong>brglm</strong> functions in <strong>R</strong>.</p>
<p>The first data set is shown below:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; ex1 = data.frame(
+ X = c(10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40),
+ Y = c(0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
+ )
&gt; xtabs( ~ X + Y, data = ex1)
    Y
X    0 1
  10 3 0
  20 2 1
  30 1 2
  40 0 3</pre></td></tr></table></div>
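<p>The observed proportion of successes at each value of X can also be computed directly. This is a small sketch using base R only, with the hypothetical object name <em>prop</em>:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">ex1 = data.frame(
  X = c(10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40),
  Y = c(0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
)
&nbsp;
# Mean of the 0/1 response within each level of X is the success proportion
prop = tapply(ex1$Y, ex1$X, mean)
round(prop, 2)</pre></td></tr></table></div>

<p>The proportions rise from 0 at X = 10 to 1 at X = 40, with 1/3 and 2/3 at the two middle values.</p>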

<p>The cross-tabulation shows that the middle two values of the explanatory variable (X) have observed success proportions of 0.33 and 0.67. The logistic regression model fitted by <strong>glm</strong> to this data is:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; m1a = glm(Y ~ X, data = ex1, family = binomial)
&gt; summary(m1a)
&nbsp;
Call:
glm(formula = Y ~ X, family = binomial, data = ex1)
&nbsp;
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6877  -0.3734   0.0000   0.3734   1.6877  
&nbsp;
Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)  
(Intercept)  -5.7440     3.1622  -1.816   0.0693 .
X             0.2298     0.1214   1.892   0.0585 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for binomial family taken to be 1)
&nbsp;
    Null deviance: 16.636  on 11  degrees of freedom
Residual deviance:  8.276  on 10  degrees of freedom
AIC: 12.276
&nbsp;
Number of Fisher Scoring iterations: 5</pre></td></tr></table></div>

<p>The bias-reduced version of this model:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; require(brglm)
Loading required package: brglm
Loading required package: profileModel
&nbsp;
&gt; m1b = brglm(Y ~ X, data = ex1, family = binomial)
&gt; summary(m1b)
&nbsp;
Call:
brglm(formula = Y ~ X, family = binomial, data = ex1)
&nbsp;
&nbsp;
Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)  
(Intercept) -3.90201    2.23945  -1.742   0.0814 .
X            0.15608    0.08439   1.849   0.0644 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for binomial family taken to be 1)
&nbsp;
    Null deviance: 12.2045  on 11  degrees of freedom
Residual deviance:  8.7505  on 10  degrees of freedom
Penalized deviance: 3.23312 
AIC:  12.751</pre></td></tr></table></div>

<p>Both models give finite standard errors and slope estimates of similar size for the explanatory variable (0.230 from <strong>glm</strong> and 0.156 from <strong>brglm</strong>).</p>
<p>The next block of code creates a graph to compare the two model fitting procedures.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; ex1a = data.frame(
+ Model = &quot;GLM&quot;,
+ X = seq(0, 50),
+ Y = predict(m1a, newdata = data.frame(X = seq(0, 50)), type = &quot;response&quot;)
+ )
&nbsp;
&gt; ex1b = data.frame(
+ Model = &quot;BRGLM&quot;,
+ X = seq(0, 50),
+ Y = predict(m1b, newdata = data.frame(X = seq(0, 50)), type = &quot;response&quot;)
+ )
&nbsp;
&gt; ex1c = rbind(ex1a, ex1b)
&nbsp;
&gt; require(ggplot2)
Loading required package: ggplot2
&nbsp;
&gt; ggplot(ex1c, aes(X, Y, colour = Model)) + geom_line()</pre></td></tr></table></div>

<p>The fitted model curves are shown in the figure below, which highlights the difference in slopes: the bias-reduced model has a shallower curve than the model fitted by the <strong>glm</strong> function.</p>
<div id="attachment_1854" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/05/brlr1.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/05/brlr1-300x300.png" alt="" title="Bias Reduced Logistic Regression Example 1" width="300" height="300" class="size-medium wp-image-1854" /></a><p class="wp-caption-text">Bias Reduced Logistic Regression Example 1</p></div>
<p>The second example is one where there is separation in the data &#8211; the probabilities of success are either 0 or 1 at all four values of the explanatory variable.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; ex2 = data.frame(
+ X = c(10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40),
+ Y = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
+ )
&gt; 
&gt; xtabs( ~ X + Y, data = ex2)
    Y
X    0 1
  10 3 0
  20 3 0
  30 0 3
  40 0 3</pre></td></tr></table></div>
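<p>With a single explanatory variable, complete separation can be checked directly. The sketch below uses base R only, with hypothetical object names <em>largest.failure</em> and <em>smallest.success</em>, and asks whether every failure has a smaller X than every success:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">ex2 = data.frame(
  X = c(10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40),
  Y = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)
&nbsp;
# Largest X among the failures and smallest X among the successes
largest.failure = max(ex2$X[ex2$Y == 0])
smallest.success = min(ex2$X[ex2$Y == 1])
&nbsp;
# TRUE means any threshold between the two values splits the outcomes perfectly
largest.failure &lt; smallest.success</pre></td></tr></table></div>

<p>Here the largest X among the failures is 20 and the smallest X among the successes is 30, so the check returns TRUE.</p>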

<p>The <strong>glm</strong> function struggles to produce a sensible fit to the data and the standard errors for the model parameters are very large.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; m2a = glm(Y ~ X, data = ex2, family = binomial)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
&gt; summary(m2a)
&nbsp;
Call:
glm(formula = Y ~ X, family = binomial, data = ex2)
&nbsp;
Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-6.326e-06  -1.597e-06   0.000e+00   1.597e-06   6.326e-06  
&nbsp;
Coefficients:
              Estimate Std. Error z value Pr(&gt;|z|)
(Intercept)   -123.174 282264.199       0        1
X                4.927  11071.305       0        1
&nbsp;
(Dispersion parameter for binomial family taken to be 1)
&nbsp;
    Null deviance: 1.6636e+01  on 11  degrees of freedom
Residual deviance: 2.4010e-10  on 10  degrees of freedom
AIC: 4
&nbsp;
Number of Fisher Scoring iterations: 25</pre></td></tr></table></div>
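<p>One way to see why the estimates are unreliable is to cap the number of Fisher scoring iterations and watch the slope. This is a rough sketch rather than a formal diagnostic: the helper name <em>slope.after</em> and the iteration counts are hypothetical choices, and the warnings from <strong>glm</strong> are suppressed deliberately. Under complete separation the likelihood keeps increasing as the slope grows, so the estimate should keep climbing instead of settling down:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">ex2 = data.frame(
  X = c(10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40),
  Y = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)
&nbsp;
# Slope estimate after at most k Fisher scoring iterations
slope.after = function(k) {
  fit = suppressWarnings(
    glm(Y ~ X, data = ex2, family = binomial,
        control = glm.control(maxit = k))
  )
  unname(coef(fit)[&quot;X&quot;])
}
&nbsp;
# Slopes after 2, 5, 10 and 25 iterations
slopes = sapply(c(2, 5, 10, 25), slope.after)
slopes</pre></td></tr></table></div>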

<p>The bias-reduced version provides finite estimates of the model parameters:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; m2b = brglm(Y ~ X, data = ex2, family = binomial)
&gt; summary(m2b)
&nbsp;
Call:
brglm(formula = Y ~ X, family = binomial, data = ex2)
&nbsp;
&nbsp;
Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)  
(Intercept)  -8.2336     4.7028  -1.751    0.080 .
X             0.3293     0.1831   1.799    0.072 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
&nbsp;
(Dispersion parameter for binomial family taken to be 1)
&nbsp;
    Null deviance: 12.55  on 11  degrees of freedom
Residual deviance:  2.20  on 10  degrees of freedom
Penalized deviance: -1.03921 
AIC:  6.2</pre></td></tr></table></div>

<p>In a similar fashion we can compare the fitted model in both cases:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; ex2a = data.frame(
+ Model = &quot;GLM&quot;,
+ X = seq(0, 50),
+ Y = predict(m2a, newdata = data.frame(X = seq(0, 50)), type = &quot;response&quot;)
+ )
&nbsp;
&gt; ex2b = data.frame(
+ Model = &quot;BRGLM&quot;,
+ X = seq(0, 50),
+ Y = predict(m2b, newdata = data.frame(X = seq(0, 50)), type = &quot;response&quot;)
+ )
&nbsp;
&gt; ex2c = rbind(ex2a, ex2b)
&nbsp;
&gt; ggplot(ex2c, aes(X, Y, colour = Model)) + geom_line()</pre></td></tr></table></div>

<p>The figure below compares the two fitting methods. The <strong>glm</strong> function produces a model that is not far off a step function, which is an unsatisfactory description of the data.</p>
<div id="attachment_1855" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/05/brlr2.png"><img src="http://www.wekaleamstudios.co.uk/wp-content/uploads/2012/05/brlr2-300x300.png" alt="" title="Bias Reduced Logistic Regression Example 2" width="300" height="300" class="size-medium wp-image-1855" /></a><p class="wp-caption-text">Bias Reduced Logistic Regression Example 2</p></div>
<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/logistic-regression-and-bias-reduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Theme Elements in ggplot2</title>
		<link>http://www.wekaleamstudios.co.uk/posts/theme-elements-in-ggplot2/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/theme-elements-in-ggplot2/#comments</comments>
		<pubDate>Thu, 03 May 2012 19:43:29 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Grammar of Graphics]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[theme]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1831</guid>
		<description><![CDATA[This website provides a simple summary of the theme elements that can be set within ggplot2. There should be sufficient information here to change the default settings for graphs within the ggplot2 package.]]></description>
			<content:encoded><![CDATA[<p>This <a href="http://sape.inf.usi.ch/quick-reference/ggplot2/themes">website</a> provides a simple summary of the theme elements that can be set within <a href="http://had.co.nz/ggplot2/">ggplot2</a>. There should be sufficient information here to change the default settings for graphs within the ggplot2 package.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/theme-elements-in-ggplot2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Melt</title>
		<link>http://www.wekaleamstudios.co.uk/posts/melt/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/melt/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 09:42:07 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Data Manipulation]]></category>
		<category><![CDATA[data frame]]></category>
		<category><![CDATA[melt]]></category>
		<category><![CDATA[reshape]]></category>
		<category><![CDATA[reshape2]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1807</guid>
		<description><![CDATA[There are many situations where data is presented in a format that is not ready to dive straight to exploratory data analysis or to use a desired statistical method. The reshape2 package for R provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into R. The melt function [...]]]></description>
			<content:encoded><![CDATA[<p>There are many situations where data is presented in a format that is not ready to dive straight to exploratory data analysis or to use a desired statistical method. The <strong>reshape2</strong> package for <strong>R</strong> provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into <strong>R</strong>.<span id="more-1807"></span></p>
<p>The <strong>melt</strong> function takes data in wide format and stacks a set of columns into a single column of data. To make use of the function we need to specify a data frame, the id variables (which are left unchanged) and the measured variables (columns of data) to be stacked. By default, the measured variables are all columns that are not specified as id variables.</p>
<p>Consider the following set of data:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; dat
  FactorA FactorB     Group1     Group2     Group3      Group4
1     Low     Low -1.1616334 -0.5228371 -0.6587093  0.45064563
2  Medium     Low -0.5991478 -1.0461138 -0.1942979  2.47985577
3    High     Low  0.8420797 -1.5413266  0.6318852 -0.98948125
4     Low  Medium  1.6225569 -1.2706469 -0.8026467 -0.32332181
5  Medium  Medium -0.3450745 -1.3377985  1.4988363  0.36541918
6    High  Medium  1.6025044  0.7631882 -0.5375833  0.85028148
7     Low    High -1.2991011 -0.2223622 -0.6321478 -1.57284216
8  Medium    High -0.4906400 -1.1802192  0.1235253  0.09891793
9    High    High  0.3897769 -0.3832142  0.6671101  0.23407257</pre></td></tr></table></div>

<p>The four groups are to be used as part of a statistical analysis, so we want to stack them into a single column and create a factor variable to indicate which group each measurement corresponds to. The <strong>melt</strong> function does the trick:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; melt(dat)
Using FactorA, FactorB as id variables
   FactorA FactorB variable       value
1      Low     Low   Group1 -1.16163338
2   Medium     Low   Group1 -0.59914783
3     High     Low   Group1  0.84207974
4      Low  Medium   Group1  1.62255690
5   Medium  Medium   Group1 -0.34507455
6     High  Medium   Group1  1.60250438
&nbsp;
...
36    High    High   Group4  0.23407257</pre></td></tr></table></div>

<p>Consider a second set of data where there are two groups but we only want to retain the FactorB variable in the molten data set:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">   FactorA   FactorB   Group1   Group2
1      Low  Very Low 6.851828 3.061329
2   Medium  Very Low 7.352169 1.303077
3     High  Very Low 6.918091 2.477875
4      Low       Low 7.402351 2.450527
5   Medium       Low 6.928385 4.334323
6     High       Low 7.400626 3.074158
7      Low    Medium 8.312145 5.725185
8   Medium    Medium 8.251806 4.384492
9     High    Medium 8.339398 3.443789
10     Low      High 5.127386 2.868952
11  Medium      High 8.561181 3.616898
12    High      High 6.993838 3.450634
13     Low Very High 7.880877 2.950622
14  Medium Very High 9.439892 3.220295
15    High Very High 8.799447 3.106060</pre></td></tr></table></div>

<p>We now need to specify both the <strong>id.vars</strong> and <strong>measure.vars</strong> arguments in the <strong>melt</strong> function to get the desired output:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">&gt; melt(dat, id.vars = &quot;FactorB&quot;, measure.vars = c(&quot;Group1&quot;, &quot;Group2&quot;))
     FactorB variable    value
1   Very Low   Group1 6.851828
2   Very Low   Group1 7.352169
3   Very Low   Group1 6.918091
4        Low   Group1 7.402351
5        Low   Group1 6.928385
6        Low   Group1 7.400626
...
30 Very High   Group2 3.106060</pre></td></tr></table></div>
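
<p>A minimal sketch of the same idea on simulated data is given below. It again assumes the <strong>reshape2</strong> package; the object names <strong>dat2</strong> and <strong>molten2</strong>, the seed and the random values are hypothetical and are not the data printed above:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="text" style="font-family:monospace;">library(reshape2)  # assumption: melt() comes from reshape2

set.seed(2)  # arbitrary seed so the illustrative values are repeatable

# 3 levels of FactorA crossed with 5 levels of FactorB gives 15 rows
lev &lt;- c(&quot;Very Low&quot;, &quot;Low&quot;, &quot;Medium&quot;, &quot;High&quot;, &quot;Very High&quot;)
dat2 &lt;- data.frame(
  expand.grid(FactorA = c(&quot;Low&quot;, &quot;Medium&quot;, &quot;High&quot;), FactorB = lev),
  Group1 = rnorm(15, mean = 7),
  Group2 = rnorm(15, mean = 3)
)

# Columns named in neither id.vars nor measure.vars are dropped
molten2 &lt;- melt(dat2, id.vars = &quot;FactorB&quot;,
                measure.vars = c(&quot;Group1&quot;, &quot;Group2&quot;))
names(molten2)  # FactorB, variable, value
nrow(molten2)   # 15 rows x 2 groups = 30</pre></td></tr></table></div>

<p>Because FactorA is listed in neither argument it does not appear in the molten data set, which matches the output shown above.</p>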

<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/melt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tikz Paths and Decorations</title>
		<link>http://www.wekaleamstudios.co.uk/posts/tikz-paths-and-decorations/</link>
		<comments>http://www.wekaleamstudios.co.uk/posts/tikz-paths-and-decorations/#comments</comments>
		<pubDate>Sat, 17 Mar 2012 07:02:11 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[tikz/pgf]]></category>
		<category><![CDATA[decorations]]></category>
		<category><![CDATA[inner sep]]></category>
		<category><![CDATA[paths]]></category>
		<category><![CDATA[tikz]]></category>
		<category><![CDATA[ultra thick]]></category>
		<category><![CDATA[very thin]]></category>
		<category><![CDATA[\draw]]></category>

		<guid isPermaLink="false">http://www.wekaleamstudios.co.uk/?p=1721</guid>
		<description><![CDATA[The tikz drawing system can be used to draw basic solid black lines and simple shapes, which can be adjusted fairly easily to use alternative patterns and decorations. The thickness of a line can be specified as part of a \draw command so when drawing a grid with thinner than standard [...]]]></description>
			<content:encoded><![CDATA[<p>The <strong>tikz</strong> drawing system can be used to draw basic solid black lines and simple shapes, which can be adjusted fairly easily to use alternative patterns and decorations.<span id="more-1721"></span></p>
<p><!--[Fast Tube]--><span id="lIDtt0VooY4" style="display:block;"><a title="Click here to watch this video!" href="http://www.wekaleamstudios.co.uk/posts/tikz-paths-and-decorations/#lIDtt0VooY4"><img src="http://i.ytimg.com/vi/lIDtt0VooY4/0.jpg" alt="Fast Tube" border="0" width="320" height="240" /></a><br /><small>Fast Tube by <a title="Casper's Blog" href="http://blog.caspie.net/">Casper</a></small></span><!--[/Fast Tube]--></p>
<p>The thickness of a line can be specified as an option to a <strong>\draw</strong> command, so to draw a grid with thinner than standard lines we can add the <strong>very thin</strong> option to the <strong>tikz</strong> code:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="latex" style="font-family:monospace;"><span style="color: #800000; font-weight: normal;">\draw</span><span style="color: #E02020; ">[</span><span style="color: #C08020; font-weight: normal;">step=.5cm,very thin,black!20</span><span style="color: #E02020; ">]</span> (0,0) grid (6,6);</pre></td></tr></table></div>

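<p>The line pattern can also be specified as part of the draw options; the available patterns include <strong>solid</strong> lines, <strong>dashed</strong> lines and <strong>dotted</strong> lines. Each example below also sets an arrow tip specification, such as <strong>-&gt;</strong> or <strong>|&lt;-&gt;|</strong>, alongside the pattern:</p>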

<div class="wp_syntax"><table><tr><td class="code"><pre class="latex" style="font-family:monospace;"><span style="color: #800000; font-weight: normal;">\draw</span><span style="color: #E02020; ">[</span><span style="color: #C08020; font-weight: normal;">solid,-&gt;</span><span style="color: #E02020; ">]</span> (0,0) -- (6,0);
<span style="color: #800000; font-weight: normal;">\draw</span><span style="color: #E02020; ">[</span><span style="color: #C08020; font-weight: normal;">dashed,&lt;-</span><span style="color: #E02020; ">]</span> (0,1) -- (6,1);
<span style="color: #800000; font-weight: normal;">\draw</span><span style="color: #E02020; ">[</span><span style="color: #C08020; font-weight: normal;">dotted,&lt;-&gt;</span><span style="color: #E02020; ">]</span> (0,2) -- (6,2);
<span style="color: #800000; font-weight: normal;">\draw</span><span style="color: #E02020; ">[</span><span style="color: #E02020; "><span style="color: #C08020; font-weight: normal;">solid,|&lt;-&gt;|</span></span><span style="color: #E02020; ">]</span> (0,3) -- (6,3);</pre></td></tr></table></div>

<p>These settings can also be applied to decorations around nodes such as boxes. The command below draws a box with a blue outline filled with a light red background (<strong>red!40</strong>, that is 40% red mixed with white). The outline of the box around the node is thicker than usual (<strong>ultra thick</strong>) and the box itself is rectangular with rounded corners. The last option, <strong>inner sep</strong>, adds some padding inside the box around the text.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="latex" style="font-family:monospace;"><span style="color: #800000; font-weight: normal;">\node</span><span style="color: #E02020; ">[</span><span style="color: #C08020; font-weight: normal;">draw=blue,fill=red!40,ultra thick,rectangle,rounded corners,
  inner sep=10pt</span><span style="color: #E02020; ">]</span> (b1) <span style="color: #E02020; ">{</span><span style="color: #2020C0; font-weight: normal;">Example</span><span style="color: #E02020; ">}</span>;</pre></td></tr></table></div>

<p>Other useful resources are provided on the <a href="http://www.wekaleamstudios.co.uk/supplementary-material/">Supplementary Material</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wekaleamstudios.co.uk/posts/tikz-paths-and-decorations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
