# Statistical Modelling

April 26th, 2009

Regression modelling is a general term for the process of fitting linear models to data, selecting the most appropriate model, and using that model for interpretation and/or prediction of future data.

In general, these models will be defined in terms of a response variable, which is the main measure of interest, and a series of explanatory variables that are used to describe the variability in the response variable. The model that describes the relationship between these variables is a mathematical function and a statistical technique is used to estimate the most likely parameters for this function based on the data that is available.
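As a minimal sketch of what estimating parameters from data means, the example below fits a straight line to a response variable using one explanatory variable by least squares. The data values and the names `hours`, `score` and `least_squares_line` are hypothetical and chosen only for illustration; least squares is used here because, under the usual assumption of normally distributed errors, it gives the same estimates as maximum likelihood.

```python
# Hypothetical illustrative data (not taken from any real study).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]     # explanatory variable
score = [2.1, 3.9, 6.2, 7.8, 10.1]    # response variable


def least_squares_line(x, y):
    """Estimate the intercept and slope of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squares of x and sum of cross products of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = least_squares_line(hours, score)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The slope estimates how much the response changes for a one-unit increase in the explanatory variable, which is the kind of interpretation these models are used for.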

The Linear Model is the most frequently used technique in classical statistics and has applications in a wide range of disciplines. These models are useful because the interpretation of the relationship between the response and explanatory variables is relatively easy to describe and the model can also be used for making future predictions.

Linear Models can be extended in a variety of ways to handle more complicated relationships and different data generation processes. The Generalized Linear Model (GLM) methodology was introduced to work with data that is described by a wider range of distributions. GLM covers commonly used models such as Logistic Regression or Log-linear Models.
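To make the extension concrete, the structure of a GLM can be written out as below. The notation is an assumption made for this sketch ($y_i$ for the response, $x_{ij}$ for the explanatory variables, $\beta_j$ for the coefficients, $g$ for the link function) and is not taken from a particular textbook.

$$
\begin{aligned}
\mu_i &= \mathrm{E}[y_i], \\
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}, \\
g(\mu_i) &= \eta_i .
\end{aligned}
$$

The ordinary Linear Model corresponds to a Normal response with the identity link, $g(\mu) = \mu$. Logistic Regression uses a binomial response with the logit link, $g(\mu) = \log\{\mu / (1 - \mu)\}$, and a Log-linear Model for counts uses a Poisson response with the log link, $g(\mu) = \log \mu$.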

Irrespective of the model used for a particular set of data, a reasonably general series of steps needs to be considered when building a model:

• Specify the model to be fitted to the set of data, which will be determined by the type of data (count, interval, continuous etc.) and the relationship between the explanatory variables and the response variable.
• Fit the model to the data to estimate the model coefficients, then examine goodness of fit and other diagnostics to determine whether the assumptions underlying the model are reasonable for the given data.
• Undertake a variable selection exercise to identify the simplest possible model that provides a good description of the data.
• Use the model for interpretation and/or prediction of future observations.
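The steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the data are hypothetical, just two candidate models are considered (an intercept-only model and a straight line), and the Akaike Information Criterion (AIC) is used as one possible way to compare them. None of the function names come from an existing library.

```python
import math

# Hypothetical illustrative data: x is an explanatory variable, y the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]


# Step 1: specify candidate models for a continuous response.
def fit_intercept_only(x, y):
    """y = b0 + error; the least-squares estimate of b0 is the mean of y."""
    return {"b0": sum(y) / len(y), "b1": 0.0, "n_params": 1}


def fit_straight_line(x, y):
    """y = b0 + b1 * x + error, estimated by least squares."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return {"b0": y_mean - b1 * x_mean, "b1": b1, "n_params": 2}


def predict(model, x_new):
    return model["b0"] + model["b1"] * x_new


# Step 2: fit each model and compute a goodness-of-fit measure.
def rss(model, x, y):
    """Residual sum of squares: smaller means a closer fit."""
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))


def aic(model, x, y):
    """Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k.

    k counts regression coefficients only; the error variance is common to
    both models, so leaving it out does not change the comparison.
    """
    n = len(y)
    return n * math.log(rss(model, x, y) / n) + 2 * model["n_params"]


null_model = fit_intercept_only(x, y)
line_model = fit_straight_line(x, y)

# Step 3: choose the model with the lowest AIC, which trades fit against size.
best = min([null_model, line_model], key=lambda m: aic(m, x, y))

# Step 4: use the chosen model to predict a future observation.
prediction = predict(best, 7.0)
print(f"chosen model has {best['n_params']} parameter(s)")
print(f"predicted response at x = 7: {prediction:.2f}")
```

In practice the diagnostics in the second step would also include residual plots and checks of the model assumptions, which are omitted here to keep the sketch short.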