Working with Probability Distributions

May 31st, 2009

Probability distributions play a central role in statistics, and R provides functions for working with a large range of them. The function names follow a consistent pattern based on the type of information required about a distribution.

Four functions are defined for each distribution available in R:

  • The density function (the probability mass function for discrete distributions) – name starts with a d.
  • The cumulative distribution function – name starts with a p.
  • The quantile function – name starts with a q.
  • Random number generation – name starts with an r.

Both discrete and continuous distributions are available in R. Distributions that we can access include: Beta, Binomial, Chi-squared, F, Logistic, Normal, Poisson, Student’s t and Weibull.
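As a brief illustration of a discrete case, the sketch below applies the four function types to the Binomial distribution, whose R functions are dbinom, pbinom, qbinom and rbinom. The parameter values (10 trials, success probability 0.5) and the variable names are chosen purely for illustration.

```r
# Binomial distribution with 10 trials and success probability 0.5
# (illustrative values only)
d <- dbinom(3, size = 10, prob = 0.5)      # P(X = 3)
p <- pbinom(3, size = 10, prob = 0.5)      # P(X <= 3)
q <- qbinom(0.5, size = 10, prob = 0.5)    # smallest x with P(X <= x) >= 0.5
r <- rbinom(n = 5, size = 10, prob = 0.5)  # five random draws, each between 0 and 10

print(c(d = d, p = p, q = q))
print(r)
```

For a discrete distribution the d function returns an actual probability, whereas for a continuous distribution it returns the height of the density curve.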

There is a base name for each of the distributions, and we add the prefix letter mentioned above to access the requisite information. If we consider the Normal distribution as an example, the base name is norm, so dnorm is the function that will provide the density:

> dnorm(1.96, mean = 0, sd = 1)
[1] 0.05844094
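The value returned by dnorm is the height of the density curve at the given point, not a probability. As a rough check, the sketch below writes out the standard Normal density formula by hand and compares it with dnorm; the variable names are illustrative.

```r
x <- 1.96

# Standard Normal density written out by hand: exp(-x^2 / 2) / sqrt(2 * pi)
manual_density <- exp(-x^2 / 2) / sqrt(2 * pi)
builtin_density <- dnorm(x, mean = 0, sd = 1)

# Compare the two values, allowing for floating-point rounding
print(all.equal(manual_density, builtin_density))
```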

The mean and sd arguments are used to specify a particular pair of parameters for the Normal distribution. The cumulative distribution function is pnorm and the syntax is very similar to the dnorm function:

> pnorm(1.96, mean = 0, sd = 1)
[1] 0.9750021

By default the function returns the cumulative probability of values less than or equal to the specified figure. The quantile function, qnorm, allows us to work back from probabilities to values on the original data scale:

> qnorm(0.95, mean = 0, sd = 1)
[1] 1.644854
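Because qnorm is the inverse of pnorm, passing a quantile back through pnorm should recover the original probability. The sketch below checks this and also uses the lower.tail argument of pnorm to obtain an upper-tail probability; the variable names are illustrative.

```r
p <- 0.95

# Quantile for probability p, then back to a probability
x <- qnorm(p, mean = 0, sd = 1)
round_trip <- pnorm(x, mean = 0, sd = 1)
print(all.equal(round_trip, p))

# Upper-tail probability P(X > 1.96) via lower.tail = FALSE
upper <- pnorm(1.96, mean = 0, sd = 1, lower.tail = FALSE)
lower <- pnorm(1.96, mean = 0, sd = 1)
print(all.equal(upper, 1 - lower))
```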

As with the previous functions, we can define the parameters of the distribution where required. The last option of interest is the function that allows us to generate random samples from a particular distribution, which in the case of the Normal distribution is rnorm. Here we specify the number of observations to be drawn from the distribution along with the parameters of the distribution:

> rnorm(n = 20, mean = 0, sd = 1)
 [1] -1.1322606 -2.8320170 -0.5768220  1.0569513  1.0824524  1.4925396 -0.3010086 -0.4345893  2.6813322
[10]  0.3774106  1.7226911  0.5922038  0.0770510  1.4015955 -0.9998051  0.1924921  0.7181194  1.0107967
[19]  1.3224979 -0.1511634

The previous command samples twenty observations from the standard Normal distribution.
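Since the draws are random, running the command again will generally give different values. If a reproducible sample is needed, one common approach is to call set.seed before sampling, as in the sketch below; the seed value 42 is arbitrary and the variable names are illustrative.

```r
# Fix the random number generator state so the draws can be reproduced
set.seed(42)  # arbitrary seed, chosen only for illustration
first <- rnorm(n = 20, mean = 0, sd = 1)

# Resetting the same seed gives the same twenty values
set.seed(42)
second <- rnorm(n = 20, mean = 0, sd = 1)

print(identical(first, second))
```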
