Distributions
Distributions are either discrete or continuous.
A discrete distribution (or discrete random variable) has a finite or countably infinite number of possible values.
A Bernoulli variable takes the value 1 with probability p and the value 0 with probability 1 - p. The Bernoulli distribution can be used to represent binary experiments such as a coin toss.
A binomial variable is the sum of n independent Bernoulli variables, each with probability p. The binomial distribution can be used to model the number of successes in a specified number of identical, independent binary experiments.
The multinomial distribution is a generalization of the binomial distribution. It gives the probability of each combination of counts when a k-sided die is rolled n times.
A geometric variable counts the number of trials required to observe the first success. The geometric distribution can be used to model the number of times a die must be rolled in order for a six to be observed.
A Poisson variable counts the number of events that occur in a fixed interval of time or space. The Poisson distribution can be used to model events such as goals in a soccer match.
The categorical distribution describes the possible results of a random event that can take on one of K possible outcomes, with the probability of each outcome separately specified.
Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of getting each of 0 through 6 heads in six tosses?
p(0 heads) = 0.1176
p(1 heads) = 0.3025
p(2 heads) = 0.3241
p(3 heads) = 0.1852
p(4 heads) = 0.0595
p(5 heads) = 0.0102
p(6 heads) = 0.0007
These probabilities follow a binomial distribution with n = 6 and p = 0.3, and getting 1 or 2 heads is much more likely than any other outcome.
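The probabilities in the coin example can be reproduced from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice and does not come from the original notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Biased coin from the example: heads with probability 0.3, six tosses.
n, p = 6, 0.3
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(probabilities):
    print(f"p({k} heads) = {prob:.4f}")
```

Because the seven outcomes cover every possibility, the values in `probabilities` sum to 1.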
A continuous distribution (or continuous random variable) has an uncountably infinite number of possible values, such as every real number in an interval.
- In a uniform distribution, all intervals of equal length have equal probability. The uniform distribution can be used to model the time of year at which people are born, assuming that all times in the calendar year are equally likely.
- Gaussian (or normal) variables follow a bell-shaped curve and represent a quantity produced by many small unknown effects. The Gaussian distribution can be used to model people's height, since height can be assumed to be the result of many small genetic and environmental factors.
- The t-distribution is used to estimate the mean of a normally distributed population when the sample size is small and the population standard deviation is unknown. It is also bell-shaped, but it is always centered at 0 and has heavier tails than the normal distribution.
- A chi-squared variable with k degrees of freedom is the sum of the squares of k independent standard normal random variables. The chi-squared distribution can be used in the construction of confidence intervals, for example for the variance of a normal population.
- An exponential variable is the continuous counterpart of a geometric variable. The exponential distribution can be used to model waiting times.
- F-distribution can be used in analysis of variance.
- The gamma distribution generalizes the exponential and chi-squared distributions, both of which are special cases of it.
- The beta distribution is a flexible family of continuous distributions bounded between 0 and 1.
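The chi-squared definition in the list above can be checked numerically. The sketch below builds chi-squared draws directly as sums of squared standard normal values and compares their average with the theoretical mean, which equals the degrees of freedom k. The helper name `chi_squared_sample`, the choice k = 5, the number of draws, and the seed are all illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def chi_squared_sample(k: int) -> float:
    """Draw one chi-squared value as the sum of k squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 5               # degrees of freedom (arbitrary choice for illustration)
num_draws = 20_000  # number of simulated chi-squared values
draws = [chi_squared_sample(k) for _ in range(num_draws)]

sample_mean = sum(draws) / num_draws
print(f"sample mean = {sample_mean:.3f} (theoretical mean is k = {k})")
```

With this many draws the sample mean should land close to k, although the exact printed value depends on the seed.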
The central limit theorem states that the sample mean of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed. This is why the Gaussian (normal) distribution is used to model outcomes produced by many small unknown effects.
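A small simulation can make the central limit theorem concrete. The sketch below averages uniform random numbers, which are far from bell-shaped individually, and checks that the sample means cluster around 0.5 with roughly the spread a normal approximation predicts. The sample size, the number of samples, and the seed are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 50     # number of uniform values averaged in each sample
num_samples = 5_000  # number of sample means to collect

sample_means = [
    statistics.fmean([random.random() for _ in range(sample_size)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

# A uniform(0, 1) variable has variance 1/12, so the standard deviation
# of the mean of sample_size draws is sqrt(1 / (12 * sample_size)).
expected_sd = (1 / (12 * sample_size)) ** 0.5

# For a normal distribution, about 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - 0.5) <= expected_sd for m in sample_means
) / num_samples

print(f"mean of sample means: {observed_mean:.4f}")
print(f"sd of sample means:   {observed_sd:.4f} (expected {expected_sd:.4f})")
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

Increasing `sample_size` narrows the spread of the sample means, while the shape stays approximately normal.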
Source
- Wikipedia