*Probability Theory*

Probability theory is the branch of mathematics concerned with quantifying uncertainty: it assigns probabilities to events and studies the behavior of random quantities.

**Random variable** is a function that assigns a real number to each outcome in the sample space.

**Expectation E** of a random variable is the probability-weighted average of its values, a single number that captures the center of that random variable's distribution. For example, a roll of a fair die has an expectation of 3.5.

```
E[X] = sum_x∈X( x * P(x) )
chance = 1 / 6 = 0.166...
(1*chance + 2*chance + 3*chance + 4*chance + 5*chance + 6*chance) = expectation
(0.166... + 0.333... + 0.500... + 0.666... + 0.833... + 1.0) = 3.5
```
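The same arithmetic can be sketched in Python. This is a minimal illustration, not a library API: the helper name `expectation` and the dictionary representation of the distribution are assumptions made for the example, and `Fraction` is used only to keep the result exact.

```python
from fractions import Fraction

def expectation(pmf):
    """Return E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

die_mean = expectation(fair_die)
print(die_mean)         # 7/2
print(float(die_mean))  # 3.5
```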

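**Correlation** is a measure of the strength and direction of the linear relationship between two variables. It ranges from -1 (perfect decreasing line) through 0 (no linear relationship) to 1 (perfect increasing line).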
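As a rough sketch, the Pearson correlation coefficient (assumed here to be the measure meant above) can be computed directly from its definition. The function name `pearson` and the sample data are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (proportional to the standard deviations).
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
rising = pearson(xs, [2 * x + 1 for x in xs])  # exact increasing line, close to 1
falling = pearson(xs, [-x for x in xs])        # exact decreasing line, close to -1
print(rising, falling)
```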

**Variance Var** of a random variable quantifies the spread of that random variable's distribution. Variance is the average value of the squared difference between the random variable and its expectation. For example, a roll of a fair die has a variance of about 2.92.

```
variance(X) = average[ (X - expectation[X])^2 ]
(1 - 3.5)^2 = 6.25
(2 - 3.5)^2 = 2.25
(3 - 3.5)^2 = 0.25
(4 - 3.5)^2 = 0.25
(5 - 3.5)^2 = 2.25
(6 - 3.5)^2 = 6.25
(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25) / 6 = 2.916666666666666...
```
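The calculation above can be reproduced with a short Python sketch. The helper names `expectation` and `variance` are illustrative choices, and the distribution is represented as a plain dictionary of exact fractions.

```python
from fractions import Fraction

def expectation(pmf):
    """Return E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Return Var(X) = E[(X - E[X])^2] for the same representation."""
    mean = expectation(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

die_var = variance(fair_die)
print(die_var)                   # 35/12
print(round(float(die_var), 2))  # 2.92
```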

**Analysis of variance** tests whether groups of data have the same mean. ANOVA tables usually have the following values:

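- Sum of squared errors (SSE) is the total squared deviation of the observations from their group means.
- Degrees of freedom (df) is the number of independent pieces of information behind a sum of squares; it depends on the number of data points and the number of groups.
- Mean square error (MSE) is SSE / df.
- Test statistic (F) is the ratio of the mean square between groups to the mean square error.
- p-value (p) is the probability, assuming all groups share the same mean, of seeing an F statistic at least as large as the one observed.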
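The quantities in such a table can be sketched for a one-way ANOVA as follows. This is an illustration under assumptions: the function name `one_way_anova` and the three small groups are invented for the example, and the p-value is left out because it needs the F distribution (for instance `scipy.stats.f.sf`, if SciPy is available).

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean (the SSE).
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    ms_between = ss_between / df_between
    mse = ss_within / df_within
    return ss_between, ss_within, df_between, df_within, ms_between / mse

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
print(df_b, df_w)         # 2 6
print(round(f_stat, 6))   # 13.0
```

A large F, as here, means the group means differ by much more than the scatter within the groups would explain.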

**Permutations** are ordered sequences. If you draw all 4 cards from a 4-card deck, there are 4! = 24 different sequences in which those cards may be revealed.

**Combinations** are unordered sets. If you draw 4 cards from a 4-card deck, there is just one set you can end up with: all of them.
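Both counts can be checked by brute force with Python's standard library. The four card labels are placeholders chosen for the example; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import itertools
import math

deck = ["A", "K", "Q", "J"]  # placeholder labels for a 4-card deck

# Ordered draws of all 4 cards: every arrangement counts separately.
orderings = list(itertools.permutations(deck, 4))
# Unordered draws of all 4 cards: only the set of cards matters.
hands = list(itertools.combinations(deck, 4))

print(len(orderings), math.perm(4, 4))  # 24 24
print(len(hands), math.comb(4, 4))      # 1 1
```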

**Ordinary least squares** determines the linear model that minimizes the sum of squared errors between observations and predictions.
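For a single predictor, the minimizing line has a closed-form solution, sketched below. The function name `fit_line` and the data points are assumptions for illustration; the points lie exactly on y = 2x + 1 so the fit is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```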

**Conditional probability** shrinks the sample space to a particular event that is known to have occurred. For example, we might expect the probability that it will rain tomorrow (in general) to be smaller than the probability that it will rain tomorrow given that it is cloudy today.
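The idea of shrinking the sample space can be made concrete by enumeration. The sketch below swaps the weather example for two fair dice, since their outcomes can be listed exactly; the helper `prob` and the two events are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda outcome: True):
    """P(event | given), computed by counting outcomes in the reduced space."""
    space = [o for o in outcomes if given(o)]
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

def big_sum(outcome):
    return sum(outcome) >= 10

def first_is_six(outcome):
    return outcome[0] == 6

p_general = prob(big_sum)                    # over all 36 outcomes
p_given_six = prob(big_sum, first_is_six)    # over the 6 outcomes starting with a six
print(p_general, p_given_six)  # 1/6 1/2
```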

**Frequentist inference** is the process of determining properties of an underlying distribution via the observation of data.

**Estimator** is any function of randomly sampled observations. A good estimator is unbiased (its expectation equals the true parameter) and consistent (it converges to the true parameter as the sample size grows).

**Confidence interval** estimates a parameter by specifying a range of possible values. Such an interval is associated with a confidence level, which is the probability that the procedure used to generate the interval will produce an interval containing the true parameter.
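A small simulation can illustrate what the confidence level refers to. It is a sketch under stated assumptions: the data are drawn from a normal distribution with a known standard deviation, the interval is the textbook z-interval for the mean, and the constants (`TRUE_MEAN`, `SIGMA`, `N`, `TRIALS`) are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, SIGMA, N, TRIALS = 10.0, 2.0, 30, 2000

# Half-width of a 95% z-interval when the standard deviation is known.
z = NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * SIGMA / math.sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    center = mean(sample)
    # Does this particular interval contain the true parameter?
    if center - half_width <= TRUE_MEAN <= center + half_width:
        hits += 1

coverage = hits / TRIALS
print(coverage)  # expected to be close to 0.95
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across repeated samples.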

**Bayesian inference** specifies how one should update one's beliefs upon observing data. It all revolves around Bayes' Theorem.

```
You get tested for a rare disease, and the result comes back positive (+).
Given the test result, what is the probability that you actually have the disease?

1.            2.          3.          4.
P(sick|+) = ( P(+|sick) * P(sick) ) / P(+)

1. probability of being sick given a positive test
2. probability of testing positive given that you are sick
3. probability of being sick (before seeing the test result)
4. probability of testing positive (whether sick or not)
```
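To see how the formula behaves, here is a sketch with made-up numbers: a prevalence of 1%, a test that detects 99% of sick people, and a 5% false positive rate. None of these figures come from a real test; they are assumptions chosen so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical inputs (assumptions for illustration only).
p_sick = Fraction(1, 100)                  # 3. prior probability of being sick
p_pos_given_sick = Fraction(99, 100)       # 2. probability of testing positive when sick
p_pos_given_healthy = Fraction(5, 100)     # false positive rate

# 4. total probability of a positive test, over sick and healthy people.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# 1. Bayes' Theorem: probability of being sick given a positive test.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # 1/6
```

Even with an accurate test, the posterior stays low under these assumptions because the disease is rare, so most positive results come from healthy people.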