### Part 1. Basic Discrete Probability

…

#### Conditional Probability Rules

- Definition: for any events $A, B$ with $P(B) > 0$,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

- Multiplication Rule: $P(A \cap B) = P(A \mid B)\,P(B)$

- Law of Total Probability: Suppose events $B_1, \dots, B_n$ form a partition of $\Omega$. Then for any event $A$,

$$P(A) = \sum_{i=1}^n P(A \mid B_i)\,P(B_i)$$

- Bayes' Theorem: Suppose events $B_1, \dots, B_n$ form a partition of $\Omega$. Then for any event $A$ with $P(A) > 0$,

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^n P(A \mid B_i)\,P(B_i)}$$
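As a numerical illustration of the last two rules, a short computation for a hypothetical diagnostic test (all numbers are made up for the example):

```python
# Hypothetical numbers: a test with 99% sensitivity, 95% specificity,
# and 1% disease prevalence.
p_disease = 0.01                      # P(D)
p_pos_given_disease = 0.99            # P(+ | D), sensitivity
p_pos_given_healthy = 0.05            # P(+ | D^c), 1 - specificity

# Law of Total Probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Theorem: P(D | +) = P(+|D)P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.1667
```

Even with a fairly accurate test, the posterior probability of disease given a positive result is only about 1/6, because the prior prevalence is low.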

#### Independent Events

- Events $A, B$ are independent $\iff P(A \cap B) = P(A)\,P(B)$

- Pairwise independence does NOT imply that a collection of events is mutually independent


#### Conditional Independence

Suppose events $A, B$ are not necessarily independent, but there is another event $C$ such that

$$P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C),$$

then we say that $A$ and $B$ are independent conditional on $C$.

### Part 2. Random Variables

…

#### Expectation

Remark:

- Some distributions do not have an expectation, for example, the Cauchy distribution (fat tails).

- If $X, Y$ are random variables, and $a, b$ are constants, then:
    - $E[aX + bY] = a\,E[X] + b\,E[Y]$, if $E[X]$ and $E[Y]$ are finite
    - $E[XY] = E[X]\,E[Y]$, if $X$ and $Y$ are independent.

- Jensen's inequality:
    - If $g$ is convex on the support of the random variable $X$, then $E[g(X)] \ge g(E[X])$
    - If $g$ is concave on the support of the random variable $X$, then $E[g(X)] \le g(E[X])$

#### Rule for the Lazy Statistician

Discrete version:

$$E[g(X)] = \sum_x g(x)\,p_X(x)$$

(this is a theorem, not the definition of $E[g(X)]$)

Continuous version:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$

Joint distribution version:

$$E[g(X, Y)] = \iint g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$$
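A quick check of the discrete rule, using a fair six-sided die and $g(x) = x^2$ (an illustrative choice):

```python
from fractions import Fraction

# Rule for the lazy statistician, discrete case: E[g(X)] = sum_x g(x) * p_X(x).
# X = a fair six-sided die, g(x) = x^2.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
g = lambda x: x * x

e_g = sum(g(x) * p for x, p in pmf.items())
print(e_g)  # 91/6
```

No distribution for $g(X)$ itself ever needs to be computed, which is the point of the rule.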

#### Bernoulli($p$) Distribution

$$P(X = 1) = p, \quad P(X = 0) = 1 - p; \qquad E[X] = p, \quad \mathrm{Var}(X) = p(1 - p)$$

#### Binomial($n, p$) Distribution

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n; \qquad E[X] = np, \quad \mathrm{Var}(X) = np(1 - p)$$

#### Geometric($p$) Distribution

(number of trials until the first success)

$$P(X = k) = (1 - p)^{k - 1} p, \quad k = 1, 2, \dots; \qquad E[X] = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{1 - p}{p^2}$$

#### Poisson($\lambda$) Distribution

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots; \qquad E[X] = \mathrm{Var}(X) = \lambda$$

Remark:

- Let $X_n \sim \text{Binomial}(n, \lambda/n)$; as $n$ grows, $X_n$ converges in distribution to Poisson($\lambda$)

- If $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$
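The binomial-to-Poisson limit can be checked numerically; $\lambda = 2$ below is an arbitrary illustrative choice:

```python
from math import comb, exp, factorial

# Compare the Binomial(n, lam/n) and Poisson(lam) pmfs at a few points as n grows.
lam = 2.0

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

for n in (10, 100, 1000):
    max_gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
                  for k in range(10))
    print(n, round(max_gap, 5))  # the gap shrinks as n grows
```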

#### Uniform($a, b$) Distribution

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b; \qquad E[X] = \frac{a + b}{2}, \quad \mathrm{Var}(X) = \frac{(b - a)^2}{12}$$

#### Pareto($\alpha$) Distribution

$$f(x) = \alpha x^{-(\alpha + 1)}, \quad x \ge 1$$


Remark:

- $P(X > x) = x^{-\alpha}$ for $x \ge 1$, a power-law tail. This is often used for modelling the tail of a distribution

#### Exponential($\lambda$) Distribution

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0; \qquad E[X] = \frac{1}{\lambda}, \quad \mathrm{Var}(X) = \frac{1}{\lambda^2}$$

Remark:

- Memoryless property: if the time between "events" is modelled as exponential, the probability that the time to the next event exceeds $a$ units is the same no matter how long it has been since the last event. Formally, suppose $X \sim \text{Exponential}(\lambda)$, and $s, t \ge 0$. Then

$$P(X > s + t \mid X > s) = P(X > t)$$
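The memoryless property follows directly from the survival function $P(X > x) = e^{-\lambda x}$, which a three-line check makes concrete (the values of $\lambda$, $s$, $t$ are arbitrary):

```python
from math import exp

# Memoryless property check for X ~ Exponential(lam):
# P(X > s + t | X > s) should equal P(X > t).
lam, s, t = 0.5, 3.0, 2.0   # arbitrary illustrative values

surv = lambda x: exp(-lam * x)   # survival function P(X > x)
lhs = surv(s + t) / surv(s)      # P(X > s + t | X > s)
rhs = surv(t)                    # P(X > t)
print(abs(lhs - rhs) < 1e-12)    # True
```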

#### Gamma() Distribution

For $x > 0$,

$$f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}$$

where

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$$

Remark: if $X \sim \text{Gamma}(\alpha, \lambda)$,

- $E[X] = \alpha/\lambda$, $\mathrm{Var}(X) = \alpha/\lambda^2$

- If $\alpha = 1$, then $\text{Gamma}(1, \lambda)$ is the $\text{Exponential}(\lambda)$ distribution

- For any $c > 0$, the random variable $cX \sim \text{Gamma}(\alpha, \lambda/c)$

- If $X \sim \text{Gamma}(\alpha_1, \lambda)$ and $Y \sim \text{Gamma}(\alpha_2, \lambda)$ are independent, then $X + Y \sim \text{Gamma}(\alpha_1 + \alpha_2, \lambda)$

- The sum of $n$ independent $\text{Exponential}(\lambda)$ random variables has the $\text{Gamma}(n, \lambda)$ distribution (Erlang Distribution).

#### Normal($\mu, \sigma^2$) Distribution

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Remark:

- For $X \sim N(\mu, \sigma^2)$ and any constants $a, b$, $aX + b \sim N(a\mu + b, a^2\sigma^2)$

- Linear combinations of independent normal random variables are also normally distributed.

#### The Poisson Process

The Poisson distribution can be derived as the limit of $\text{Binomial}(n, \lambda t / n)$: imagine dividing an interval of length $t$ into $n$ subintervals, each containing one event with probability $\lambda t / n$. Letting $n \to \infty$, and extending the time axis indefinitely, leads to the Poisson process with rate $\lambda$.

"Events occur as a Poisson process with rate $\lambda$" means:

- If $N$ equals the number of events during an interval of time of length $t$, then $N$ has the Poisson($\lambda t$) distribution

- The times between events are random variables with the $\text{Exponential}(\lambda)$ distribution

- The times between events are independent random variables

- The numbers of events in disjoint time intervals are independent random variables.

- The waiting time for $k$ events has the Gamma($k, \lambda$) distribution (a sum of $k$ independent $\text{Exponential}(\lambda)$ random variables).
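These defining properties can be sketched in a small simulation: generate exponential inter-event times, count events in a window, and compare the average count to $\lambda t$ (the rate and horizon below are arbitrary choices):

```python
import random

random.seed(0)

# Simulate a Poisson process with rate lam on [0, T] via Exponential(lam)
# inter-event times, and check that the mean event count is close to lam * T.
lam, T, n_runs = 3.0, 10.0, 20000

def count_events(lam, T):
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)   # Exponential(lam) inter-event gap
        if t > T:
            return n
        n += 1

counts = [count_events(lam, T) for _ in range(n_runs)]
mean = sum(counts) / n_runs
print(round(mean, 2))   # should be close to lam * T = 30
```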

#### Beta($\alpha, \beta$) Distribution

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 \le x \le 1$$

Remark:

- If $\alpha = \beta$, then the distribution is symmetric about $0.5$

#### Summary Table

| Distribution | PMF / PDF | Mean | Variance |
| --- | --- | --- | --- |
| Bernoulli($p$) | $p^x(1-p)^{1-x}$, $x \in \{0, 1\}$ | $p$ | $p(1-p)$ |
| Binomial($n, p$) | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Geometric($p$) | $(1-p)^{k-1}p$, $k \ge 1$ | $1/p$ | $(1-p)/p^2$ |
| Poisson($\lambda$) | $e^{-\lambda}\lambda^k/k!$ | $\lambda$ | $\lambda$ |
| Uniform($a, b$) | $1/(b-a)$ | $(a+b)/2$ | $(b-a)^2/12$ |
| Exponential($\lambda$) | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma($\alpha, \lambda$) | $\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$ | $\alpha/\lambda$ | $\alpha/\lambda^2$ |
| Normal($\mu, \sigma^2$) | $\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$ |
| Beta($\alpha, \beta$) | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |

### Part 3: Multivariate Distributions

#### Joint Probability Function

For discrete random variables $X, Y$:

Joint PMF:

$$p_{X,Y}(x, y) = P(X = x, Y = y)$$

Joint CDF:

$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$

For continuous random variables $X, Y$:

Joint PDF: $f_{X,Y}(x, y)$

- To calculate probabilities, integrate the pdf over the region of interest: $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$
- $A$ can be any shape

#### Marginal Distributions

For the discrete case:

$$p_X(x) = \sum_y p_{X,Y}(x, y)$$

For the continuous case:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

Remark:

- The same idea extends to $n$ random variables: sum (or integrate) the joint distribution over all the other variables.

#### Conditional Distribution

For the discrete case:

$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$$

For the continuous case:

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},$$

with $f_{X \mid Y}(x \mid y)$ called the conditional density of $X$ given $Y = y$.

Conditional Distribution vs. Marginal Distribution

- Case I: $X, Y$ are independent

- Case II: $X, Y$ are weakly (positively) dependent

- Case III: $X, Y$ are strongly (positively) dependent

[Figure omitted: scatter plots for the three cases.] The marginal distributions of $X$ are the same in all three cases, while the conditional distributions of $X$ given $Y$ are not the same, for example, as shown in the second plot.

#### Covariance & Correlation

Definitions:

$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y], \qquad \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

Properties:

- $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$, and $-1 \le \rho(X, Y) \le 1$

- $X, Y$ independent $\Rightarrow \mathrm{Cov}(X, Y) = 0$. NOT $\Leftarrow$: $\mathrm{Cov}(X, Y) = 0$ does not imply $X, Y$ are independent

- $|\rho(X, Y)| = 1 \iff Y = aX + b$ for some constants $a \ne 0$ and $b$
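The "zero covariance does not imply independence" point can be seen in a tiny exact computation, using the standard counterexample $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: Cov(X, Y) = 0 although Y is a function of X.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

e_x  = sum(x * p for x, p in pmf_x.items())          # E[X]   = 0
e_y  = sum(x * x * p for x, p in pmf_x.items())      # E[Y]   = E[X^2] = 2/3
e_xy = sum(x * x * x * p for x, p in pmf_x.items())  # E[XY]  = E[X^3] = 0

cov = e_xy - e_x * e_y
print(cov)  # 0

# Yet X and Y are clearly dependent: P(Y = 0 | X = 0) = 1, while P(Y = 0) = 1/3.
```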


**More than two random variables**

Let $\mathbf{a}, \mathbf{b}$ denote two non-random vectors of scalars and let $A$ be a non-random matrix. Then for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$:

$$\mathrm{Cov}(\mathbf{a}^T \mathbf{X}, \mathbf{b}^T \mathbf{X}) = \mathbf{a}^T \Sigma\, \mathbf{b}, \qquad \mathrm{Cov}(A\mathbf{X}) = A \Sigma A^T$$

#### Bivariate Normal Distribution

$(X, Y)$ has the bivariate normal distribution if it has joint pdf

$$f(\mathbf{x}) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu}$ is the vector of means and $\Sigma$ is the covariance matrix.

Equivalent definition:

$(X, Y)$ has the bivariate normal distribution iff all linear combinations of $X$ and $Y$ are also normally distributed.

Properties:

- $X, Y$ are normally distributed, i.e., the marginals are normal

- The conditional distribution is normal: $Y \mid X = x \sim N\!\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2(1 - \rho^2)\right)$

- $\rho = 0 \iff X, Y$ are independent.

- $X, Y$ independent normals $\Rightarrow (X, Y)$ is bivariate normal

#### Multivariate Normal Distribution

$\mathbf{X} = (X_1, \dots, X_n)$ has the multivariate normal distribution if it has joint pdf

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where $\Sigma$ is positive definite (otherwise the inverse might not exist).

Properties:

- All marginal and conditional distributions are multivariate normal.

- Any random variable/vector of the form $A\mathbf{X} + \mathbf{b}$ will be multivariate normal if $A \Sigma A^T$ is positive definite.

- If $\mathrm{Cov}(X_i, X_j) = 0$, then $X_i$ and $X_j$ are independent.

- If $\Sigma$ is diagonal, then $X_1, \dots, X_n$ are independent

### Part 4: Conditional Expectation

For a discrete random variable $X$, when conditioning on an event $A$,

$$E[X \mid A] = \sum_x x\, P(X = x \mid A)$$

When the event is an event concerning another random variable $Y$ itself, the conditional pmf can be used directly.

- E.g. $A = \{Y = y\}$: $E[X \mid Y = y] = \sum_x x\, p_{X \mid Y}(x \mid y)$

For the continuous case,

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\,dx$$

#### Laws of Total Probability

$$P(X \in A) = \sum_y P(X \in A \mid Y = y)\,P(Y = y) \ \ \text{(discrete)}, \qquad P(X \in A) = \int P(X \in A \mid Y = y)\, f_Y(y)\,dy \ \ \text{(continuous)}$$

#### Prior and Posterior Distributions

In Bayesian analysis, before data are observed, the unknown parameter $\theta$ is modeled as a random variable with a probability distribution $f(\theta)$, called the prior distribution. This distribution represents our prior belief about the value of the parameter. After observing data $x$, we have increased our knowledge about $\theta$; the updated distribution $f(\theta \mid x)$ is called the posterior distribution. The updating equation is Bayes' theorem:

$$f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{\int f(x \mid \theta')\, f(\theta')\, d\theta'} \propto f(x \mid \theta)\, f(\theta)$$

#### Iterated Conditioning

$$E[X] = E\big[E[X \mid Y]\big]$$

where $E[X \mid Y]$ is itself a random variable (a function of $Y$).

Useful equation (law of total variance):

$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$

Proof sketch: expand both terms on the right using $\mathrm{Var}(X \mid Y) = E[X^2 \mid Y] - (E[X \mid Y])^2$, then apply iterated conditioning to $E[X^2]$ and $E[X]$.
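A small discrete sanity check of iterated conditioning, $E[E[X \mid Y]] = E[X]$, on a hypothetical joint pmf:

```python
from fractions import Fraction

# A small (hypothetical) joint pmf p(x, y) on {0,1} x {0,1}.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8)}

# Marginal pmf of Y.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, Fraction(0)) + p

# E[X | Y = y] = sum_x x * p(x, y) / p_Y(y)
def cond_exp_x(y):
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]

lhs = sum(cond_exp_x(y) * p for y, p in p_y.items())   # E[E[X | Y]]
rhs = sum(x * p for (x, _), p in joint.items())        # E[X]
print(lhs == rhs, lhs)  # True 1/2
```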

#### Measure-Theoretic Notions of Conditional Expectation

"Information" can be captured via a collection of subsets of $\Omega$. Such collections are denoted using $\mathcal{F}, \mathcal{G}, \mathcal{H}$, etc. "Information" means: for each set in the collection, we can tell whether it occurs or not (instead of being unsure).

When $\mathcal{F}$ satisfies the following properties, it is a $\sigma$-field or $\sigma$-algebra:

- $\emptyset \in \mathcal{F}$
    - you can always know $\emptyset$ does not occur

- If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$
    - If you know the occurrence / non-occurrence status of $A$, then you also know the status of $A^c$

- If $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
    - If you know the occurrence / non-occurrence status of each $A_i$, then you also know the status of $\bigcup_i A_i$


Remark:

- $2^\Omega$ is the set of all subsets of $\Omega$

- The trivial $\sigma$-field consists of only $\{\emptyset, \Omega\}$. Conditioning on it is like conditioning on no information

- The power set $2^\Omega$ is a $\sigma$-field, corresponding to "knowing everything"

- The information content from conditioning on a random variable $X$ is called the $\sigma$-field generated by $X$, denoted $\sigma(X)$.
    - $E[X \mid Y]$, as written in elementary probability theory, can be interpreted as $E[X \mid \sigma(Y)]$

- A random variable $X$ is said to be $\mathcal{G}$-measurable if $\sigma(X) \subseteq \mathcal{G}$


**Properties of Conditional Expectation**

Assume $\mathcal{G}, \mathcal{H}$ are $\sigma$-fields and $X, Y$ are random variables.

- If $X$ is $\mathcal{G}$-measurable, then $E[X \mid \mathcal{G}] = X$

- If $X$ is $\mathcal{G}$-measurable, then $E[XY \mid \mathcal{G}] = X\, E[Y \mid \mathcal{G}]$ ("taking out what is known")

- $E[aX + bY \mid \mathcal{G}] = a\, E[X \mid \mathcal{G}] + b\, E[Y \mid \mathcal{G}]$, for scalars $a, b$

- If $\mathcal{H} \subseteq \mathcal{G}$, then $E\big[E[X \mid \mathcal{G}] \mid \mathcal{H}\big] = E[X \mid \mathcal{H}]$ (the tower property)

- If $g$ is convex, then $g\big(E[X \mid \mathcal{G}]\big) \le E[g(X) \mid \mathcal{G}]$ (conditional Jensen's inequality)


**Measure-Theoretic Independence**

- ($\sigma$-fields) $\mathcal{G}$ and $\mathcal{H}$ are independent iff for any $A \in \mathcal{G}$ and $B \in \mathcal{H}$, $P(A \cap B) = P(A)\,P(B)$

- (random variables) $X$ and $Y$ are independent iff $\sigma(X)$ and $\sigma(Y)$ are independent

### Part 5: Moment Generating Functions

**MGF of $X$**

$$M_X(t) = E[e^{tX}],$$

which is a function of $t$. It can be calculated using the Rule for the Lazy Statistician.

**Calculate Moments**

$$E[X^k] = M_X^{(k)}(0), \quad \text{the } k\text{-th derivative of } M_X \text{ at } t = 0$$
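A numerical sketch of moment extraction from an MGF, using $X \sim \text{Exponential}(\lambda)$ (whose MGF is $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$) and finite-difference derivatives:

```python
# For X ~ Exponential(lam): M(t) = lam / (lam - t), and E[X^k] = k! / lam^k.
lam = 2.0
M = lambda t: lam / (lam - t)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)            # central difference for M'(0)  = E[X]
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2    # second difference for M''(0) = E[X^2]
print(round(m1, 3), round(m2, 3))  # close to 1/lam = 0.5 and 2/lam^2 = 0.5
```

Symbolic differentiation would give the moments exactly; the finite differences just make the "derivatives at zero" idea tangible.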

#### Applications

**Uniqueness of MGF**

If $M_X(t) = M_Y(t)$ for all $t$ in an open interval around $0$, then $X$ and $Y$ have the same distribution.

- Note that two random variables can have matching moments, i.e., $E[X^k] = E[Y^k]$ for all $k$, but have different distributions.


**Sum of Independent Random Variables**

Suppose $X_1, \dots, X_n$ are independent random variables, and $S = X_1 + \dots + X_n$. Then

$$M_S(t) = \prod_{i=1}^n M_{X_i}(t)$$

for all $t$.


**Establishing Convergence in Distribution**

If $M_{X_n}(t) \to M_X(t)$ as $n \to \infty$ for all $|t| < \delta$, for some $\delta > 0$, then $X_n \xrightarrow{d} X$.

e.g.

For $X_n \sim \text{Binomial}(n, \lambda/n)$,

$$M_{X_n}(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to e^{\lambda(e^t - 1)},$$

which is the MGF of the Poisson($\lambda$) distribution; thus $X_n \xrightarrow{d} \text{Poisson}(\lambda)$.

#### The Central Limit Theorem

Suppose $X_1, X_2, \dots$ are i.i.d. and $\mu = E[X_1]$ and $\sigma^2 = \mathrm{Var}(X_1)$ both exist and are finite. Then

$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.

Equivalent statements: when $n$ is large,

$$\bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad \sum_{i=1}^n X_i \approx N(n\mu, n\sigma^2)$$
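A minimal simulation sketch of the CLT, using Uniform(0, 1) samples ($\mu = 1/2$, $\sigma^2 = 1/12$) as an arbitrary choice:

```python
import random
from math import sqrt

random.seed(1)

# Sample means of n Uniform(0,1) draws should look like N(0.5, (1/12)/n).
n, n_reps = 100, 20000
means = [sum(random.random() for _ in range(n)) / n for _ in range(n_reps)]

avg = sum(means) / n_reps
sd = sqrt(sum((m - avg) ** 2 for m in means) / n_reps)
print(round(avg, 3), round(sd, 4))  # avg near 0.5, sd near sqrt(1/1200) ~ 0.0289
```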

#### The Delta Method

Assume $T_1, T_2, \dots$ are random variables such that

$$\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$$

where $\theta$ is a constant. Then, assuming $g$ is a function that is differentiable at $\theta$ with $g'(\theta) \ne 0$,

$$\sqrt{n}\,\big(g(T_n) - g(\theta)\big) \xrightarrow{d} N\big(0,\; g'(\theta)^2 \sigma^2\big)$$

This is often applied with $T_n = \bar{X}_n$ and $\theta = \mu$.

Equivalent description:

If $T_n$ is approximately $N(\theta, \sigma_n^2)$ and $\sigma_n \to 0$ as $n \to \infty$, then $g(T_n)$ is approximately

$$N\big(g(\theta),\; g'(\theta)^2 \sigma_n^2\big),$$

assuming that $g$ is a function that is differentiable at $\theta$ and $g'(\theta) \ne 0$.
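A minimal simulation sketch of the delta method, with illustrative choices $T_n = \bar{X}_n$ for Exponential(1) data (so $\theta = 1$, $\sigma^2 = 1$) and $g = \log$ (so $g'(\theta) = 1$):

```python
import random
from math import log, sqrt

random.seed(2)

# With these choices, sqrt(n) * (log(T_n) - log(1)) should be roughly N(0, 1).
n, n_reps = 200, 5000
vals = []
for _ in range(n_reps):
    t_n = sum(random.expovariate(1.0) for _ in range(n)) / n   # T_n = sample mean
    vals.append(sqrt(n) * (log(t_n) - log(1.0)))

mean_v = sum(vals) / n_reps
var_v = sum((v - mean_v) ** 2 for v in vals) / n_reps
print(round(mean_v, 2), round(var_v, 2))  # mean near 0, variance near 1
```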

### Part 6: Classic Results

**Chi-squared distribution**

The Chi-squared distribution with $k$ degrees of freedom is a special case of the Gamma distribution, namely $\text{Gamma}(k/2, 1/2)$. If $X$ follows such a distribution, then $E[X] = k$ and $\mathrm{Var}(X) = 2k$. Equivalently, if $Z_1, \dots, Z_k$ are i.i.d. $N(0, 1)$, then $\sum_{i=1}^k Z_i^2 \sim \chi^2_k$.


**t-distribution**

The t-distribution with $k$ degrees of freedom is defined as

$$T = \frac{Z}{\sqrt{V / k}}$$

where $Z \sim N(0, 1)$ and $V \sim \chi^2_k$, the Chi-squared distribution with $k$ degrees of freedom. And $Z, V$ are independent.

The t-distribution has a bell-shaped density, but has heavier tails than the normal distribution. As $k$ increases, the distribution converges to $N(0, 1)$.

When $k = 1$, the distribution is the Cauchy distribution, which has very heavy tails. Its mean (expectation) does not exist.

- The CLT does not apply to the Cauchy distribution

- If $X_1, \dots, X_n$ are i.i.d. Cauchy, then $\bar{X}_n$ also has a Cauchy distribution, no matter how large $n$ is


**Classic Results**

Assuming $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, with $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, then

- $\bar{X}$ and $S^2$ are independent

- The quantity $\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$

- The quantity $\dfrac{(n - 1) S^2}{\sigma^2} \sim \chi^2_{n-1}$

- The quantity $\dfrac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$

Note that

$$E[S^2] = \sigma^2,$$

so $S^2$ is an unbiased estimator of $\sigma^2$.
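A quick simulation supporting the unbiasedness of the sample variance (the values of $\mu$, $\sigma$, and $n$ are illustrative choices):

```python
import random

random.seed(3)

# Average many sample variances S^2 (with the 1/(n-1) factor) of normal
# samples; the average should be close to sigma^2.
mu, sigma, n, n_reps = 5.0, 2.0, 10, 20000

def sample_var(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)   # S^2

s2s = [sample_var([random.gauss(mu, sigma) for _ in range(n)]) for _ in range(n_reps)]
avg_s2 = sum(s2s) / n_reps
print(round(avg_s2, 2))  # near sigma^2 = 4
```

Dividing by $n$ instead of $n - 1$ would bias the average downward by the factor $(n-1)/n$.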


*End of Content*
