
### 1. Binary Dependent Variables

A binary (dummy) dependent variable is a dependent variable that takes only two values: 0 or 1. By the definition of conditional expectation:

$$E(Y \mid X) = 1 \cdot \Pr(Y = 1 \mid X) + 0 \cdot \Pr(Y = 0 \mid X) = \Pr(Y = 1 \mid X)$$

So when we estimate $E(Y \mid X)$, we are estimating the probability that $Y = 1$ given $X$. Suppose $\Pr(Y = 1 \mid X) = p$; then $E(Y \mid X) = p$ and $\Pr(Y = 0 \mid X) = 1 - p$.

Thus, for binary dependent variables, the key question is how to model $\Pr(Y = 1 \mid X)$.
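The identity $E(Y \mid X) = \Pr(Y = 1 \mid X)$ can be checked by simulation; a minimal sketch with an assumed success probability $p = 0.3$:

```python
import random

# Simulate a Bernoulli Y with Pr(Y = 1) = p and check that the sample
# mean of Y (an estimate of E[Y]) matches p, since Y only takes 0 and 1.
random.seed(0)
p = 0.3
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
mean_y = sum(draws) / len(draws)
print(round(mean_y, 2))  # close to 0.3
```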

### 2. Linear Probability Model (LPM)

**Single Regressor**

The linear probability model (LPM) assumes that

$$\Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X$$

This is the familiar simple linear regression model. The model for the data observed is

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

where $E(u_i \mid X_i) = 0$. It is easy to verify that

$$\operatorname{var}(u_i \mid X_i) = p_i (1 - p_i), \quad \text{where } p_i = \beta_0 + \beta_1 X_i$$

Thus, the error term is conditionally **heteroskedastic by definition**.

**Example**
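The conditional variance formula follows from the two-point distribution of $u_i$; a quick check with hypothetical coefficients:

```python
# In the LPM, u_i = Y_i - p_i with p_i = b0 + b1*X_i, and Y_i is Bernoulli(p_i).
# Conditional on X_i, u_i equals 1 - p_i with probability p_i and -p_i with
# probability 1 - p_i, so E[u_i | X_i] = 0 and var(u_i | X_i) = p_i*(1 - p_i):
# the variance changes with X_i, i.e. the error is heteroskedastic.
b0, b1 = 0.1, 0.5                                   # hypothetical coefficients
for x in (0.2, 0.8):
    p = b0 + b1 * x
    var_u = p * (1 - p) ** 2 + (1 - p) * p ** 2     # E[u^2 | X] from the two-point distribution
    print(x, round(var_u, 4), round(p * (1 - p), 4))  # the two columns agree
```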

We are interested in whether race is a factor in denying a mortgage application. We have data on mortgage applications in the Boston area. An important determining variable is the payment-to-income ratio (P/I ratio).

The linear probability model:

$$\Pr(deny = 1 \mid P/I\ ratio) = \beta_0 + \beta_1 \times (P/I\ ratio)$$

The estimated slope coefficient is $\hat\beta_1 = 0.604$.

Interpretation: if the P/I ratio increases by 0.1, the probability of denial increases by $0.604 \times 0.1 = 0.0604$, that is, about 6.0 percentage points.

If we are interested in the effect of race on the probability of denial, holding the P/I ratio constant, we can add a race dummy $black$ (equal to one for African-American applicants):

$$\Pr(deny = 1 \mid P/I\ ratio,\ black) = \beta_0 + \beta_1 \times (P/I\ ratio) + \beta_2 \times black$$

Interpretation: with an estimated coefficient on $black$ of 0.177, African-American applicants have a 17.7 percentage point higher probability of having a mortgage application denied than white applicants, holding the P/I ratio constant.
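Since the LPM is just a linear regression on a 0/1 outcome, it can be estimated by ordinary least squares. A minimal sketch on simulated data (not the Boston sample; the coefficients $-0.05$ and $0.6$ below are assumed values for the simulation):

```python
import random

# Simulate a denial dummy whose probability rises linearly with the P/I
# ratio, then fit the LPM deny = b0 + b1*(P/I ratio) + u by OLS.
random.seed(1)
n = 5000
pi_ratio = [random.uniform(0.1, 0.7) for _ in range(n)]
deny = [1 if random.random() < -0.05 + 0.6 * x else 0 for x in pi_ratio]

# Closed-form OLS for a single regressor.
x_bar = sum(pi_ratio) / n
y_bar = sum(deny) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(pi_ratio, deny)) / \
     sum((x - x_bar) ** 2 for x in pi_ratio)
b0 = y_bar - b1 * x_bar
print(round(b0, 2), round(b1, 2))  # close to the assumed -0.05 and 0.6
```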

**LPM in the General Case**

In the general case, the LPM is

$$\Pr(Y = 1 \mid X_1, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$

The coefficient $\beta_j$ can be interpreted as the change in the probability that $Y = 1$ for a unit change in $X_j$, holding the other regressors fixed. Inference can be done based on heteroskedasticity-robust (White) standard errors.

**Summary of LPM**

- Key feature: models $\Pr(Y = 1 \mid X_1, \ldots, X_k)$ as a linear function of the regressors

- Advantages:
- simple to estimate and to interpret
- the inference is the same as for linear multiple regression models, but we need to use heteroskedasticity-robust standard errors

- Disadvantages:
- predicted probabilities can be less than 0 or greater than 1
- it makes no sense that the probability should be linear in $X$
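The first disadvantage is easy to demonstrate. With a hypothetical fitted line using the slope 0.604 from the example above and an assumed intercept of $-0.08$, extreme regressor values push predictions outside $[0, 1]$:

```python
# A linear prediction b0 + b1*x is unbounded, so for extreme values of the
# P/I ratio the "probability" leaves [0, 1] -- the disadvantage listed above.
# The intercept -0.08 is an assumed value for illustration.
def lpm_prediction(pi_ratio, b0=-0.08, b1=0.604):
    return b0 + b1 * pi_ratio

print(lpm_prediction(0.05))  # negative "probability"
print(lpm_prediction(2.0))   # "probability" above one
```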

### 3. Probit and Logit Models

#### 3.1 Models

We need a “translator” that takes a value from $-\infty$ to $\infty$ and returns a value from 0 to 1 such that:

- The closer to $-\infty$ the value from the linear regression model is, the closer the predicted probability is to 0.

- The closer to $+\infty$ the value from the linear regression model is, the closer the predicted probability is to 1.

- No predicted probabilities are less than 0 or greater than 1.

In common practice, econometricians use TWO such “translators”:

- Probit (standard normal CDF)

- Logit (standard logistic CDF)

The differences between the two “translators” are small. In particular, there is no practical difference between them if we only care about predicted probabilities in the middle range of the data.

Both the probit and logit models have the same basic structure. Define a “Z-index” as $z = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$. Use a nonlinear S-shaped CDF-type function $G(\cdot)$ to transform $z$ into a predicted value between 0 and 1. The model is

$$\Pr(Y = 1 \mid X_1, \ldots, X_k) = G(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)$$

- The probit model uses the standard normal CDF:

$$\Pr(Y = 1 \mid X_1, \ldots, X_k) = \Phi(z)$$

where $\Phi(\cdot)$ is the standard normal CDF, $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$

- The logit model uses the logistic CDF:

$$\Pr(Y = 1 \mid X_1, \ldots, X_k) = \Lambda(z)$$

where $\Lambda(z) = \dfrac{1}{1 + e^{-z}}$
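The two “translators” can be computed and compared directly; a minimal sketch using only the standard library:

```python
import math

def probit_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def logit_cdf(z):
    """Standard logistic CDF, Lambda(z)."""
    return 1 / (1 + math.exp(-z))

# Both map the whole real line into (0, 1) and agree at z = 0,
# differing somewhat in the tails.
for z in (-2, 0, 2):
    print(z, round(probit_cdf(z), 3), round(logit_cdf(z), 3))
```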

#### 3.2 NLS & MLE

**Nonlinear least squares (NLS)**

The model is

$$Y_i = G(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}) + u_i$$

and the NLS estimators are given by

$$\min_{b_0, \ldots, b_k} \sum_{i=1}^{n} \left[ Y_i - G(b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}) \right]^2$$

The NLS estimators are consistent and asymptotically normally distributed, but they are inefficient.
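The NLS objective can be sketched on simulated probit data. This is a crude grid search over the slope only, with the intercept fixed at its true value of zero for simplicity (all values are assumptions of the simulation, not a serious optimizer):

```python
import math, random

def probit_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Simulate y from a probit model with b0 = 0 and b1 = 1.
random.seed(4)
x = [random.gauss(0, 1) for _ in range(3000)]
y = [1 if random.random() < probit_cdf(1.0 * xi) else 0 for xi in x]

def ssr(b1):
    """NLS objective: sum of squared residuals y_i - G(b1 * x_i)."""
    return sum((yi - probit_cdf(b1 * xi)) ** 2 for yi, xi in zip(y, x))

# Minimize the objective over a grid 0.0, 0.1, ..., 3.0.
b1_hat = min((b / 10 for b in range(0, 31)), key=ssr)
print(b1_hat)  # close to the true 1.0
```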

**Maximum Likelihood Estimation (MLE)**

The probability that $Y_i = 1$ conditional on $X_{1i}, \ldots, X_{ki}$ is $p_i = G(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki})$. The conditional probability distribution for the $i$-th observation is

$$\Pr(Y_i = y_i \mid X_{1i}, \ldots, X_{ki}) = p_i^{y_i} (1 - p_i)^{1 - y_i}$$

Assume that $(Y_i, X_{1i}, \ldots, X_{ki})$, $i = 1, \ldots, n$, are i.i.d.; then the joint probability distribution of $Y_1, \ldots, Y_n$ conditional on the regressors is

$$\Pr(Y_1 = y_1, \ldots, Y_n = y_n \mid X) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$

The likelihood function is the above joint probability distribution treated as a function of the unknown coefficients $\beta_0, \ldots, \beta_k$. The ML estimators maximize the log-likelihood:

$$\max_{b_0, \ldots, b_k} \sum_{i=1}^{n} \left[ y_i \ln G(b_0 + b_1 X_{1i} + \cdots + b_k X_{ki}) + (1 - y_i) \ln\!\left(1 - G(b_0 + b_1 X_{1i} + \cdots + b_k X_{ki})\right) \right]$$

The ML estimators are consistent and asymptotically normally distributed. They are also efficient and commonly used in practice.
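A minimal sketch of MLE for the logit case: maximize the log-likelihood above by Newton-Raphson, with one regressor and simulated data (true coefficients 0.5 and 1.0 are assumptions of the simulation):

```python
import math, random

# Simulate logit data: Pr(y = 1 | x) = Lambda(0.5 + 1.0*x).
random.seed(2)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(0.5 + 1.0 * xi))) else 0
     for xi in x]

# Newton-Raphson on the log-likelihood: the score is sum (y_i - p_i)*(1, x_i)
# and the negative Hessian is sum p_i*(1 - p_i)*(1, x_i)'(1, x_i).
b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    h00 = sum(pi * (1 - pi) for pi in p)
    h01 = sum(pi * (1 - pi) * xi for pi, xi in zip(p, x))
    h11 = sum(pi * (1 - pi) * xi * xi for pi, xi in zip(p, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det   # Newton step: b + H^{-1} * score
    b1 += (h00 * g1 - h01 * g0) / det
print(round(b0, 1), round(b1, 1))  # close to the true (0.5, 1.0)
```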

#### 3.3 Comparison

Both the probit and logit are nonlinear “translators”. There is no real reason to prefer one over the other.

Traditionally we saw more of the logit, mainly because the logistic function leads to a more easily computed model. Nowadays the probit is just as easy to compute with standard packages and has thus become more popular.

#### 3.4 Interpretation, Estimation, Inference

**Interpretation**

Clearly, coefficient estimates across the three models (LPM, probit, and logit) are not directly comparable. It is the probability of being denied that is of interest. We can compare the sign and significance (based on a standard z test) of the coefficients.

In general we care about the effect of $X_j$ on $\Pr(Y = 1 \mid X_1, \ldots, X_k)$, that is, we care about $\dfrac{\partial \Pr(Y = 1 \mid X_1, \ldots, X_k)}{\partial X_j}$:

- For the linear case, this is easily computed as the coefficient $\beta_j$ of $X_j$.

- For the nonlinear probit and logit models,
- $\dfrac{\partial \Pr(Y = 1 \mid X_1, \ldots, X_k)}{\partial X_j} = g(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)\, \beta_j$, where $g(z) = \dfrac{dG(z)}{dz}$
- The adjustment factor $g(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)$ depends on $X_1, \ldots, X_k$
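For the probit model, $g$ is the standard normal pdf $\phi$, so the partial effect is $\phi(z)\,\beta_j$; a minimal sketch with hypothetical coefficients showing how the adjustment factor varies with $X$:

```python
import math

def phi(z):
    """Standard normal pdf, the derivative of Phi."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Partial effect of X in a probit Pr(Y=1|X) = Phi(b0 + b1*X) is phi(z)*b1.
# The adjustment factor phi(z) shrinks the effect in the tails of the index.
# (b0 and b1 are hypothetical values for illustration.)
b0, b1 = -2.0, 3.0
effects = {x: phi(b0 + b1 * x) * b1 for x in (0.2, 0.6, 1.0)}
print(effects)  # the effect varies with x, unlike in the LPM
```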

**Estimation and Inference**

For probit and logit models, the difficulty is that partial effects are not constant but depend on $X_1, \ldots, X_k$. Thus the PEA and the APE are introduced:

- PEA: Partial Effect at the Average.
- The partial effect of an explanatory variable is considered for an “average” individual: $g(\hat\beta_0 + \hat\beta_1 \bar{X}_1 + \cdots + \hat\beta_k \bar{X}_k)\, \hat\beta_j$. This is problematic in the case of explanatory variables such as gender.
- For discrete explanatory variables, say, for a change in $X_k$ from $c_k$ to $c_k + 1$: $G(\hat\beta_0 + \hat\beta_1 \bar{X}_1 + \cdots + \hat\beta_k (c_k + 1)) - G(\hat\beta_0 + \hat\beta_1 \bar{X}_1 + \cdots + \hat\beta_k c_k)$

- APE: Average Partial Effect.
- The partial effect of an explanatory variable is computed for each individual in the sample, $g(\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})\, \hat\beta_j$, and then averaged across all sample members. This method makes more sense.
- For discrete explanatory variables, say, for a change in $X_k$ from $c_k$ to $c_k + 1$: $\frac{1}{n}\sum_{i=1}^{n}\left[ G(\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k (c_k + 1)) - G(\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k c_k) \right]$
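The distinction can be sketched for a probit with one continuous regressor (the coefficients and sample values below are hypothetical):

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Hypothetical probit coefficients and a small sample of x values.
b0, b1 = -1.0, 2.0
xs = [0.1, 0.3, 0.5, 0.7, 0.9]

# PEA: evaluate the partial effect at the sample average of x.
x_bar = sum(xs) / len(xs)
pea = phi(b0 + b1 * x_bar) * b1

# APE: compute the partial effect for each individual, then average.
ape = sum(phi(b0 + b1 * x) * b1 for x in xs) / len(xs)

print(round(pea, 3), round(ape, 3))  # PEA and APE generally differ
```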

#### 3.5 Goodness-of-fit measures

**Percent correctly predicted**

Individual $i$'s outcome is predicted as one if the fitted probability of this event is larger than 0.5, and the percentage of correct predictions is then counted. There are thus four possible outcomes on each pair $(y_i, \hat{y}_i)$: $(0,0), (0,1), (1,0), (1,1)$. Then,

- percent correctly predicted for $y_i = 1$: the fraction of observations with $y_i = 1$ for which $\hat{y}_i = 1$

- percent correctly predicted for $y_i = 0$: the fraction of observations with $y_i = 0$ for which $\hat{y}_i = 0$

The overall percent correctly predicted is the weighted average of the above two. The weights are the fractions of zeros and ones in the sample.
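A minimal sketch of the calculation, using toy fitted probabilities rather than estimates from a real model:

```python
# Predict y-hat = 1 when the fitted probability exceeds 0.5, then score
# the predictions separately for the ones and the zeros in the sample.
y      = [1, 1, 0, 0, 1, 0]                # toy outcomes
p_hat  = [0.8, 0.4, 0.3, 0.6, 0.7, 0.2]    # toy fitted probabilities
y_pred = [1 if p > 0.5 else 0 for p in p_hat]

ones  = [i for i, yi in enumerate(y) if yi == 1]
zeros = [i for i, yi in enumerate(y) if yi == 0]
pct_ones  = sum(y_pred[i] == 1 for i in ones) / len(ones)
pct_zeros = sum(y_pred[i] == 0 for i in zeros) / len(zeros)

# Weighted average with weights = fractions of ones and zeros.
overall = (len(ones) * pct_ones + len(zeros) * pct_zeros) / len(y)
print(pct_ones, pct_zeros, overall)
```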

**Pseudo R-squared**

Compare the maximized log-likelihood of the model, $\ln L_{ur}$, with that of a model that only contains a constant (and no explanatory variables), $\ln L_0$:

$$\text{pseudo-}R^2 = 1 - \frac{\ln L_{ur}}{\ln L_0}$$

- log-likelihoods are negative, so $\ln L_0 \le \ln L_{ur} < 0$, and $0 \le \ln L_{ur} / \ln L_0 \le 1$

- If no $X_j$ is significant, $\ln L_{ur}$ should be close to $\ln L_0$, so the pseudo-$R^2$ is close to 0

- if $\ln L_{ur} = 0$, the pseudo-$R^2$ equals 1

- $\ln L_{ur}$ cannot reach zero in a probit/logit model: that would require the estimated probabilities to equal one whenever $y_i = 1$ and zero whenever $y_i = 0$
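A minimal sketch of the calculation, again with toy fitted probabilities rather than a real estimated model:

```python
import math

# Pseudo R-squared = 1 - lnL_ur / lnL_0, comparing the full model's
# maximized log-likelihood with that of an intercept-only model, whose
# fitted probability is just the sample fraction of ones.
y     = [1, 0, 1, 1, 0]                    # toy outcomes
p_hat = [0.9, 0.2, 0.7, 0.8, 0.3]          # toy fitted probs, full model
p0    = sum(y) / len(y)                    # intercept-only fitted prob.

ll_ur = sum(math.log(p if yi == 1 else 1 - p) for yi, p in zip(y, p_hat))
ll_0  = sum(math.log(p0 if yi == 1 else 1 - p0) for yi in y)
pseudo_r2 = 1 - ll_ur / ll_0
print(round(pseudo_r2, 3))  # between 0 and 1
```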

**Correlation based measures**

Define $\tilde{y}_i = G(\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})$ and calculate the squared sample correlation between $y_i$ and $\tilde{y}_i$. In any case, goodness-of-fit is usually less important than trying to obtain convincing estimates of the ceteris paribus effects of the explanatory variables.
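The squared correlation measure is a one-liner once the fitted probabilities are in hand; a sketch with the same kind of toy values:

```python
# Squared sample correlation between the outcomes y_i and the fitted
# probabilities y~_i = G(b0 + x_i'b) (toy values for illustration).
y     = [1, 0, 1, 1, 0]
y_fit = [0.9, 0.2, 0.7, 0.8, 0.3]

my, mf = sum(y) / len(y), sum(y_fit) / len(y_fit)
cov   = sum((a - my) * (b - mf) for a, b in zip(y, y_fit))
var_y = sum((a - my) ** 2 for a in y)
var_f = sum((b - mf) ** 2 for b in y_fit)
r_squared = cov ** 2 / (var_y * var_f)
print(round(r_squared, 3))
```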

#### 3.6 Hypothesis Test after MLE

- The usual z-tests and confidence intervals can be used

- Likelihood ratio test (restricted and unrestricted models needed)

$$LR = 2\left(\ln L_{ur} - \ln L_{r}\right) \overset{a}{\sim} \chi^2_q$$

where $\ln L_{ur}$ ($\ln L_{r}$) is the log-likelihood value for the unrestricted (restricted) model and $q$ is the number of restrictions. The test is based on the same concept as the $F$ test in a linear model. Basic idea: $\ln L_{ur} \ge \ln L_{r}$, and under $H_0$, $LR$ is close to zero.
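A minimal sketch of the LR test with hypothetical maximized log-likelihoods (the critical value 5.99 for $\chi^2_2$ at the 5% level is a standard table value):

```python
# LR = 2*(lnL_ur - lnL_r), asymptotically chi-squared with q degrees of
# freedom, q = number of restrictions.  The log-likelihoods below are
# hypothetical values for illustration.
ll_ur, ll_r, q = -120.5, -125.8, 2

lr = 2 * (ll_ur - ll_r)
chi2_crit_5pct = 5.99                      # chi-squared(2), 5% level
print(round(lr, 1), lr > chi2_crit_5pct)   # reject H0 if LR exceeds it
```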

#### 3.7 Latent Variable Model

Probit and logit models can be derived from an underlying latent variable model.

$$y^* = \beta_0 + x\beta + e, \qquad y = 1[y^* > 0]$$

$y^*$ is an unobserved, or latent, variable which rarely has a well-defined unit of measurement. For example, $y^*$ might be the difference in utility levels from two different actions. $e$ has either the standard normal or the standard logistic distribution, symmetrically distributed about zero, which means $1 - G(-z) = G(z)$. $y$ is observable. $1[\cdot]$ here is the indicator function, which takes on the value one if the event in the bracket is true, and zero otherwise.

The response probability for $y$:

$$\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) = \Pr\big(e > -(\beta_0 + x\beta) \mid x\big) = 1 - G\big(-(\beta_0 + x\beta)\big) = G(\beta_0 + x\beta)$$
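The derivation can be checked by simulation: draw $e$ from the standard logistic distribution, form $y = 1[\beta_0 + \beta_1 x + e > 0]$, and compare the empirical frequency of $y = 1$ with the logistic CDF at $\beta_0 + \beta_1 x$ (the coefficient values are assumptions of the simulation):

```python
import math, random

# Latent-variable check: with logistic e, the simulated Pr(y = 1 | x)
# should match Lambda(b0 + b1*x).
random.seed(3)
b0, b1, x = 0.5, 1.0, 0.8
z = b0 + b1 * x

n = 100_000
count = 0
for _ in range(n):
    u = random.random()
    e = math.log(u / (1 - u))          # inverse-CDF draw from the logistic
    count += 1 if z + e > 0 else 0

print(round(count / n, 2), round(1 / (1 + math.exp(-z)), 2))  # approximately equal
```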
