## TOC

1. Instrumental Variables
   - 1.1 Definitions
   - 1.2 Examples
2. Deriving IV estimator
   - 2.1 MOM Regression
   - 2.2 2SLS and Multiple Regression
3. Properties of IV estimator
   - 3.1 Biased & Consistent
   - 3.2 Asymptotic Properties
4. Instrument Relevance
   - 4.1 Weak Instruments
   - 4.2 Detecting Weak Instruments
5. Instrument Exogeneity
6. OLS or IV
7. IV/2SLS Matrix Form
   - 7.1 IV Matrix Form
   - 7.2 2SLS Matrix Form

### 1. Instrumental Variables

#### 1.1 Definitions

An **exogenous variable** is a variable that is uncorrelated with the error term $u$: $\text{Cov}(X, u) = 0$.

An **endogenous variable** is a variable that is correlated with the error term $u$: $\text{Cov}(X, u) \neq 0$.

An **Instrumental Variable (IV)** $Z$ is a variable that is correlated with the endogenous regressor $X$ but uncorrelated with the error term $u$. We can use an IV to estimate the effect on $Y$ of only that part of $X$ that is correlated with $Z$. Because $Z$ is uncorrelated with $u$, the part of $X$ that is correlated with $Z$ must also be uncorrelated with $u$.

#### 1.2 Examples

**Survey on twins: measurement error**

When an economist is worried about measurement error, a good choice of instrument is simply a different measure of the same variable. The new measure may have its own errors, but these errors are unlikely to be correlated with the mistakes in the first measure, or with any other component of the error term $u$. For example, Ashenfelter and Rouse were studying the effect of education on earnings. Their data came from a survey of twins. They were concerned that individuals might misreport their own years of schooling, leading to measurement error bias. However, Ashenfelter and Rouse had two separate measures of each individual's years of schooling: the survey asked each individual to list both his/her own years of schooling and the years of schooling of his/her twin. The twin's report of an individual's schooling served as an instrumental variable for the individual's self-report.

It's a good instrument because the twin's report is closely related to the individual's self-reported schooling (relevance), while its reporting errors are unlikely to be correlated with the errors in the self-report (exogeneity).

**Cigarettes sold: simultaneity**

Suppose we are studying the effect of price on the demand for cigarettes, using a cross-section of different states' cigarette consumption $Q_i$ and average price $P_i$:

$$Q_i = \beta_0 + \beta_1 P_i + u_i,$$

where $i$ indexes each state. Price and quantity are determined simultaneously by supply and demand, so $P_i$ is endogenous. Candidate instruments $Z_i$:

- Each state's cigarette excise tax is not a good IV:
  - taxes reflect the level of anti-smoking sentiment in the state, thus $\text{Cov}(Z_i, u_i) \neq 0$
- A measure of state anti-smoking laws is not a good IV either:
  - it is a proxy for anti-smoking sentiment
  - it is highly correlated with $u_i$
- Each state's sales tax:
  - state sales taxes are correlated with cigarette prices
  - a relatively good IV

### 2. Deriving IV estimator

#### 2.1 MOM Regression

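Consider the case of a single regressor and a single IV. With only one IV, the estimator can be derived fairly intuitively as follows. Write the model and decompose $X$ using $Z$:

$$Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad X_i = \pi_0 + \pi_1 Z_i + v_i,$$

where $\pi_0 + \pi_1 Z_i$ is the "clean" part and $v_i$ is the "dirty" part: $\text{Cov}(\pi_0 + \pi_1 Z_i, u_i) = 0$ and $\text{Cov}(v_i, u_i) \neq 0$.

- Regress $X$ on $Z$:

$$\hat\pi_1 = \frac{\widehat{\text{Cov}}(Z, X)}{\widehat{\text{Var}}(Z)}$$

- Regress $Y$ directly on $Z$. Substituting the decomposition into the model gives $Y_i = (\beta_0 + \beta_1\pi_0) + \beta_1\pi_1 Z_i + (\beta_1 v_i + u_i)$, so the slope estimates $\beta_1\pi_1$:

$$\widehat{\beta_1\pi_1} = \frac{\widehat{\text{Cov}}(Z, Y)}{\widehat{\text{Var}}(Z)}$$

thus

$$\hat\beta_1^{IV} = \frac{\widehat{\beta_1\pi_1}}{\hat\pi_1} = \frac{\widehat{\text{Cov}}(Z, Y)}{\widehat{\text{Cov}}(Z, X)} = \frac{\sum_{i}(Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i}(Z_i - \bar Z)(X_i - \bar X)}$$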

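Another way to derive the IV estimator is the Method of Moments (MOM). The population moment conditions $E(u) = 0$ and $E(Zu) = 0$ imply $\text{Cov}(Z, Y) = \beta_1 \text{Cov}(Z, X)$. Replacing the population moments with their sample analogs gives

$$\hat\beta_1^{IV} = \frac{\sum_{i}(Z_i - \bar Z)(Y_i - \bar Y)}{\sum_{i}(Z_i - \bar Z)(X_i - \bar X)}, \qquad \hat\beta_0^{IV} = \bar Y - \hat\beta_1^{IV}\bar X$$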
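A minimal simulation sketch of the single-instrument estimator $\hat\beta_1^{IV} = \widehat{\text{Cov}}(Z, Y) / \widehat{\text{Cov}}(Z, X)$ compared with OLS. The data-generating process (the confounder `c`, the coefficient values, the seed) is an assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0                  # assumed true parameters

z = rng.normal(size=n)                   # instrument
c = rng.normal(size=n)                   # unobserved confounder
x = 0.8 * z + c + rng.normal(size=n)     # endogenous regressor (depends on c)
u = c + rng.normal(size=n)               # error term (also depends on c)
y = beta0 + beta1 * x + u


def cov(a, b):
    """Sample covariance (1/n normalisation)."""
    return np.mean((a - a.mean()) * (b - b.mean()))


beta_ols = cov(x, y) / cov(x, x)         # inconsistent here because Cov(x, u) != 0
beta_iv = cov(z, y) / cov(z, x)          # ratio of reduced-form to first-stage slope

print(f"OLS: {beta_ols:.3f}   IV: {beta_iv:.3f}   true: {beta1}")
```

In this setup the OLS slope drifts away from the true value while the IV ratio stays close to it, which is the behaviour the derivation predicts.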

**Supplement: MOM estimator is equivalent to OLS estimator**

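By the law of large numbers, sample moments converge in probability to the corresponding population moments, e.g. $\frac{1}{n}\sum_i X_i \xrightarrow{p} E(X)$.

For the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$ with $E(u_i) = 0$ and $E(X_i u_i) = 0$, the sample analogs of the moment conditions are

$$\frac{1}{n}\sum_{i}(Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0, \qquad \frac{1}{n}\sum_{i}X_i(Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0,$$

thus

$$\hat\beta_1 = \frac{\sum_{i}(X_i - \bar X)(Y_i - \bar Y)}{\sum_{i}(X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1\bar X$$

That is, the OLS estimator is a MOM estimator.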

#### 2.2 2SLS and Multiple Regression

**2.2.1 Multiple regression model**

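$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i,$$

where $X_{1i}, \dots, X_{ki}$ are endogenous regressors and $W_{1i}, \dots, W_{ri}$ are included exogenous regressors.

Let $Z_{1i}, \dots, Z_{mi}$ be instruments, so $\text{Cov}(Z_{ji}, u_i) = 0$ for $j = 1, \dots, m$. The model is

- exactly (just) identified when $m = k$;

- over-identified when $m > k$;

- under-identified when $m < k$.

Note that we cannot use $W$ as an instrument for $X$; otherwise there will be a multicollinearity problem in the second stage, because the predicted $\hat X$ would be a linear combination of the $W$'s. Therefore, we require instruments that are excluded from the structural equation, i.e. each $Z$ must be distinct from the $W$'s.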

**2.2.2 2SLS Procedure**

1st Stage Regression:

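- Regress each of the endogenous regressors $X_1, \dots, X_k$ on ALL exogenous variables (all the $W$'s and all the $Z$'s) to get the predicted values $\hat X_1, \dots, \hat X_k$.

- We should include $W$ in the first stage because $W$ and $X$ may be correlated. Otherwise, the first-stage residual $X - \hat X$ may be correlated with $W$, and then in stage 2 the error term, which contains $X - \hat X$, may be correlated with $W$, introducing new endogeneity.

2nd Stage Regression:

- Regress $Y$ on the predicted $\hat X_1, \dots, \hat X_k$ and $W_1, \dots, W_r$. The resulting coefficients are the 2SLS estimates.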
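The two stages can be sketched with plain NumPy least squares. This is a minimal illustration, not a full implementation: the helper names `ols` and `two_sls` and the simulated data are assumptions, and it returns point estimates only (standard errors taken from a manual second-stage OLS would not be valid).

```python
import numpy as np


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_sls(y, X_endog, W, Z):
    """2SLS point estimates, ordered as [const, endogenous X's, W's]."""
    const = np.ones((len(y), 1))
    # Stage 1: regress the endogenous regressor(s) on ALL exogenous variables.
    exog = np.column_stack([const, W, Z])
    X_hat = exog @ ols(exog, X_endog)
    # Stage 2: regress y on the fitted values and the included exogenous W.
    return ols(np.column_stack([const, X_hat, W]), y)


# Illustrative data: one endogenous x, one exogenous w, two instruments.
rng = np.random.default_rng(1)
n = 50_000
w = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
c = rng.normal(size=n)                                   # unobserved confounder
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.3 * w + c + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * w + c + rng.normal(size=n)

beta_hat = two_sls(y, x, w, Z)
print(beta_hat)   # estimates of (intercept, coefficient on x, coefficient on w)
```

Including `W` in `exog` mirrors the point made above: the first stage uses every exogenous variable, not only the excluded instruments.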

### 3. Properties of IV estimator

#### 3.1 Biased & Consistent

**3.1.1 IV estimator is biased (single IV)**

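Substituting $Y_i = \beta_0 + \beta_1 X_i + u_i$ into the IV formula gives

$$\hat\beta_1^{IV} = \beta_1 + \frac{\sum_{i}(Z_i - \bar Z)u_i}{\sum_{i}(Z_i - \bar Z)(X_i - \bar X)}$$

Using the law of iterated expectations,

$$E(\hat\beta_1^{IV}) = \beta_1 + E\left[\frac{\sum_{i}(Z_i - \bar Z)\,E(u_i \mid X, Z)}{\sum_{i}(Z_i - \bar Z)(X_i - \bar X)}\right]$$

If $E(u_i \mid X, Z) = 0$, the second term would vanish and the estimator would be unbiased. But since $X$ is endogenous, $E(u_i \mid X, Z)$ cannot be zero. Therefore, the IV estimator is biased.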

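**3.1.2 IV estimator is consistent (single IV)**

By the law of large numbers,

$$\hat\beta_1^{IV} = \beta_1 + \frac{\frac{1}{n}\sum_{i}(Z_i - \bar Z)u_i}{\frac{1}{n}\sum_{i}(Z_i - \bar Z)(X_i - \bar X)} \xrightarrow{p} \beta_1 + \frac{\text{Cov}(Z, u)}{\text{Cov}(Z, X)} = \beta_1,$$

because $\text{Cov}(Z, u) = 0$ (exogeneity) and $\text{Cov}(Z, X) \neq 0$ (relevance).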

#### 3.2 Asymptotic Properties

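**3.2.1 Asymptotic Normality of IV estimator (single IV)**

$$\sqrt{n}\,(\hat\beta_1^{IV} - \beta_1) \xrightarrow{d} N\left(0, \frac{\text{Var}[(Z_i - \mu_Z)u_i]}{[\text{Cov}(Z_i, X_i)]^2}\right)$$

Estimation: replace the population moments with sample analogs,

$$\widehat{\text{Var}}(\hat\beta_1^{IV}) = \frac{1}{n}\cdot\frac{\frac{1}{n}\sum_{i}(Z_i - \bar Z)^2\hat u_i^2}{\left[\frac{1}{n}\sum_{i}(Z_i - \bar Z)(X_i - \bar X)\right]^2},$$

where $\hat u_i = Y_i - \hat\beta_0^{IV} - \hat\beta_1^{IV}X_i$.

**3.2.2 Asymptotic Variance of IV estimator under conditional homoskedasticity**

Under conditional homoskedasticity, $E(u^2 \mid Z) = \sigma^2$, plus $E(u \mid Z) = 0$, we have

$$\text{Var}[(Z - \mu_Z)u] = \sigma^2\sigma_Z^2$$

Thus

$$\text{Avar}(\hat\beta_1^{IV}) = \frac{\sigma^2\sigma_Z^2}{n\,[\text{Cov}(Z, X)]^2} = \frac{\sigma^2}{n\,\sigma_X^2\,\rho_{XZ}^2}$$

- The standard error in the IV case differs from the OLS one only in the $R^2$ from regressing $X$ on $Z$ ($R^2_{X,Z} = \rho_{XZ}^2$)

- Since $R^2_{X,Z} < 1$, the IV standard errors are larger than the OLS standard errors

- The stronger the correlation between $Z$ and $X$, the smaller the IV standard errors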
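As a rough worked example of this comparison (the numbers are illustrative assumptions, not taken from any data set): write $\sigma^2$ for the error variance, $\sigma_X^2$ for the variance of $X$ and $\rho_{XZ}$ for the correlation between $X$ and $Z$. Under homoskedasticity, and treating $\sigma^2$ as the same in both formulas, the ratio of the IV to the OLS asymptotic standard error depends only on $\rho_{XZ}$:

$$\frac{\text{se}(\hat\beta_1^{IV})}{\text{se}(\hat\beta_1^{OLS})} = \sqrt{\frac{\sigma^2 / (n\,\sigma_X^2\,\rho_{XZ}^2)}{\sigma^2 / (n\,\sigma_X^2)}} = \frac{1}{|\rho_{XZ}|}, \qquad \rho_{XZ} = 0.2 \;\Rightarrow\; \frac{1}{0.2} = 5$$

So an instrument that explains only $R^2_{X,Z} = 0.04$ of the variation in $X$ would inflate the standard error roughly fivefold.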

**3.2.3 Asymptotic Variance of IV/2SLS estimator**

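Consider a model having a single endogenous explanatory variable:

$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$

Assume $X$ is endogenous and $W$ is exogenous. Following the 2SLS procedure, after regressing $X$ on its **IVs** $Z$ and on $W$, we obtain $\hat X$. Then regress $Y$ on $\hat X$ and $W$. Thus the (asymptotic) variance of $\hat\beta_1^{2SLS}$ is

$$\text{Var}(\hat\beta_1^{2SLS}) \approx \frac{\sigma^2}{\widehat{SST}_X\,(1 - \hat R_X^2)},$$

where $\sigma^2 = \text{Var}(u)$, $\widehat{SST}_X$ is the total variation in $\hat X$, and $\hat R_X^2$ is the R-squared from a regression of $\hat X$ on $W$.

Compare with the OLS estimator, which directly regresses $Y$ on $X$ and $W$:

$$\text{Var}(\hat\beta_1^{OLS}) = \frac{\sigma^2}{SST_X\,(1 - R_X^2)},$$

where $SST_X$ is the total variation in $X$ and $R_X^2$ is the R-squared from a regression of $X$ on $W$.

We reach the same conclusion as in the MOM case above: the variance of the IV (here 2SLS) estimator is larger than the variance of the OLS estimator:

- No difference in $\sigma^2$

- $SST_X > \widehat{SST}_X$: because the total sum of squares = explained sum of squares + residual sum of squares. $SST_X$ is the total sum of squares, while $\widehat{SST}_X$ is the explained sum of squares from the first-stage regression.

- $\hat R_X^2 > R_X^2$: the correlation between $\hat X$ and $W$ is larger than the correlation between $X$ and $W$, because $\hat X$ is constructed from $W$ (and $Z$) in the first-stage regression.

Because $\hat R_X^2 > R_X^2$, when there is a multicollinearity issue (the variance or SE of the estimator gets large), the 2SLS estimator will suffer even more.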

### 4. Instrument Relevance

#### 4.1 Weak Instruments

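Focus on a single included endogenous regressor:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$$

The first-stage regression is

$$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$$

- The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero

- The instruments are said to be weak if all of $\pi_1, \dots, \pi_m$ are either zero or nearly zero

Weak instruments explain very little of the variation in $X$ beyond that explained by the $W$'s.

If the instruments are weak, the sampling distributions of the 2SLS estimator and its t-statistic are not normal, even in large samples.

The presence of weak instruments makes the IV estimator less desirable. Take the one-IV example:

$$\text{plim}\,\hat\beta_1^{IV} = \beta_1 + \frac{\text{Corr}(Z, u)}{\text{Corr}(Z, X)}\cdot\frac{\sigma_u}{\sigma_X}$$

It shows that, even if $\text{Corr}(Z, u)$ is small, the inconsistency in the IV estimator can be very large if $\text{Corr}(Z, X)$ is also small. Thus, even if we only focus on consistency, it is not necessarily better to use IV than OLS, even if the correlation between $Z$ and $u$ is smaller than that between $X$ and $u$, because

$$\text{plim}\,\hat\beta_1^{OLS} = \beta_1 + \text{Corr}(X, u)\cdot\frac{\sigma_u}{\sigma_X}$$

IV has the smaller inconsistency only when $|\text{Corr}(Z, u)/\text{Corr}(Z, X)| < |\text{Corr}(X, u)|$.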
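A small numeric sketch of this comparison, using the standard single-instrument expressions for the asymptotic bias: $\frac{\text{Corr}(Z,u)}{\text{Corr}(Z,X)}\cdot\frac{\sigma_u}{\sigma_X}$ for IV and $\text{Corr}(X,u)\cdot\frac{\sigma_u}{\sigma_X}$ for OLS. The function names and the correlation values are assumptions chosen for illustration.

```python
def iv_inconsistency(corr_zu, corr_zx, sd_u, sd_x):
    """Asymptotic bias of the single-instrument IV estimator."""
    return (corr_zu / corr_zx) * (sd_u / sd_x)


def ols_inconsistency(corr_xu, sd_u, sd_x):
    """Asymptotic bias of the OLS estimator."""
    return corr_xu * (sd_u / sd_x)


# A slightly invalid but weak instrument versus a clearly endogenous regressor.
bias_iv = iv_inconsistency(corr_zu=0.05, corr_zx=0.10, sd_u=1.0, sd_x=1.0)
bias_ols = ols_inconsistency(corr_xu=0.30, sd_u=1.0, sd_x=1.0)

print(f"IV bias: {bias_iv:.2f}   OLS bias: {bias_ols:.2f}")
```

With these assumed values the instrument is six times less correlated with the error than the regressor is, yet the IV inconsistency is the larger of the two, because the small instrument-regressor correlation sits in the denominator.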

#### 4.2 Detecting Weak Instruments

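We can use an F-test in the first-stage regression to detect weak instruments. For the regression

$$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i,$$

we wish to test the hypotheses

$$H_0: \pi_1 = \dots = \pi_m = 0 \qquad \text{vs.} \qquad H_1: \text{at least one } \pi_j \neq 0,\; j = 1, \dots, m$$

Note that the test is only on the coefficients of the $Z$'s, not the $W$'s.

**Rule of thumb:** a first-stage $F > 10$ means that the instruments are not weak.

The intuition for comparing $F$ with 10 is to test whether the bias of 2SLS, relative to OLS, is less than 10%. If $F$ is smaller than 10, the relative bias exceeds 10%, that is, 2SLS can have substantial bias.

For the general case where there are multiple endogenous regressors $X$, a rank condition and matrix algebra are needed.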
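A minimal sketch of the first-stage F-statistic for a single endogenous regressor, computed from restricted and unrestricted sums of squared residuals. It uses the homoskedasticity-only form of the F-statistic; the helper names and the simulated strong and weak first stages are assumptions for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid


def first_stage_F(x, Z, W):
    """F-statistic for H0: all coefficients on the instruments Z are zero."""
    n = len(x)
    const = np.ones((n, 1))
    restricted = np.column_stack([const, W])        # W's only
    unrestricted = np.column_stack([const, W, Z])   # W's and Z's
    m = unrestricted.shape[1] - restricted.shape[1] # number of instruments tested
    df = n - unrestricted.shape[1]
    rss_r, rss_u = rss(restricted, x), rss(unrestricted, x)
    return ((rss_r - rss_u) / m) / (rss_u / df)


rng = np.random.default_rng(2)
n = 1000
w = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)

x_strong = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.3 * w + v    # relevant instruments
x_weak = 0.01 * Z[:, 0] + 0.01 * Z[:, 1] + 0.3 * w + v    # nearly irrelevant

F_strong = first_stage_F(x_strong, Z, w)
F_weak = first_stage_F(x_weak, Z, w)
print(f"strong first stage F = {F_strong:.1f}, weak first stage F = {F_weak:.2f}")
```

Only the `Z` columns are dropped in the restricted model, matching the note that the test concerns the coefficients of the instruments and not those of the `W`'s.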

### 5. Instrument Exogeneity

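For the simplest model, $Y_i = \beta_0 + \beta_1 X_i + u_i$,

$$\hat\beta_1^{IV} \xrightarrow{p} \beta_1 + \frac{\text{Cov}(Z, u)}{\text{Cov}(Z, X)},$$

so if $\text{Cov}(Z, u) \neq 0$, then the IV estimator is inconsistent.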

**Order condition**: We cannot test exogeneity when we have exact identification (i.e. the number of instruments equals the number of endogenous regressors).

Suppose we have two instruments, $Z_1$ and $Z_2$, for the model $Y = \beta_0 + \beta_1 X + u$. Then we have two possible first-stage regressions:

$$X = \pi_0 + \pi_1 Z_1 + v_1 \qquad \text{and} \qquad X = \delta_0 + \delta_1 Z_2 + v_2$$

These two first-stage regressions lead to two 2SLS estimates. If both instruments are exogenous, then the two estimates are expected to be close to each other, as both are consistent. If the two estimates are far apart, then it is reasonable to believe that one or both IVs are not exogenous.

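If we have multiple instruments, it is possible to test for exogeneity. Exogeneity of the instruments means that they are uncorrelated with $u$. This suggests that the 2SLS residuals should be approximately uncorrelated with the instruments. Test procedure:

(1) Run 2SLS using all potential IVs and obtain the 2SLS residuals $\hat u_i^{2SLS}$, computed with the actual $X$ rather than the first-stage fitted values.

(2) Run the regression

$$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i$$

and compute the F-statistic for $H_0: \delta_1 = \dots = \delta_m = 0$.

(3) Form the J-statistic $J = mF \sim \chi^2_{m-k}$ under the null, where $m$ is the number of excluded IVs (the $Z$'s) and $k$ is the number of endogenous regressors (the $X$'s).

We reject the null for large values of the J-statistic. Note that we require $m > k$; otherwise $J = 0$ always.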
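The procedure can be sketched as follows for one endogenous regressor and two instruments, so the statistic has one degree of freedom. This is a minimal illustration using the homoskedasticity-only F-statistic; the helper names and the simulated data, including the deliberately invalid second instrument, are assumptions.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and the residual sum of squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, resid @ resid


def j_statistic(y, x, W, Z):
    """J = m * F for one endogenous regressor x; returns (J, degrees of freedom)."""
    n, m = Z.shape
    const = np.ones(n)
    exog = np.column_stack([const, W, Z])
    # Step 1: 2SLS, then residuals built from the ACTUAL x (not the fitted x).
    x_hat = exog @ ols_fit(exog, x)[0]
    beta = ols_fit(np.column_stack([const, x_hat, W]), y)[0]
    u_hat = y - np.column_stack([const, x, W]) @ beta
    # Step 2: regress residuals on all exogenous variables, F-test on the Z's.
    _, rss_r = ols_fit(np.column_stack([const, W]), u_hat)
    _, rss_u = ols_fit(exog, u_hat)
    F = ((rss_r - rss_u) / m) / (rss_u / (n - exog.shape[1]))
    return m * F, m - 1


rng = np.random.default_rng(3)
n = 5000
w = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
c = rng.normal(size=n)
x = Z[:, 0] + Z[:, 1] + 0.5 * w + c + rng.normal(size=n)
u_valid = c + rng.normal(size=n)
u_invalid = u_valid + 0.5 * Z[:, 1]      # second instrument enters the error

J_valid, df = j_statistic(1 + 2 * x + w + u_valid, x, w, Z)
J_invalid, _ = j_statistic(1 + 2 * x + w + u_invalid, x, w, Z)
print(f"df = {df}, J (valid IVs) = {J_valid:.2f}, J (one invalid IV) = {J_invalid:.1f}")
```

The statistic would be compared against a $\chi^2$ critical value with `df` degrees of freedom. A large value signals that at least one instrument is suspect, but it cannot say which one.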

### 6. OLS or IV

**Considerations**

If explanatory variable is exogenous:

- IV estimator and OLS estimator are both consistent

- Therefore, use OLS estimator

If explanatory variable is endogenous:

- IV estimator is consistent

- OLS estimator is not consistent

- Therefore, use IV estimator

**Test for Endogeneity of a Single Explanatory Variable**

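Suppose $X$ is endogenous, and its IV is $Z$.

From the first stage of 2SLS, $X = \pi_0 + \pi_1 Z + \pi_2 W + v$, we know that if $X$ is correlated with $u$, it must be that $v$ is correlated with $u$, because $Z$ and $W$ are exogenous. So regress $X$ on $Z$ (there may be multiple $Z$'s) and $W$ to get the residual $\hat v$, and then regress

$$Y = \beta_0 + \beta_1 X + \beta_2 W + \delta\hat v + \text{error}$$

Test $H_0: \delta = 0$ (which means $X$ is exogenous).

STATA command:

`estat endogenous`

If p < significance level, reject $H_0$ and conclude that $X$ is endogenous.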
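A minimal NumPy sketch of this regression-based endogeneity test: the first-stage residual is added to the outcome regression and its t-statistic is examined. It omits the exogenous $W$ for brevity and uses conventional OLS standard errors; the helper `ols_with_se` and the simulated data are assumptions for illustration, and this is not a reproduction of what the Stata command computes internally.

```python
import numpy as np


def ols_with_se(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return beta, np.sqrt(s2 * np.diag(XtX_inv)), resid


rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)                   # instrument
c = rng.normal(size=n)                   # unobserved confounder
x = z + c + rng.normal(size=n)           # endogenous by construction
y = 1 + 2 * x + c + rng.normal(size=n)

const = np.ones(n)
# First stage: regress x on the instrument and keep the residual v_hat.
_, _, v_hat = ols_with_se(np.column_stack([const, z]), x)
# Augmented regression: y on x and v_hat; test the coefficient on v_hat.
beta, se, _ = ols_with_se(np.column_stack([const, x, v_hat]), y)
t_delta = beta[2] / se[2]

print(f"coefficient on x = {beta[1]:.3f}, t-statistic on v_hat = {t_delta:.1f}")
```

A t-statistic far from zero leads to rejecting $H_0: \delta = 0$, pointing to endogeneity of $X$. In this just-identified setup the coefficient on `x` in the augmented regression also coincides with the IV estimate.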

### 7. IV/2SLS Matrix Form

#### 7.1 IV Matrix Form

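Let the equation of interest be

$$y_i = x_i'\beta + e_i,$$

where $x_i$ is a $k \times 1$ vector. Assume that $E(x_i e_i) \neq 0$, so there is endogeneity. We call this equation the structural equation. In matrix notation, this can be written as

$$y = X\beta + e$$

Definition of IV: The $\ell \times 1$ random vector $z_i$ is an instrumental variable for the above structural model if:

- $E(z_i e_i) = 0$ (instrument exogeneity)

- $\text{rank}\big(E(z_i x_i')\big) = k$ (instrument relevance)

In a typical set-up, some regressors in $x_i$ (at least the intercept) will be uncorrelated with $e_i$. Thus we make the partition

$$x_i = \begin{pmatrix} x_{1i} \\ x_{2i} \end{pmatrix} \begin{matrix} k_1 \\ k_2 \end{matrix}$$

where $E(x_{1i} e_i) = 0$ and $E(x_{2i} e_i) \neq 0$. We call $x_{1i}$ exogenous and $x_{2i}$ endogenous. $x_{1i}$ should be included in $z_i$. So we have the partition

$$z_i = \begin{pmatrix} x_{1i} \\ z_{2i} \end{pmatrix} \begin{matrix} k_1 \\ \ell_2 \end{matrix}$$

where $x_{1i}$ contains the included exogenous variables and $z_{2i}$ contains the excluded exogenous variables.

The model is **just-identified** if $\ell = k$ (i.e., $\ell_2 = k_2$) and is **over-identified** if $\ell > k$ (i.e., $\ell_2 > k_2$).

The reduced form relationship between $x_i$ and the instrument $z_i$ is found by linear projection:

$$x_i = \Gamma' z_i + u_i,$$

where $\Gamma$ is an $\ell \times k$ matrix of coefficients, and $u_i$ is the projection error such that $E(z_i u_i') = 0$. In matrix notation, we can write this as

$$X = Z\Gamma + U,$$

where $U$ is an $n \times k$ matrix.

By construction, a linear projection can be estimated by OLS:

$$X = Z\hat\Gamma + \hat U,$$

where

$$\hat\Gamma = (Z'Z)^{-1}Z'X$$

The reduced form for $y_i$ is

$$y_i = (\Gamma' z_i + u_i)'\beta + e_i = z_i'\lambda + v_i,$$

where $\lambda = \Gamma\beta$ is an $\ell \times 1$ vector and $v_i = u_i'\beta + e_i$. Observe that

$$E(z_i v_i) = E(z_i u_i')\beta + E(z_i e_i) = 0$$

The above equation can be estimated by OLS:

$$y = Z\hat\lambda + \hat v,$$

where $\hat\lambda = (Z'Z)^{-1}Z'y$. The reduced form equation for the system is

$$y = Z\lambda + v, \qquad X = Z\Gamma + U$$

Since $\lambda = \Gamma\beta$, the structural parameter can be recovered from the reduced form:

- If $\ell = k$, then $\beta = \Gamma^{-1}\lambda$

- If $\ell > k$, then for any p.d. matrix $A$, $\beta = (\Gamma' A\,\Gamma)^{-1}\Gamma' A\,\lambda$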

#### 7.2 2SLS Matrix Form

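The two-stage least squares (2SLS) estimation proceeds as follows.

**Stage 1**: Regress $X$ on $Z$ to obtain $\hat\Gamma = (Z'Z)^{-1}Z'X$, and save the predicted value $\hat X = Z\hat\Gamma = P_Z X$, where $P_Z = Z(Z'Z)^{-1}Z'$.

**Stage 2:** Regress $y$ on $\hat X$ to obtain the 2SLS estimator

$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}X'Z(Z'Z)^{-1}Z'y$$

If the model is just-identified, so that $\ell = k$ and $Z'X$ is a square matrix, then the formula for $\hat\beta_{2SLS}$ can be simplified to

$$\hat\beta_{2SLS} = (Z'X)^{-1}Z'y,$$

which is also called the instrumental variable (IV) estimator and written as $\hat\beta_{IV}$ in the literature.

In the just-identified case, $\hat\beta_{IV}$ also has another interpretation. Since $\beta = \Gamma^{-1}\lambda$, we can construct the indirect least squares (ILS) estimator:

$$\hat\beta_{ILS} = \hat\Gamma^{-1}\hat\lambda = \big((Z'Z)^{-1}Z'X\big)^{-1}(Z'Z)^{-1}Z'y = (Z'X)^{-1}Z'y = \hat\beta_{IV}$$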
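A short numerical check, under assumed simulated data, that in the just-identified case the three expressions coincide: the 2SLS estimator $(\hat X'\hat X)^{-1}\hat X'y$, the IV estimator $(Z'X)^{-1}Z'y$, and the ILS estimator $\hat\Gamma^{-1}\hat\lambda$. The variable names and the data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
z2 = rng.normal(size=n)                      # excluded exogenous variable
c = rng.normal(size=n)                       # unobserved confounder
x2 = 0.7 * z2 + c + rng.normal(size=n)       # endogenous regressor

X = np.column_stack([np.ones(n), x2])        # regressors: intercept + endogenous
Z = np.column_stack([np.ones(n), z2])        # instruments: intercept + excluded
y = X @ np.array([1.0, 2.0]) + c + rng.normal(size=n)

# Reduced-form OLS estimates.
Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)     # (Z'Z)^{-1} Z'X
lambda_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # (Z'Z)^{-1} Z'y

# Three routes to the same estimator when l = k.
X_hat = Z @ Gamma_hat
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
beta_ils = np.linalg.solve(Gamma_hat, lambda_hat)

print(beta_2sls, beta_iv, beta_ils)
```

The agreement is algebraic rather than a feature of this particular sample, so it holds up to floating-point error for any data in which $Z'X$ is invertible.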
