
### 1. **Definition & Influence**

#### 1.1 Endogeneity

An endogenous variable is a variable $x$ that is correlated with the error term $u$, that is,

$$\mathrm{Cov}(x, u) \neq 0.$$

An exogenous variable is a variable $x$ that is uncorrelated with $u$, that is,

$$\mathrm{Cov}(x, u) = 0.$$

*Note: endogeneity, the correlation between $x$ and $u$, implies that the ceteris paribus assumption does not hold, where ceteris paribus is a Latin phrase meaning "all other things being equal".*

#### 1.2 Influence of endogeneity

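When $\mathrm{Cov}(x, u) \neq 0$, the consequence is that the OLS estimator is **inconsistent** and **biased**.

For the simple linear regression

$$y = \beta_0 + \beta_1 x + u,$$

then

$$\operatorname{plim} \hat{\beta}_1 = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \beta_1 + \frac{\mathrm{Cov}(x, u)}{\mathrm{Var}(x)} \neq \beta_1.$$

If $\mathrm{Cov}(x, u) > 0$, for example, $\hat{\beta}_1$ overstates $\beta_1$ even in large samples.

*[Figure: effect of $\mathrm{Cov}(x, u) \neq 0$ on the OLS fit]*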
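The inconsistency can be checked numerically. The sketch below is a minimal simulation under assumed values (true slope 2, $\mathrm{Cov}(x, u) = 0.5$, $\mathrm{Var}(x) = 1.25$); the helper names `ols_slope` and `simulate_endogenous` are illustrative and not taken from any library. It compares the OLS slope with the standard large-sample limit $\beta_1 + \mathrm{Cov}(x, u) / \mathrm{Var}(x)$.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_endogenous(n=50_000, beta1=2.0, rho=0.5, seed=0):
    """Draw (x, y) where x = z + rho * u, so Cov(x, u) = rho (assumed DGP)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        xi = rng.gauss(0, 1) + rho * u  # x depends on u: x is endogenous
        x.append(xi)
        y.append(1.0 + beta1 * xi + u)
    return x, y


x, y = simulate_endogenous()
slope = ols_slope(x, y)
# beta1 + Cov(x, u) / Var(x) = 2 + 0.5 / 1.25
limit = 2.0 + 0.5 / (1 + 0.5 ** 2)
print(f"OLS slope: {slope:.3f}  large-sample limit: {limit:.3f}  true beta1: 2.000")
```

Increasing `n` does not move the estimate towards the true slope; it only tightens it around the biased limit, which is what inconsistency means.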

### 2. Sources of Endogeneity

Main sources of endogeneity include Omitted variable bias (OVB), Wrong functional form, Measurement error, Simultaneous causality, Sample selection, etc.

#### 2.1 Omitted variable bias (OVB)

**2.1.1 Definition**

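Suppose the true model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$

with $\mathrm{Cov}(x_1, u) = \mathrm{Cov}(x_2, u) = 0$. When $x_2$ is omitted, we have

$$y = \beta_0 + \beta_1 x_1 + v, \qquad v = \beta_2 x_2 + u.$$

Now

$$\mathrm{Cov}(x_1, v) = \beta_2 \, \mathrm{Cov}(x_1, x_2) \neq 0$$

if $\beta_2 \neq 0$ and $\mathrm{Cov}(x_1, x_2) \neq 0$, so $x_1$ is endogenous in the short regression and

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}.$$

The intuitive reason is that, in addition to its direct effect $\beta_1$, $x_1$ has an apparent indirect effect as a consequence of acting as a proxy for the missing $x_2$. The strength of the proxy effect depends on two factors: the strength of the effect of $x_2$ on $y$, which is given by $\beta_2$, and the ability of $x_1$ to mimic $x_2$, i.e. $\mathrm{Cov}(x_1, x_2) / \mathrm{Var}(x_1)$.

For example:

- when $\beta_2 \, \mathrm{Cov}(x_1, x_2) > 0$, $\hat{\beta}_1$ has a positive bias;

- when $\beta_2 \, \mathrm{Cov}(x_1, x_2) < 0$, $\hat{\beta}_1$ has a negative bias.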
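A small simulation can make the sign and size of the omitted variable bias concrete. The sketch below assumes illustrative values ($\beta_1 = 1$, $\beta_2 = 2$, and $x_1 = 0.6\,x_2 + \text{noise}$); it regresses $y$ on $x_1$ alone and compares the slope with the standard prediction $\beta_1 + \beta_2\,\mathrm{Cov}(x_1, x_2)/\mathrm{Var}(x_1)$. The helper `ols_slope` is a hypothetical name defined here, not a library function.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
beta1, beta2 = 1.0, 2.0  # assumed true coefficients
x1, x2, y = [], [], []
for _ in range(50_000):
    omitted = rng.gauss(0, 1)
    included = 0.6 * omitted + rng.gauss(0, 1)  # x1 is correlated with x2
    x1.append(included)
    x2.append(omitted)
    y.append(0.5 + beta1 * included + beta2 * omitted + rng.gauss(0, 1))

# Short regression: y on x1 only, x2 omitted.
short_slope = ols_slope(x1, y)
# Regressing x2 on x1 gives the sample analogue of Cov(x1, x2) / Var(x1).
predicted = beta1 + beta2 * ols_slope(x1, x2)
print(f"short-regression slope: {short_slope:.3f}  predicted: {predicted:.3f}")
```

Here $\beta_2 > 0$ and $\mathrm{Cov}(x_1, x_2) > 0$, so the bias is positive; flipping the sign of either one in the script flips the direction of the bias.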


**2.1.2 Solutions to OVB**

- If the variable can be measured, include it as an additional regressor in multiple regression

- Possibly, use panel data in which each entity (individual) is observed more than once

- If the variable cannot be measured, use instrumental variable (IV) regression

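- If the variable cannot be measured, use a proxy variable (another variable that is correlated with the omitted variable but can be measured and is easily accessible)
  - A good proxy variable $z$ for the omitted variable $x_2$ should satisfy

    $$x_2 = \delta_0 + \delta_1 z + r, \qquad \delta_1 \neq 0, \quad \mathrm{Cov}(z, u) = 0, \quad \mathrm{Cov}(x_1, r) = \mathrm{Cov}(z, r) = 0,$$

    then regressing $y$ on $x_1$ and $z$,

    $$y = (\beta_0 + \beta_2 \delta_0) + \beta_1 x_1 + \beta_2 \delta_1 z + (u + \beta_2 r),$$

    gives a consistent estimator of $\beta_1$.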

#### 2.2 **Wrong Functional Form**

**2.2.1 Definition**

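Wrong functional form arises if the functional form used in the regression is incorrect. For example, suppose the true relationship between $y$ and $x$ is

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u, \qquad \beta_2 \neq 0.$$

If we run the regression

$$y = \beta_0 + \beta_1 x + v,$$

then

$$v = \beta_2 x^2 + u$$

and

$$\mathrm{Cov}(x, v) = \beta_2 \, \mathrm{Cov}(x, x^2),$$

which is nonzero in general, so $x$ is endogenous.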

**2.2.2 Testing**

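To test whether there are omitted nonlinear terms, we can follow the steps below:

- Regress $y$ on $x$ and a nonlinear term, for example $y = \beta_0 + \beta_1 x + \beta_2 x^2 + v$
- Test whether $\beta_2 = 0$. If this null hypothesis is not rejected, there is no evidence of omitted nonlinear terms. Otherwise, there is.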
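The two steps above can be sketched as follows, assuming a quadratic term is the candidate nonlinearity. The function `quadratic_term_test` is a hypothetical helper written for this illustration; it obtains the coefficient on $x^2$ and its $t$ statistic via the Frisch-Waugh-Lovell theorem (partialling $x$ out of both $y$ and $x^2$), which gives the same coefficient and residuals as the full regression on a constant, $x$ and $x^2$. The data-generating values are assumptions.

```python
import random


def residuals(y, x):
    """Residuals from the OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def quadratic_term_test(x, y):
    """Return (b2, t) for the x**2 term in y = b0 + b1*x + b2*x**2 + v."""
    ry = residuals(y, x)                      # y with x partialled out
    rx2 = residuals([a * a for a in x], x)    # x**2 with x partialled out
    s22 = sum(r * r for r in rx2)
    b2 = sum(r * s for r, s in zip(rx2, ry)) / s22
    rss = sum((s - b2 * r) ** 2 for r, s in zip(rx2, ry))
    se = (rss / (len(x) - 3) / s22) ** 0.5    # 3 estimated coefficients
    return b2, b2 / se


rng = random.Random(2)
x = [rng.uniform(0, 3) for _ in range(5_000)]
y_linear = [1 + 2 * a + rng.gauss(0, 1) for a in x]                 # truly linear
y_curved = [1 + 2 * a + 0.5 * a * a + rng.gauss(0, 1) for a in x]   # omitted x**2

b2_lin, t_lin = quadratic_term_test(x, y_linear)
b2_cur, t_cur = quadratic_term_test(x, y_curved)
print(f"linear data: b2 = {b2_lin:.3f}, t = {t_lin:.2f}")
print(f"curved data: b2 = {b2_cur:.3f}, t = {t_cur:.2f}")
```

A large $|t|$ on the curved data leads to rejecting $\beta_2 = 0$, while on the linear data the statistic should usually stay within conventional critical values.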

**2.2.3 Solutions to functional form misspecification**

- For a continuous dependent variable: use "appropriate" nonlinear specifications in $x$ (logarithms, interactions, etc.)

- For a discrete (e.g. binary) dependent variable: use an extension of multiple regression methods ("probit" or "logit" analysis for binary dependent variables)

- Some other **nonparametric econometrics** methods

#### 2.3 Measurement Error

**2.3.1 Definition**

In practice, economic data often contain measurement error, for several reasons:

- Data entry errors in administrative data

- Recollection errors in surveys (e.g. when did you start your current job?)

- Ambiguous questions (e.g. what was your income last year?)

- Intentionally false response problems with surveys (e.g. What is the current value of your financial assets?)

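Assume the model we want to estimate is

$$y = \beta_0 + \beta_1 x + u, \qquad \mathrm{Cov}(x, u) = 0,$$

but we can only access the measurement $\tilde{x}$, which differs from the true value of $x$ by an error $e$, i.e. $\tilde{x} = x + e$. It's intuitive to assume:

$$E(e) = 0, \qquad \mathrm{Cov}(x, e) = 0, \qquad \mathrm{Cov}(u, e) = 0.$$

Then

$$y = \beta_0 + \beta_1 \tilde{x} + (u - \beta_1 e), \qquad \mathrm{Cov}(\tilde{x}, u - \beta_1 e) = -\beta_1 \sigma_e^2 \neq 0,$$

and the estimate of $\beta_1$ satisfies

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 \frac{\sigma_x^2}{\sigma_x^2 + \sigma_e^2},$$

where $\sigma_x^2 = \mathrm{Var}(x)$ and $\sigma_e^2 = \mathrm{Var}(e)$.

The bias is called **attenuation bias**, the **bias towards zero** (in large samples the estimated coefficient's absolute value is smaller than the true one):

- When $\beta_1 < 0$, the OLS estimator is biased upward (positive bias, estimated beta tends to be larger)

- When $\beta_1 > 0$, the OLS estimator is biased downward (negative bias, estimated beta tends to be smaller)

The explanation for the bias towards zero is that we are trying to use the association between $y$ and $\tilde{x}$ to capture the strength of the causal link between $y$ and $x$. However, due to the presence of the noise $e$, the association is a dampened measure (having a smaller absolute value) of the causal link.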
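Attenuation is easy to see in a simulation. The sketch below assumes classical measurement error with $\mathrm{Var}(x) = \mathrm{Var}(e) = 1$, so the standard attenuation factor $\sigma_x^2 / (\sigma_x^2 + \sigma_e^2)$ equals $0.5$; the function names are illustrative and the parameter values are assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def noisy_regressor_slope(beta1, sigma_e, n=50_000, seed=3):
    """OLS slope when only x_true + e is observed (classical error, assumed)."""
    rng = random.Random(seed)
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0, 1)
        x_obs.append(x_true + rng.gauss(0, sigma_e))  # mismeasured regressor
        y.append(1.0 + beta1 * x_true + rng.gauss(0, 1))
    return ols_slope(x_obs, y)


slope_clean = noisy_regressor_slope(beta1=2.0, sigma_e=0.0)  # no error
slope_pos = noisy_regressor_slope(beta1=2.0, sigma_e=1.0)    # shrinks towards 0
slope_neg = noisy_regressor_slope(beta1=-2.0, sigma_e=1.0)   # shrinks towards 0
print(f"no error: {slope_clean:.3f}")
print(f"beta1 =  2 with error: {slope_pos:.3f}")
print(f"beta1 = -2 with error: {slope_neg:.3f}")
```

With these values both noisy estimates should land near half of the true coefficient in absolute value, one biased downward and the other upward.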

**2.3.2 Solutions**

- Obtain better data

- Develop a specific model of the measurement error process
- This is only possible if a lot is known about the nature of the measurement error

- Instrumental variable (IV) regression

*Supplement:*

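When there is noise in $y$, that is, we can only access the measurement $\tilde{y} = y + w$, where $w$ is a random error (uncorrelated with $x$ and $u$), then

$$\tilde{y} = \beta_0 + \beta_1 x + (u + w),$$

and $\hat{\beta}_1$ is still a consistent and unbiased estimator of $\beta_1$ but has a larger variance (recall that $\mathrm{Var}(\hat{\beta}_1 \mid x) = \sigma^2 / \sum_i (x_i - \bar{x})^2$, where the error variance $\sigma^2$ rises from $\sigma_u^2$ to $\sigma_u^2 + \sigma_w^2$).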
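A Monte Carlo sketch illustrates the contrast with error in the regressor: noise in the dependent variable leaves the slope centred on the truth but makes it more dispersed. The values below (true slope 2, $\sigma_u = 1$, $\sigma_w = 2$, samples of size 50) are assumptions chosen for illustration, and `slope_draws` is a hypothetical helper.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_draws(sigma_w, reps=2_000, n=50, seed=4):
    """Repeatedly estimate the slope when y is observed with noise w."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Observed y = true y + measurement noise w.
        y_obs = [1.0 + 2.0 * a + rng.gauss(0, 1) + rng.gauss(0, sigma_w)
                 for a in x]
        draws.append(ols_slope(x, y_obs))
    return draws


clean = slope_draws(sigma_w=0.0)
noisy = slope_draws(sigma_w=2.0)
print(f"no noise in y: mean = {statistics.mean(clean):.3f}, "
      f"variance = {statistics.variance(clean):.4f}")
print(f"noise in y:    mean = {statistics.mean(noisy):.3f}, "
      f"variance = {statistics.variance(noisy):.4f}")
```

Both means should sit close to 2, while the variance with noisy $y$ should be several times larger, in line with the error variance growing from $\sigma_u^2$ to $\sigma_u^2 + \sigma_w^2$.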

#### 2.4 **Simultaneity**

**Definition**

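In structural models, for example the supply and demand model, endogeneity may exist as well.

There are two variables: quantity $q$ and price $p$.

- (D): $q^d = \alpha_0 + \alpha_1 p + u$, $\alpha_1 < 0$

- (S): $q^s = \beta_0 + \beta_1 p + v$, $\beta_1 > 0$

In market equilibrium, $q^d = q^s = q$. Besides, we assume that prices and quantities are endogenous (by the assumptions of the model) and that they are determined simultaneously.

From the market equilibrium condition, we have

$$\alpha_0 + \alpha_1 p + u = \beta_0 + \beta_1 p + v,$$

thus

$$p = \frac{\beta_0 - \alpha_0 + v - u}{\alpha_1 - \beta_1},$$

and thus, if $\mathrm{Cov}(u, v) = 0$,

$$\mathrm{Cov}(p, u) = -\frac{\sigma_u^2}{\alpha_1 - \beta_1} \neq 0, \qquad \mathrm{Cov}(p, v) = \frac{\sigma_v^2}{\alpha_1 - \beta_1} \neq 0.$$

We can explain the endogeneity from another perspective: $u$ and $v$ can be regarded as inputs to a (market) system, and $p$ and $q$ can be regarded as its outputs. In general, $p$ will be correlated with both $u$ and $v$.

Furthermore, if we run the regression

$$q = \gamma_0 + \gamma_1 p + \varepsilon$$

using market data, then we get something that is a mix of the supply and demand curves: $\hat{\gamma}_1$ tends to be between $\alpha_1$ and $\beta_1$.

$p$ has two effects on $q$:

- For producers, a larger $p$ causes $q$ to increase

- For consumers, a larger $p$ causes $q$ to decrease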

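*Example*

Assume demand $q = \alpha_0 + \alpha_1 p + u$ and supply $q = \beta_0 + \beta_1 p + v$, where $E(u) = E(v) = 0$, $\mathrm{Var}(u) = \sigma_u^2$, $\mathrm{Var}(v) = \sigma_v^2$ and $\mathrm{Cov}(u, v) = 0$.

Regress $q = \gamma_0 + \gamma_1 p + \varepsilon$, prove that

$$\operatorname{plim} \hat{\gamma}_1 = \frac{\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2}{\sigma_u^2 + \sigma_v^2}.$$

According to market equilibrium, $q^d = q^s$, that is $\alpha_0 + \alpha_1 p + u = \beta_0 + \beta_1 p + v$. Therefore, $p = \dfrac{\beta_0 - \alpha_0 + v - u}{\alpha_1 - \beta_1}$.

Since

$$\mathrm{Var}(p) = \frac{\sigma_u^2 + \sigma_v^2}{(\alpha_1 - \beta_1)^2}, \qquad \mathrm{Cov}(q, p) = \alpha_1 \mathrm{Var}(p) + \mathrm{Cov}(u, p) = \frac{\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2}{(\alpha_1 - \beta_1)^2},$$

therefore

$$\operatorname{plim} \hat{\gamma}_1 = \frac{\mathrm{Cov}(q, p)}{\mathrm{Var}(p)} = \frac{\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2}{\sigma_u^2 + \sigma_v^2}.$$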
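The result can be checked with simulated equilibrium data. The sketch below assumes a linear demand curve with slope $-1$, a linear supply curve with slope $+1$, and independent shocks with standard deviations 1 and 2 (all values are illustrative assumptions). It regresses equilibrium quantity on equilibrium price and compares the slope with the variance-weighted average of the two structural slopes that holds for a linear system with uncorrelated shocks.

```python
import random


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


alpha0, alpha1 = 10.0, -1.0   # assumed demand intercept and slope
beta0, beta1 = 2.0, 1.0       # assumed supply intercept and slope
sigma_u, sigma_v = 1.0, 2.0   # assumed shock standard deviations

rng = random.Random(5)
prices, quantities = [], []
for _ in range(50_000):
    u, v = rng.gauss(0, sigma_u), rng.gauss(0, sigma_v)
    p = (beta0 - alpha0 + v - u) / (alpha1 - beta1)  # price that clears the market
    prices.append(p)
    quantities.append(alpha0 + alpha1 * p + u)       # equilibrium quantity

slope = ols_slope(prices, quantities)
limit = ((alpha1 * sigma_v ** 2 + beta1 * sigma_u ** 2)
         / (sigma_u ** 2 + sigma_v ** 2))
print(f"OLS slope of q on p: {slope:.3f}  weighted-average limit: {limit:.3f}")
```

Because supply shocks are assumed to be more volatile here, the observed points trace out something closer to the demand curve, so the estimated slope is negative but flatter than the demand slope.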
