Least squares algebra and finite-sample theory

Andrew Pua

March 2022

Motivation: What do you gain from correct specification?

Illustration: Theory behind Mankiw, Romer, and Weil (1992)

Illustration: Econometrics behind Mankiw, Romer, and Weil (1992)

Illustration: Data used by Mankiw, Romer, and Weil (1992)

Replicating Table I of MRW (1992)

options(digits=3) # Learn to present to the appropriate precision
require(haven) # Need this package to load Stata datasets
## Loading required package: haven
MRW <- read_dta("./MRW.dta") # Load Stata dataset
# Generate new variables
MRW$ly85 <- log(MRW$y85)            # log GDP per working-age person in 1985
MRW$linv <- log(MRW$inv/100)        # log investment share of GDP (inv is in percent)
MRW$lpop <- log(MRW$pop/100 + 0.05) # log(n + g + delta), with g + delta = 0.05 as in MRW
MRW.TableI.nonoil <- lm(ly85 ~ linv + lpop, data = subset(MRW, MRW$n==1)) # Apply OLS to the non-oil sample
coef.TableI.nonoil <- coefficients(MRW.TableI.nonoil) # Extract coefficients
coef.TableI.nonoil 
## (Intercept)        linv        lpop 
##        5.43        1.42       -1.99

Interpreting coefficients: Dangers

set.seed(20220312)
n <- 1000
true_ability <- rnorm(n, 50, 10)
noise_1 <- rnorm(n, 0, 10)
noise_2 <- rnorm(n, 0, 10)
midterm <- true_ability + noise_1
final <- true_ability + noise_2
lm(final ~ midterm)
## 
## Call:
## lm(formula = final ~ midterm)
## 
## Coefficients:
## (Intercept)      midterm  
##      22.666        0.541
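
The slope is no surprise: final and midterm share only true_ability, so the population regression slope is

\[\frac{\mathsf{Cov}\left(\text{final},\text{midterm}\right)}{\mathsf{Var}\left(\text{midterm}\right)}=\frac{\mathsf{Var}\left(\text{ability}\right)}{\mathsf{Var}\left(\text{ability}\right)+\mathsf{Var}\left(\text{noise}_{1}\right)}=\frac{100}{100+100}=0.5.\]

A student who scored above average on the midterm is predicted to score closer to the mean on the final (regression to the mean), even though neither exam causally affects the other.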

Interpreting coefficients: Location-scale transformations
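
As a minimal sketch using the simulated exam scores above (midterm.std is a new helper variable): replacing the regressor \(x\) by \((x-a)/b\) multiplies the slope by \(b\) and turns the intercept into the predicted outcome at \(x=a\).

# Standardize the midterm: the slope becomes sd(midterm) times the old slope,
# and the intercept becomes the predicted final at the average midterm
midterm.std <- (midterm - mean(midterm))/sd(midterm)
lm(final ~ midterm.std)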

Interpreting coefficients: Interaction terms
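
With an interaction term, no single coefficient is "the" effect of a regressor: in \(y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{1}x_{2}+\varepsilon\), the partial effect of \(x_{1}\) is

\[\frac{\partial\,\mathsf{E}\left(y|x_{1},x_{2}\right)}{\partial x_{1}}=\beta_{1}+\beta_{3}x_{2},\]

so \(\beta_{1}\) by itself is the effect of \(x_{1}\) only when \(x_{2}=0\).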

Interpreting coefficients: Powers
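
Similarly, in \(y=\beta_{0}+\beta_{1}x+\beta_{2}x^{2}+\varepsilon\) the slope with respect to \(x\) is \(\beta_{1}+2\beta_{2}x\): it varies with \(x\) and, when \(\beta_{2}\neq0\), changes sign at the turning point \(x=-\beta_{1}/\left(2\beta_{2}\right)\).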

Interpreting coefficients: Centering as a special case of location-scale transformations

Interpreting coefficients: Logarithms

Interpreting coefficients: Logarithms, continued
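
For reference: in the log-log specification \(\log y=\beta_{0}+\beta_{1}\log x+\varepsilon\), \(\beta_{1}\) is an elasticity (a 1% change in \(x\) is associated with a \(\beta_{1}\)% change in \(y\)); in the log-level specification \(\log y=\beta_{0}+\beta_{1}x+\varepsilon\), \(100\beta_{1}\) approximates the percentage change in \(y\) per unit change in \(x\), the exact change being \(100\left(e^{\beta_{1}}-1\right)\)%. In the Table I regression above, ly85 and linv are both in logs, so the coefficient 1.42 is an elasticity.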

Setting everything up as matrices

Solving the LS problem

LS minimizer
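
As a minimal sketch (X and y are new helper objects), the closed form \(\widehat{\beta}=\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\mathbf{X}^{\prime}y\) can be computed directly from the MRW fit and compared with lm():

X <- model.matrix(MRW.TableI.nonoil)                # n x k design matrix, includes the intercept
y <- model.response(model.frame(MRW.TableI.nonoil)) # dependent variable ly85
solve(crossprod(X), crossprod(X, y))                # (X'X)^{-1} X'y: matches coefficients(MRW.TableI.nonoil)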

Algebraic properties of LS minimizer
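
A quick numerical check of two of these properties on the MRW fit: the residuals are orthogonal to every column of \(\mathbf{X}\), and they sum to zero because the model includes an intercept.

e <- residuals(MRW.TableI.nonoil)
crossprod(model.matrix(MRW.TableI.nonoil), e) # X'e: zero up to rounding error
sum(e)                                        # zero since an intercept is included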

Exercises

Measures of fit

Measures of fit, continued
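
As a sketch (SSR, SST, n, k are new helper objects), the R-squared and adjusted R-squared reported by summary() can be rebuilt from sums of squares:

y <- model.response(model.frame(MRW.TableI.nonoil))
SSR <- sum(residuals(MRW.TableI.nonoil)^2) # sum of squared residuals
SST <- sum((y - mean(y))^2)                # total sum of squares
1 - SSR/SST                                # R-squared
n <- nobs(MRW.TableI.nonoil); k <- length(coef(MRW.TableI.nonoil))
1 - (SSR/(n - k))/(SST/(n - 1))            # adjusted R-squared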

Exercises

(Stachurski) Here is a series of properties of the R-squared.

Frisch-Waugh-Lovell (FWL) Theorem
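
A minimal numerical check of FWL using the MRW data (nonoil, e.y, e.x are new helper objects): partial lpop out of both ly85 and linv, then regress residuals on residuals. The slope reproduces the linv coefficient (1.42) from the full regression.

nonoil <- subset(MRW, MRW$n == 1)
e.y <- residuals(lm(ly85 ~ lpop, data = nonoil)) # ly85 after partialling out lpop
e.x <- residuals(lm(linv ~ lpop, data = nonoil)) # linv after partialling out lpop
coef(lm(e.y ~ e.x))                              # slope equals the linv coefficient above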

The multicollinearity problem
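
A sketch of the extreme case (linv2 is a new helper variable): adding an exact linear combination of existing regressors makes \(\mathbf{X}^{\prime}\mathbf{X}\) singular, and lm() reports NA for the redundant column rather than a coefficient.

nonoil <- subset(MRW, MRW$n == 1)
nonoil$linv2 <- 2*nonoil$linv                       # exact linear combination of linv
coef(lm(ly85 ~ linv + linv2 + lpop, data = nonoil)) # linv2 comes back NA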

Least squares with linear equality constraints or restrictions
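
For reference, the minimizer of the LS criterion subject to \(\mathbf{R}\beta=\mathbf{r}\) has the closed form

\[\widehat{\beta}_{R}=\widehat{\beta}-\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\mathbf{R}^{\prime}\left[\mathbf{R}\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\mathbf{R}^{\prime}\right]^{-1}\left(\mathbf{R}\widehat{\beta}-\mathbf{r}\right),\]

where \(\widehat{\beta}\) is the unrestricted LS minimizer.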

Exercises

Exercises, continued

Exercises, continued

What was the point of the previous exercises?

Detour: Expected values applied to matrices

IID conditions versus conditioning on \(\boldsymbol{\mathrm{X}}\)

IID conditions versus taking \(\boldsymbol{\mathrm{X}}\) as fixed in repeated samples

Statistical properties of the OLS estimator

Assumptions to obtain unbiasedness

Summary output from lm() for MRW(1992)

summary(MRW.TableI.nonoil)
## 
## Call:
## lm(formula = ly85 ~ linv + lpop, data = subset(MRW, MRW$n == 
##     1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7914 -0.3937  0.0412  0.4337  1.5805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.430      1.584    3.43  0.00090 ***
## linv           1.424      0.143    9.95  < 2e-16 ***
## lpop          -1.990      0.563   -3.53  0.00064 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.689 on 95 degrees of freedom
## Multiple R-squared:  0.601,  Adjusted R-squared:  0.592 
## F-statistic: 71.5 on 2 and 95 DF,  p-value: <2e-16

Statistical properties of the OLS estimator, continued

Simplifying the form of the covariance matrix

\[\mathsf{Var}\left(\varepsilon|\mathbf{X}\right)=\begin{pmatrix}\mathsf{Var}\left(\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Cov}\left(\varepsilon_{1},\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Cov}\left(\varepsilon_{1},\varepsilon_{n}|\mathbf{X}\right)\\ \mathsf{Cov}\left(\varepsilon_{2},\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Var}\left(\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Cov}\left(\varepsilon_{2},\varepsilon_{n}|\mathbf{X}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \mathsf{Cov}\left(\varepsilon_{n},\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Cov}\left(\varepsilon_{n},\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Var}\left(\varepsilon_{n}|\mathbf{X}\right) \end{pmatrix}.\]

What happens if Assumption 3.4 fails but Assumptions 3.1-3.3 still hold?

Exercises

What happens if Assumption 3.4 fails but Assumptions 3.1-3.3 still hold?

Proof of Gauss-Markov property

Why are the consequences of multicollinearity so often misunderstood?

Exercises

Understanding how to estimate \(\sigma^2\)

Orthogonal transformations
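
A minimal numerical illustration (Q and v are new helper objects): an orthogonal matrix satisfies \(\mathbf{Q}^{\prime}\mathbf{Q}=\mathbf{I}\) and preserves Euclidean length.

Q <- qr.Q(qr(matrix(rnorm(9), 3, 3))) # a random 3 x 3 orthogonal matrix
round(crossprod(Q), 10)               # Q'Q is the identity
v <- rnorm(3)
c(sum(v^2), sum((Q %*% v)^2))         # same squared length before and after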

Impact of orthogonal transformation

Impact of orthogonal transformation, continued

Impact of orthogonal transformation, continued

Constructing an estimator for \(\sigma^2\)
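
As a sketch, the unbiased estimator \(s^{2}=\mathrm{SSR}/\left(n-k\right)\) can be computed directly from the MRW fit; its square root is the "Residual standard error: 0.689 on 95 degrees of freedom" reported by summary() above.

s.sq <- sum(residuals(MRW.TableI.nonoil)^2)/df.residual(MRW.TableI.nonoil) # SSR/(n - k)
sqrt(s.sq)                                                                 # residual standard error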

Replicating Table I of MRW (1992), continued

cov.TableI.nonoil <- vcov(MRW.TableI.nonoil) # Estimated covariance matrix
cov.TableI.nonoil 
##             (Intercept)   linv   lpop
## (Intercept)      2.5087 0.0983 0.8799
## linv             0.0983 0.0205 0.0229
## lpop             0.8799 0.0229 0.3174
sqrt(diag(cov.TableI.nonoil)) # Estimated standard errors 
## (Intercept)        linv        lpop 
##       1.584       0.143       0.563
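
As a check (X and s.sq are new helper objects), the matrix above is exactly \(s^{2}\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\):

X <- model.matrix(MRW.TableI.nonoil)
s.sq <- sum(residuals(MRW.TableI.nonoil)^2)/df.residual(MRW.TableI.nonoil)
s.sq * solve(crossprod(X)) # reproduces vcov() above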

The normal linear regression model

The normal linear regression model, continued

Inference under the normal linear regression model

Constructing confidence sets
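
In R, t-based intervals for individual coefficients come from confint(); a one-line sketch for the MRW fit:

confint(MRW.TableI.nonoil, level = 0.95) # estimate plus/minus t(0.975, 95) times SE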

Constructing hypothesis tests

Replicating Table I of MRW (1992), continued

R.mat <- c(0, 1, 1) # R matrix for the Solow restriction: coefficients on linv and lpop sum to zero
# Calculate the F statistic for testing the Solow restriction
test.stat <- t(R.mat %*% coef.TableI.nonoil) %*% solve(R.mat %*% cov.TableI.nonoil %*% R.mat) %*%
  R.mat %*% coef.TableI.nonoil
# Test statistic, p-value, critical value
c(test.stat, 1-pf(test.stat, 1, 95), qf(0.95, 1, 95))
## [1] 0.834 0.363 3.941

Replicating Table I of MRW (1992), continued

# Generate new variable for restricted regression
MRW$ldiff <- MRW$linv - MRW$lpop
# Apply OLS to restricted regression
MRW.TableI.restricted.nonoil <- lm(ly85 ~ ldiff, data = subset(MRW, MRW$n==1))
# Compute the F test via SSR comparison (restricted model listed second, hence the negative Df below)
anova(MRW.TableI.nonoil, MRW.TableI.restricted.nonoil)
## Analysis of Variance Table
## 
## Model 1: ly85 ~ linv + lpop
## Model 2: ly85 ~ ldiff
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)
## 1     95 45.1                         
## 2     96 45.5 -1    -0.396 0.83   0.36

Replicating Table I of MRW (1992), continued

# Apply OLS to reparameterized model
MRW.TableI.repar.nonoil <- lm(ly85 ~ ldiff + lpop, data = subset(MRW, MRW$n==1))
summary(MRW.TableI.repar.nonoil)[[4]] # Focusing on the summary of the coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    5.430      1.584   3.428 9.00e-04
## ldiff          1.424      0.143   9.951 2.10e-16
## lpop          -0.566      0.619  -0.913 3.63e-01

Replicating Table I of MRW (1992), continued

est.beta1 <- coef(MRW.TableI.restricted.nonoil)[[2]]
implied.alpha <- est.beta1/(1+est.beta1)
implied.alpha
## [1] 0.598
est.var <- c(0, 1/(1+est.beta1)^2) %*% vcov(MRW.TableI.restricted.nonoil) %*% c(0, 1/(1+est.beta1)^2)
delta.method.se <- sqrt(est.var)
delta.method.se
##        [,1]
## [1,] 0.0201
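
This is the delta method: with \(g\left(\beta_{1}\right)=\beta_{1}/\left(1+\beta_{1}\right)\), the derivative is \(g^{\prime}\left(\beta_{1}\right)=1/\left(1+\beta_{1}\right)^{2}\), so

\[\mathsf{se}\left(\widehat{\alpha}\right)\approx\left|g^{\prime}(\widehat{\beta}_{1})\right|\cdot\mathsf{se}(\widehat{\beta}_{1}),\]

which is what the gradient vector c(0, 1/(1+est.beta1)^2) in the code implements.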

Exercises

Monte Carlo simulation: setup

Monte Carlo simulation: code

set.seed(20220312)
n <- 50
# "True" beta values
beta0.o <- -1
beta1.o <- 2
reps <- 10^4
# Storage for OLS estimates (2 entries per replication)
beta.store <- matrix(NA, nrow=reps, ncol=2)
# Storage for robust covariance matrix (4 entries per replication, 2x2 matrix)
rob.store <- matrix(NA, nrow=reps, ncol=4)
# Storage for non-robust covariance matrix (4 entries per replication, 2x2 matrix)
nonrob.store <- matrix(NA, nrow=reps, ncol=4)
# Monte Carlo loop
for (i in 1:reps)
{
  X.t <- rbinom(n, 1, 0.3)  # Generate X
  # Generate epsilon: N(0, 1) whether X = 1 or X = 0, so this design is conditionally homoscedastic
  eps.t <- (rnorm(n, 0, 1))*(X.t == 1)+(rnorm(n, 0, 1))*(X.t == 0)
  Y.t <- beta0.o + beta1.o*X.t + eps.t   # Generate Y
  matXX <- t(cbind(1, X.t)) %*% cbind(1, X.t)  # X'X matrix
  beta.hat <- solve(matXX) %*% (t(cbind(1, X.t)) %*% Y.t)   # OLS
  # Heteroscedasticity-robust (sandwich) covariance: (X'X)^{-1} X'diag(e^2)X (X'X)^{-1}
  bread <- matXX                            # X'X
  resid <- Y.t - cbind(1, X.t) %*% beta.hat # residuals
  meat <- (t(cbind(1, X.t)) %*% diag(c(resid^2))) %*% cbind(1, X.t) # X'diag(e^2)X
  est.rob <- (solve(bread) %*% meat) %*% solve(bread)
  s.sq <- 1/(n-2) * sum(resid^2)   # estimator for sigma2 under cond. homoscedasticity
  est.nonrob <- s.sq * solve(matXX)  # nonrobust cov matrix
  beta.store[i,] <- c(beta.hat)
  rob.store[i,] <- c(est.rob)
  nonrob.store[i,] <- c(est.nonrob)
}

Monte Carlo simulation: center and spread

apply(beta.store, 2, mean) # Calculate the mean of the 10^4 beta's
## [1] -1  2
apply(beta.store, 2, sd) # Calculate the SD of the 10^4 beta's
## [1] 0.172 0.313
apply(sqrt(rob.store[,c(1,4)]), 2, mean) # Average SE robust
## [1] 0.166 0.301
apply(sqrt(nonrob.store[,c(1,4)]), 2, mean) # Average SE nonrobust
## [1] 0.169 0.313

Monte Carlo simulation: Behavior of \(t\)-statistics

# Test statistic for null beta0 = -1 using nonrobust cov matrix
t.ratios.beta0 <- (beta.store[,1]-beta0.o)/sqrt(nonrob.store[,1])
# Test statistic for null beta0 = -1 using robust cov matrix
t.ratios.beta0.rob <- (beta.store[,1]-beta0.o)/sqrt(rob.store[,1])
# Test statistic for null beta1 = 2 using nonrobust cov matrix
t.ratios.beta1 <- (beta.store[,2]-beta1.o)/sqrt(nonrob.store[,4])
# Test statistic for null beta1 = 2 using robust cov matrix
t.ratios.beta1.rob <- (beta.store[,2]-beta1.o)/sqrt(rob.store[,4])
# Empirical rejection rate alpha = 0.05
mean(abs(t.ratios.beta0)>qnorm(0.975))
## [1] 0.0592
mean(abs(t.ratios.beta1)>qnorm(0.975))
## [1] 0.0542
mean(abs(t.ratios.beta0.rob)>qnorm(0.975))
## [1] 0.0653
mean(abs(t.ratios.beta1.rob)>qnorm(0.975))
## [1] 0.0677

Monte Carlo simulation: Distribution of \(t\)-statistics

Exercises in the main textbook