Large-sample theory for linear regressions applied to time series data

Andrew Pua

April 2022

Issues with regressions involving time series data

Motivation: Moving beyond IID

Motivation: Is OLS working?

Simulated data from \(Y_t=u_t\), \(n=500\) observations

Simulated data from \(Y_t=0.5Y_{t-1}+u_t\), \(n=500\) observations

Simulated data from \(Y_t=0.95Y_{t-1}+u_t\), \(n=500\) observations

Simulated data from \(Y_t=Y_{t-1}+u_t\), \(n=500\) observations

Design of Monte Carlo

Monte Carlo simulation: code

set.seed(20220318)
require(dyn)   # dyn$lm() allows lag() terms directly in regression formulas
reps <- 10^3   # number of Monte Carlo replications
mod <- 1       # sample-size multiplier: n = 40 * mod
coefs <- matrix(NA, nrow=reps, ncol=4)
SEs <- matrix(NA, nrow=reps, ncol=4)
t.stat <- matrix(NA, nrow=reps, ncol=4)
for (i in 1:reps)
{
  # Four DGPs built from the same innovations: white noise, AR(1) with
  # coefficients 0.5 and 0.95, and a random walk (unit root)
  y1 <- arima.sim(n = 40*mod, list(order=c(0,0,0)))
  y2 <- arima.sim(n = 40*mod, list(order=c(1,0,0), ar = 0.5), innov = y1)
  y3 <- arima.sim(n = 40*mod, list(order=c(1,0,0), ar = 0.95), innov = y1)
  y4 <- ts(cumsum(y1))
  # Regress each series on its own first lag
  model.y1 <- dyn$lm(y1~lag(y1,-1))
  model.y2 <- dyn$lm(y2~lag(y2,-1))
  model.y3 <- dyn$lm(y3~lag(y3,-1))
  model.y4 <- dyn$lm(y4~lag(y4,-1))
  temp.c <- c(coef(model.y1)[2],coef(model.y2)[2],coef(model.y3)[2],coef(model.y4)[2])
  temp.d <- sqrt(c(vcov(model.y1)[2,2],vcov(model.y2)[2,2],vcov(model.y3)[2,2],vcov(model.y4)[2,2]))
  coefs[i,] <- temp.c
  SEs[i,] <- temp.d
  # t-statistics centered at the true values of the slope coefficient
  t.stat[i,] <- (temp.c-c(0,0.5,0.95,1))/temp.d
}

Monte Carlo simulation: results for \(n=40\)

mean.ols <- colMeans(coefs)     # average OLS slope estimate
mean.reg.se <- colMeans(SEs)    # average reported (conventional) standard error
sd.ols <- apply(coefs, 2, sd)   # Monte Carlo standard deviation of the estimates
# Despite its name, p.vals holds the empirical rejection rate of the
# nominal 5% normal-based test of the true null, not p-values
p.vals <- (2*pnorm(-abs(t.stat)))<0.05
p.vals <- apply(p.vals, 2, mean)
results <- rbind(mean.ols, mean.reg.se, sd.ols, p.vals)
colnames(results) <- c("beta1=0", "beta1=0.5", "beta1=0.95", "beta1=1")
results
##             beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols    -0.0268     0.433     0.8314  0.8740
## mean.reg.se  0.1626     0.146     0.0868  0.0746
## sd.ols       0.1595     0.150     0.1115  0.0992
## p.vals       0.0650     0.070     0.1880  0.2920

Monte Carlo simulation: results for \(n=160, 640\)

Results for \(n=160\):

##              beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols    -0.00368    0.4882     0.9247  0.9677
## mean.reg.se  0.07957    0.0694     0.0297  0.0188
## sd.ols       0.07570    0.0663     0.0334  0.0274
## p.vals       0.04300    0.0350     0.0870  0.3010

Results for \(n=640\):

##              beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols    -0.00228    0.4964     0.9445 0.99160
## mean.reg.se  0.03959    0.0344     0.0129 0.00483
## sd.ols       0.03922    0.0342     0.0133 0.00679
## p.vals       0.04900    0.0460     0.0490 0.29300

A curiosity

Sampling distribution of \(t\)-statistic under the null of a unit root
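The figure on this slide is not reproduced here. A minimal sketch of how such a sampling distribution can be simulated (regressing \(Y_t\) on \(Y_{t-1}\) with an intercept, as in the Monte Carlo above, with the DGP a driftless random walk) is:

```r
# Under the null of a unit root, the t-statistic for H0: beta1 = 1
# does not follow a standard normal distribution: it is shifted to
# the left and skewed (the Dickey-Fuller distribution).
set.seed(20220318)
reps <- 2000
n <- 100
t.unit <- numeric(reps)
for (i in 1:reps) {
  y <- cumsum(rnorm(n))                    # driftless random walk
  fit <- lm(y[-1] ~ y[-n])                 # Y_t on Y_{t-1}, with intercept
  t.unit[i] <- (coef(fit)[2] - 1) / sqrt(vcov(fit)[2, 2])
}
hist(t.unit, breaks = 40,
     main = "t-statistic under the null of a unit root")
mean(t.unit)   # well below zero, unlike a N(0,1) sample
```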

Spurious regressions: setup

set.seed(20220317)
# Two independent series: y1 and x1 are white noise around 0.1;
# y2 and x2 are the corresponding random walks with drift
y1 <- 0.1 + arima.sim(n = 1000, list(order = c(0, 0, 0)))
x1 <- 0.1 + arima.sim(n = 1000, list(order = c(0, 0, 0)))
y2 <- ts(cumsum(y1))
x2 <- ts(cumsum(x1))

Spurious regressions: OLS results, first case

summary(dyn$lm(y1~x1))
## 
## Call:
## lm(formula = dyn(y1 ~ x1))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.588 -0.650  0.004  0.699  3.188 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.0958     0.0313    3.06   0.0023 **
## x1            0.0282     0.0320    0.88   0.3797   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.988 on 998 degrees of freedom
## Multiple R-squared:  0.000773,   Adjusted R-squared:  -0.000228 
## F-statistic: 0.772 on 1 and 998 DF,  p-value: 0.38

Spurious regressions: OLS results, second case

summary(dyn$lm(y2~x2))
## 
## Call:
## lm(formula = dyn(y2 ~ x2))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33.04 -11.37   0.24  10.73  26.51 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  30.4113     0.6112    49.8   <2e-16 ***
## x2            0.8107     0.0147    55.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.1 on 998 degrees of freedom
## Multiple R-squared:  0.752,  Adjusted R-squared:  0.752 
## F-statistic: 3.03e+03 on 1 and 998 DF,  p-value: <2e-16

Spurious regressions: explanation

Spurious regressions: a solution

summary(dyn$lm(diff(y2)~diff(x2)))
## 
## Call:
## lm(formula = dyn(diff(y2) ~ diff(x2)))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.590 -0.650  0.002  0.698  3.187 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.0973     0.0313    3.11    0.002 **
## diff(x2)      0.0291     0.0320    0.91    0.364   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.987 on 997 degrees of freedom
## Multiple R-squared:  0.000826,   Adjusted R-squared:  -0.000176 
## F-statistic: 0.825 on 1 and 997 DF,  p-value: 0.364

A path beyond IID in the time series case

The simplest ergodic theorem

The simplest ergodic theorem, continued
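The statement on these slides is not reproduced here, but the simplest ergodic theorem (a law of large numbers for ergodic stationary processes) can be illustrated with a stationary AR(1); the mean \(\mu=2\) below is a hypothetical choice:

```r
# Ergodic theorem illustration: the time average of one long
# realization of a stationary AR(1) approaches the population
# mean E(Y_t), even though the observations are dependent.
set.seed(20220318)
mu <- 2   # hypothetical unconditional mean
y <- mu + arima.sim(n = 10^5, list(order = c(1, 0, 0), ar = 0.5))
mean(y)   # close to mu = 2
```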

What are the options to control these autocovariances?

Independence and predictability

The structure of a consistency proof

One path to move beyond IID in the time series case: asymptotic normality

Path to a CLT and the different forms of \(V\)

The structure of an asymptotic normality proof

How do you consistently estimate the asymptotic covariance matrix?

How do you consistently estimate the asymptotic covariance matrix, continued

Consistently estimating the “meat” under different assumptions
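In the non-MDS case, estimating the "meat" means accounting for the autocovariances of the scores, not only their variance. Below is a hand-rolled sketch of the Newey-West (Bartlett-kernel) long-run variance estimator for a scalar series; the function name `nw_lrv` and the AR(1) test series are illustrative choices, not from the slides:

```r
# Newey-West (Bartlett kernel) estimate of the long-run variance
# of a scalar series g_t: gamma(0) plus downweighted higher-order
# autocovariances up to lag L.
nw_lrv <- function(g, L) {
  g <- g - mean(g)
  n <- length(g)
  v <- sum(g^2) / n                          # gamma(0)
  for (j in 1:L) {
    gam <- sum(g[(j + 1):n] * g[1:(n - j)]) / n
    v <- v + 2 * (1 - j / (L + 1)) * gam     # Bartlett weight
  }
  v
}
set.seed(20220318)
g <- as.numeric(arima.sim(n = 10^4, list(order = c(1, 0, 0), ar = 0.5)))
nw_lrv(g, L = 10)   # near the true long-run variance 1/(1-0.5)^2 = 4
```

In practice one would typically use a packaged implementation instead, such as `NeweyWest()` from the sandwich package.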

Understanding the non-MDS case

Understanding the non-MDS case: alternative options

Understanding the non-MDS case: new research

Understanding the non-MDS case: new research

Stochastic processes

Examples of SPs

Let \(t\in \mathbb{Z}\).

Examples of SPs, continued

Stationary SPs

The need for ergodicity

The need for ergodicity, continued

Martingales

Martingale differences

IID vs MDS vs WN
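A standard example separating these concepts (an illustrative choice, not necessarily the one on the slide) is \(Y_t = Z_t Z_{t-1}\) with \(Z_t\) IID \(N(0,1)\): it is white noise and a martingale difference sequence, but not IID, because the squares are autocorrelated:

```r
# Y_t = Z_t * Z_{t-1} is an MDS (hence white noise): conditional on
# the past, E(Y_t | ...) = Z_{t-1} * E(Z_t) = 0. But it is not IID:
# Y_t^2 and Y_{t-1}^2 share the factor Z_{t-1}^2.
set.seed(20220318)
z <- rnorm(10^5)
y <- z[-1] * z[-length(z)]
cor(y[-1], y[-length(y)])       # near 0: levels are uncorrelated
cor(y[-1]^2, y[-length(y)]^2)   # clearly positive: dependence in squares
```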

Why is talking about MDS important?

Why is talking about MDS important, continued

Why is talking about MDS important, continued

Special case: AR(1) model

Exercises

Suppose you have \(Y_t=X_t^\prime \beta^o+\varepsilon_t\). Assume that \(\{\left(Y_t,X_t^\prime\right)\}\) are realizations from an ergodic stationary process. Assume that the relevant moments exist.
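A simulation consistent with this setting (with a hypothetical AR(1) regressor and an MA(1) error, generated independently of each other so that \(\mathrm{E}(X_t\varepsilon_t)=0\)) shows that OLS remains consistent for \(\beta^o\) under ergodic stationarity even though the observations are dependent:

```r
# OLS consistency under ergodic stationarity: serial dependence in
# X_t and in eps_t is fine as long as E(X_t * eps_t) = 0.
set.seed(20220318)
n <- 10^5
x <- arima.sim(n = n, list(order = c(1, 0, 0), ar = 0.8))
eps <- arima.sim(n = n, list(order = c(0, 0, 1), ma = 0.5))
y <- 1 + 2 * x + eps
coef(lm(y ~ x))   # close to the true (1, 2)
```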

Exercises, continued

CLT for zero mean ergodic stationary SPs: setup

CLT for zero mean ergodic stationary SPs: first idea

CLT for zero mean ergodic stationary SPs: second idea

CLT for zero mean ergodic stationary SPs: last idea

CLT for zero mean ergodic stationary SPs: the statement
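The statement is not reproduced here, but its key feature — the asymptotic variance is the long-run variance (the sum of all autocovariances), not just \(\gamma_0\) — can be checked by simulation with an MA(1) (a hypothetical choice):

```r
# CLT for ergodic stationary processes: sqrt(n) * mean of an MA(1)
# has variance close to the long-run variance (1 + theta)^2 = 2.25,
# not the marginal variance gamma(0) = 1 + theta^2 = 1.25.
set.seed(20220318)
theta <- 0.5; n <- 500; reps <- 5000
zbar <- replicate(reps, {
  y <- arima.sim(n = n, list(order = c(0, 0, 1), ma = theta))
  sqrt(n) * mean(y)
})
var(zbar)   # near 2.25, well above 1.25
```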

Exercises in the textbook

My suggestion is to focus on the following exercises: