Andrew Pua
April 2022
Losing IID conditions: We need to modify our current set of asymptotic tools.
Spurious regressions: Do regressions help in making conditional predictions? Do regressions uncover meaningful relationships?
Unit roots: What happens to regressions when variables are trending?
And many more we do not cover: Take a time series course at some point.
We studied the asymptotic theory in the simple case of one regressor. The theory, with suitable modifications in notation, directly extends to the more general case of having more regressors.
But IID is restrictive for economic data: we have time series, spatial, and panel data.
The main textbook focuses squarely on the time series case.
Suppose you are interested in estimating the parameters of a first-order autoregression or AR(1) process \(Y_{t}=\beta_0^*+\beta_1^* Y_{t-1}+u_{t}\), where \(u_t\) is error from best linear prediction.
To give you a sense of what the data on \(\left\{ Y_{t}\right\}_{t=1}^{n}\) would look like, here are some pictures where \(\beta_0^*=0\) and \(\beta_1^*\) can be 0, 0.5, 0.95, and 1. I assume that \(u_{t}\sim N\left(0,1\right)\) and \(Y_{0}\sim N\left(0,1\right)\).
You will see two plots side-by-side. One is a time-series plot where \(Y_{t}\) is plotted against \(t\) and the other is a scatterplot where \(Y_{t}\) is plotted against \(Y_{t-1}\).
To enhance comparability, I use the same set of randomly drawn \(u_{t}\)’s and \(Y_{0}\)’s across the four cases.
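A minimal sketch of how pictures like these could be generated (the sample size, seed, and plotting details here are illustrative choices, not the code behind the original figures):

y0 <- rnorm(1); u <- rnorm(100)   # one set of draws, reused for every beta1
for (b1 in c(0, 0.5, 0.95, 1)) {
  n <- length(u)
  y <- numeric(n)
  y[1] <- b1 * y0 + u[1]
  for (t in 2:n) y[t] <- b1 * y[t - 1] + u[t]   # AR(1) with beta0 = 0
  par(mfrow = c(1, 2))
  plot(y, type = "l", main = bquote(beta[1] == .(b1)), xlab = "t", ylab = expression(Y[t]))
  plot(y[-n], y[-1], xlab = expression(Y[t - 1]), ylab = expression(Y[t]))
}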
Now, let us evaluate the performance of OLS when we generate multiple “instances” of the first-order autoregression given earlier.
So the Monte Carlo design here is as follows:
A 5% significance level was used for testing the null that \(\beta_1^*\) is equal to the value in the indicated column.
set.seed(20220318)
require(dyn)
reps <- 10^3
mod <- 1    # multiplier for the sample size (n = 40*mod)
coefs <- matrix(NA, nrow=reps, ncol=4)
SEs <- matrix(NA, nrow=reps, ncol=4)
t.stat <- matrix(NA, nrow=reps, ncol=4)
for (i in 1:reps)
{
  # Four DGPs sharing the same innovations: white noise (beta1 = 0),
  # AR(1) with beta1 = 0.5 and 0.95, and a random walk (beta1 = 1)
  y1 <- arima.sim(n = 40*mod, list(order=c(0,0,0)))
  y2 <- arima.sim(n = 40*mod, list(order=c(1,0,0), ar = 0.5), innov = y1)
  y3 <- arima.sim(n = 40*mod, list(order=c(1,0,0), ar = 0.95), innov = y1)
  y4 <- ts(cumsum(y1))
  # Regress each series on its own first lag
  model.y1 <- dyn$lm(y1~lag(y1,-1))
  model.y2 <- dyn$lm(y2~lag(y2,-1))
  model.y3 <- dyn$lm(y3~lag(y3,-1))
  model.y4 <- dyn$lm(y4~lag(y4,-1))
  temp.c <- c(coef(model.y1)[2],coef(model.y2)[2],coef(model.y3)[2],coef(model.y4)[2])
  temp.d <- sqrt(c(vcov(model.y1)[2,2],vcov(model.y2)[2,2],vcov(model.y3)[2,2],vcov(model.y4)[2,2]))
  coefs[i,] <- temp.c
  SEs[i,] <- temp.d
  # t-statistics for testing the null that beta1 equals its true value
  t.stat[i,] <- (temp.c-c(0,0.5,0.95,1))/temp.d
}
mean.ols <- colMeans(coefs)      # Monte Carlo mean of the OLS slope estimates
mean.reg.se <- colMeans(SEs)     # Monte Carlo mean of the reported standard errors
sd.ols <- apply(coefs, 2, sd)    # Monte Carlo standard deviation of the OLS slope estimates
# Rejection rates at the 5% level based on standard normal critical values
p.vals <- (2*pnorm(-abs(t.stat)))<0.05
p.vals <- apply(p.vals, 2, mean)
results <- rbind(mean.ols, mean.reg.se, sd.ols, p.vals)
colnames(results) <- c("beta1=0", "beta1=0.5", "beta1=0.95", "beta1=1")
results
## beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols -0.0268 0.433 0.8314 0.8740
## mean.reg.se 0.1626 0.146 0.0868 0.0746
## sd.ols 0.1595 0.150 0.1115 0.0992
## p.vals 0.0650 0.070 0.1880 0.2920
The next two sets of results repeat the same exercise with larger sample sizes (larger values of mod). Notice how the bias and the size distortions shrink as the sample size grows, except in the unit-root column, where the rejection rate stays near 30%.
## beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols -0.00368 0.4882 0.9247 0.9677
## mean.reg.se 0.07957 0.0694 0.0297 0.0188
## sd.ols 0.07570 0.0663 0.0334 0.0274
## p.vals 0.04300 0.0350 0.0870 0.3010
## beta1=0 beta1=0.5 beta1=0.95 beta1=1
## mean.ols -0.00228 0.4964 0.9445 0.99160
## mean.reg.se 0.03959 0.0344 0.0129 0.00483
## sd.ols 0.03922 0.0342 0.0133 0.00679
## p.vals 0.04900 0.0460 0.0490 0.29300
Dickey and Fuller (1979) have shown that when testing the null of a unit root, the asymptotic distribution of the test statistic under the null is nonstandard.
But their research further indicates that the asymptotic distribution of the test statistic under the null changes depending on the presence or absence of deterministic variables in the autoregression (e.g. time trends, intercepts), and the nature of the null being tested.
For more on the nonstandard behavior in the unit root case, see Chang and Park (2002).
We will rule out the unit root case in our discussions, but we point out one more issue related to it.
Let us talk about another issue with running regressions when the variables are trending. Consider the following two situations:
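The exact simulation code behind the output below is not shown; here is a hedged sketch of data that could produce two situations like these (the seed and sample size are assumptions, chosen only so the regressions take the same form as the output): first, regress one white-noise series on another, unrelated one; second, regress one random walk on another, unrelated one.

require(dyn)
set.seed(1)               # illustrative seed, not the one behind the output below
n <- 1000
x1 <- ts(rnorm(n))        # situation 1: two unrelated white-noise series
y1 <- ts(rnorm(n))
x2 <- ts(cumsum(x1))      # situation 2: two unrelated random walks
y2 <- ts(cumsum(y1))      # (partial sums of the noise series above)
summary(dyn$lm(y1 ~ x1))
summary(dyn$lm(y2 ~ x2))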
##
## Call:
## lm(formula = dyn(y1 ~ x1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.588 -0.650 0.004 0.699 3.188
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0958 0.0313 3.06 0.0023 **
## x1 0.0282 0.0320 0.88 0.3797
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.988 on 998 degrees of freedom
## Multiple R-squared: 0.000773, Adjusted R-squared: -0.000228
## F-statistic: 0.772 on 1 and 998 DF, p-value: 0.38
##
## Call:
## lm(formula = dyn(y2 ~ x2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.04 -11.37 0.24 10.73 26.51
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.4113 0.6112 49.8 <2e-16 ***
## x2 0.8107 0.0147 55.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.1 on 998 degrees of freedom
## Multiple R-squared: 0.752, Adjusted R-squared: 0.752
## F-statistic: 3.03e+03 on 1 and 998 DF, p-value: <2e-16
What you have observed in the second case is a phenomenon called spurious regression or “nonsense regression”. A version of this phenomenon was noted as early as Yule (1926) and was brought to renewed attention by Granger and Newbold (1974).
Granger and Newbold (1974) also show that measures of fit from spurious regressions will typically indicate very good fit even if the two variables are truly unrelated. This is yet another instance where standard measures of fit like the R-squared have to be interpreted with caution.
Nonsense regressions can also happen in the context of IID data. Try simulating a case where there are many unrelated \(X\)’s included relative to sample size.
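For instance, a quick sketch of this exercise (the sample size and number of regressors are arbitrary choices):

set.seed(1)                          # illustrative seed
n <- 30; k <- 25                     # many unrelated regressors relative to the sample size
X <- matrix(rnorm(n * k), nrow = n)
y <- rnorm(n)                        # outcome unrelated to every column of X
summary(lm(y ~ X))$r.squared         # typically quite high despite no true relationship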
There are two broad ways of solving the spurious regression problem:
##
## Call:
## lm(formula = dyn(diff(y2) ~ diff(x2)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.590 -0.650 0.002 0.698 3.187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0973 0.0313 3.11 0.002 **
## diff(x2) 0.0291 0.0320 0.91 0.364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.987 on 997 degrees of freedom
## Multiple R-squared: 0.000826, Adjusted R-squared: -0.000176
## F-statistic: 0.825 on 1 and 997 DF, p-value: 0.364
Under covariance stationarity, \[\mathsf{Var}\left(\overline{Z}\right)=\frac{1}{n}\left[\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right].\] So one way for \(\mathsf{Var}\left(\overline{Z}\right)\to0\) as \(n\to\infty\) is to have \(\mathsf{Var}\left(Z_{t}\right)\) bounded and \[\begin{aligned}\left\vert \sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert & \leq\sum_{j=1}^{n-1}\left\vert 1-\frac{j}{n}\right\vert \left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \\ & \leq \sum_{j=1}^{n-1}\left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \end{aligned}\] bounded as \(n\to\infty\).
Thus, under certain conditions on the autocovariances, \(\overline{Z}\overset{qm}{\to}\mathbb{E}\left(Z_{t}\right)\). Hence, \(\overline{Z}\overset{p}{\to}\mathbb{E}\left(Z_{t}\right)\).
What you saw is the simplest version of the ergodic theorem under stationarity.
It is possible to have a slightly more complicated version of this ergodic theorem under nonstationarity; see the very accessible note by Shalizi (2022).
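As a quick illustration of the ergodic theorem in action (the AR(1) coefficient and sample sizes below are arbitrary choices), the sample mean of a stationary AR(1) settles down near the population mean even though the observations are serially dependent:

set.seed(1)                                   # illustrative seed
z <- arima.sim(n = 10^5, list(ar = 0.9))      # stationary AR(1) with population mean 0
n.grid <- c(10^2, 10^3, 10^4, 10^5)
sapply(n.grid, function(m) mean(z[1:m]))      # sample means approach 0 as the sample grows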
One way to look at the processes from the previous slide is to understand how these processes capture predictability.
Recall that if \(\left\{ Z_{t}\right\}\) is an IID sequence, then \(Z_{t}|Z_{t-1},Z_{t-2},\ldots,Z_{1}\sim Z_{t}\).
Compare this with the weaker notion of unpredictability embodied by an MDS: only the conditional mean, rather than the entire conditional distribution, is restricted.
A CLT can also be developed along similar lines. Recall that consistency of the sample mean for the population mean was obtained by showing that \(\mathsf{Var}\left(\overline{Z}\right)\to0\) as \(n\to\infty\).
This means that to derive a distributional result, we have to rescale to ensure that the variance does not disappear as \(n\to\infty\), just like before. In particular, \[\mathsf{Var}\left[\sqrt{n} \left( \overline{Z}-\mathbb{E}\left(Z_t\right)\right)\right]=\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right).\] Once again, we need boundedness conditions on the right hand side as \(n\to\infty\).
So we aim to obtain a CLT that looks like: \[\sqrt{n} \left( \overline{Z}-\mathbb{E}\left(Z_t\right)\right) \overset{d}{\to} N\left(0, V\right),\] where \(V\) is sometimes referred to as the long-run variance.
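As a concrete example (a standard calculation, not from the slides): if \(Z_{t}=\rho Z_{t-1}+u_{t}\) with \(\left|\rho\right|<1\) and \(u_{t}\) white noise with variance \(\sigma^{2}\), then \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=\rho^{\left|j\right|}\sigma^{2}/\left(1-\rho^{2}\right)\) and \[V=\sum_{j=-\infty}^{\infty}\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=\frac{\sigma^{2}}{1-\rho^{2}}\left(1+2\sum_{j=1}^{\infty}\rho^{j}\right)=\frac{\sigma^{2}}{1-\rho^{2}}\cdot\frac{1+\rho}{1-\rho}=\frac{\sigma^{2}}{\left(1-\rho\right)^{2}}.\] Note that \(V\) blows up as \(\rho\to1\), which is one way to see why the unit-root case needs separate treatment.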
Again, there are three approaches:
What is the form of \(V\)?
Consider the following standard argument: \[\begin{eqnarray} \sqrt{n}\left(\widehat{\beta}-\beta^{*}\right) = \left(\dfrac{1}{n}\sum_{t=1}^{n}X_{t}X_{t}^{\prime}\right)^{-1}\left(\dfrac{1}{\sqrt{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}u_{t}\right)\\ \overset{d}{\rightarrow} N\bigg(0,\underbrace{Q^{-1}\left[\mathsf{Avar}\left(\dfrac{1}{\sqrt{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}u_{t}\right)\right]Q^{-1}}_{\mathsf{Avar}\left(\sqrt{n}\left(\widehat{\beta}-\beta^{*}\right)\right)}\bigg) \end{eqnarray}\]
In the proof, we would need:
In the MDS case: \[\dfrac{1}{n}{\displaystyle \sum_{t=1}^{n}}X_{t}X_{t}^{\prime}\widehat{u}_{t}^{2}\overset{p}{\rightarrow}\lim_{n\rightarrow\infty}\mathsf{Var}\left(\dfrac{1}{\sqrt{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}u_{t}\right)\]
In the non-MDS case:\[\dfrac{1}{n}{\displaystyle \sum_{j=-p_{n}}^{p_{n}}}k\left(\frac{j}{p_{n}}\right)\widehat{\Gamma}\left(j\right)\overset{p}{\rightarrow}\lim_{n\rightarrow\infty}\mathsf{Var}\left(\dfrac{1}{\sqrt{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}u_{t}\right),\] where \(\widehat{\Gamma}\left(j\right)\) consistently estimates \(\Gamma\left(j\right)\), \(k\left(\cdot\right)\) is a user-specified kernel function, and \(p_{n}\) is a user-specified bandwidth.
A naive estimator of the “meat” could have been \[\sum_{j=-(n-1)}^{n-1}\widehat{\Gamma}\left(j\right)=\widehat{\Gamma}\left(0\right)+\sum_{j=1}^{n-1}\left[\widehat{\Gamma}\left(j\right)+\widehat{\Gamma}\left(j\right)^{\prime}\right].\]
In effect, you make the limits of the summation in the long-run variance expression finite.
But you have to ask yourself how many observations are used to estimate \(\widehat{\Gamma}\left(j\right)\), say for \(j=0\) and for \(j=n-1\). You will realize that this estimator may be too naive and will be subject to a lot of estimation error.
You may have prior information that \(\Gamma\left(j\right)=0\) for all \(j>p\), where \(p\) is known, small-ish, and finite.
You may need to “trim” the estimator according to some rule and do some reweighting.
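Kernel-based HAC estimators of this type are available in standard software. A minimal sketch in R using the sandwich and lmtest packages (the data generating process, bandwidth, and kernel choices here are purely illustrative):

library(sandwich)
library(lmtest)
set.seed(1)                                   # illustrative seed
x <- arima.sim(n = 200, list(ar = 0.5))       # serially correlated regressor
u <- arima.sim(n = 200, list(ar = 0.5))       # serially correlated error
y <- 1 + 2 * x + u
fit <- lm(y ~ x)
# Bartlett-kernel (Newey-West) HAC standard errors with a user-chosen bandwidth
coeftest(fit, vcov. = NeweyWest(fit, lag = 4, prewhite = FALSE))
# The same estimator with the package's default, data-driven bandwidth rule
coeftest(fit, vcov. = NeweyWest(fit))

Comparing the two sets of standard errors gives a feel for how sensitive HAC inference can be to the bandwidth choice.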
But the past 20 to 30 years of research have shown that we might have to be cautious about the previous approach.
The underlying idea is to account in the asymptotic theory for the reality that bandwidths and reweighting schemes are hard to specify in advance. Another idea is to change the reweighting schemes a bit: use a series estimator instead of a kernel estimator.
Default rules in software are available under extra assumptions about user preferences, which users may sometimes not even be aware of!
HAR inference is still an ongoing field of research and many are trying to make it useful for practitioners. A very illustrative example is by Lazarus, Lewis, Stock, and Watson (2018) and all the published discussions of the article.
But some old ideas are starting to get resurrected. Model the heteroscedasticity and autocorrelation more directly. The idea is to combine generalized least squares and HC/HAC standard errors.
Recall that random variables are mappings from the sample space \(\Omega\) to \(\mathbb{R}\) (or more generally \(\mathbb{R}^{n}\)).
Stochastic processes are mappings of the form \(Z:T\times\Omega\to\mathbb{R}\) (or more generally to \(\mathbb{R}^{n}\)), where \(T\) is some index set (compare Definition 5.1 of the main textbook).
Stochastic processes embody a “parallel universes” extension of random sampling to the time series case.
Let \(t\in \mathbb{Z}\).
An SP \(\left\{ Z_{t}\right\}_{t=1}^{\infty}\) is strictly stationary if, for any finite integer \(m\), any set of subscripts \(t_{1},t_{2},\ldots,t_{m}\), and any integer \(k\), the joint distribution of \(\left(Z_{t_{1}},Z_{t_{2}},\ldots,Z_{t_{m}}\right)\) is the same as the joint distribution of \(\left(Z_{t_{1}+k},Z_{t_{2}+k},\ldots,Z_{t_{m}+k}\right)\).
An SP \(\left\{ Z_{t}\right\}_{t=1}^{\infty}\) is weakly stationary or covariance stationary or second-order stationary if
Compare and contrast these two stationarity concepts.
Determine which of the processes in the examples are strictly stationary, weakly stationary, or neither.
Martingales embody the idea of “no anticipated changes” given all past information. The efficient markets hypothesis and the consumption smoothing hypothesis are statements about economic quantities that behave like martingales.
By definition, \(\left\{ Z_{t}\right\}\) is a martingale if \(\mathbb{E}\left(Z_{t}|Z_{t-1},Z_{t-2},\ldots\right)=Z_{t-1}\).
So, where does the idea of “no anticipated changes” show up in the definition of a martingale?
Since the best prediction of \(Z_{t}\) given all available past information is its most recent value \(Z_{t-1}\), the CEF \(\mathbb{E}\left(Z_t|I_{t-1}\right)\) is equal to the best linear predictor \(\beta_0^*+\beta_1^* Z_{t-1}+\beta_2^* Z_{t-2} + \cdots\), where \(\beta_0^*=0\), \(\beta_1^*=1\), and \(\beta_j^*=0\) for all \(j\geq 2\).
Another way to see “no anticipated changes” is through the definition of a martingale difference sequence (MDS).
Note that \(Z_t=\beta_0^*+\beta_1^* Z_{t-1}+\beta_2^* Z_{t-2} + \cdots+\varepsilon_t\) where \(\mathbb{E}\left(\varepsilon_t|I_{t-1}\right)=0\).
The CEF error \(\varepsilon_t\) is actually an MDS.
Clearly, an MDS does not have serial correlation.
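To see why, take any \(j\geq1\) and apply the law of iterated expectations: \[\mathsf{Cov}\left(\varepsilon_{t},\varepsilon_{t-j}\right)=\mathbb{E}\left(\varepsilon_{t}\varepsilon_{t-j}\right)=\mathbb{E}\left[\mathbb{E}\left(\varepsilon_{t}\varepsilon_{t-j}|I_{t-1}\right)\right]=\mathbb{E}\left[\varepsilon_{t-j}\,\mathbb{E}\left(\varepsilon_{t}|I_{t-1}\right)\right]=0,\] since \(\varepsilon_{t-j}\) is part of the information set \(I_{t-1}\) (and \(\mathbb{E}\left(\varepsilon_{t}\right)=0\) also follows from the MDS property).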
Suppose you have \(Y_t=X_t^\prime \beta^o+\varepsilon_t\). Assume that \(\{\left(Y_t,X_t^\prime\right)\}\) are realizations from an ergodic stationary process. Assume that the relevant moments exist.
We can further improve the CLT for ergodic stationary MDS to allow for some serial correlation. But as you have seen in our discussion of the asymptotic behavior of the sample mean, we need to control the behavior of certain autocovariances.
Here is the setup:
To ensure that the asymptotic variance of \(\sqrt{n}\,\overline{Z}\) is finite, we need to make sure that \[\mathsf{Var}\left(\sqrt{n}\,\overline{Z}\right)=\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\] remains bounded as \(n\to\infty\).
The last condition \[{\displaystyle \sum_{j=0}^{\infty}\left[\mathbb{E}\left(r_{t,j}^{\prime}r_{t,j}\right)\right]^{1/2}<\infty}\] guarantees this.
Suppose the following conditions hold:
Then, as \(n\to\infty\), \[\sqrt{n}\left(\frac{1}{n}\sum_{t=1}^{n}Z_{t}\right)\overset{d}{\rightarrow}N\left(0,V\right).\]
My suggestion is to focus on the following exercises: