Andrew Pua
March 2022
Use data across countries at a specific point in time. Denote each country by index \(i\).
Assume that \(g\) and \(\delta\) are constant across countries. Depreciation rates do not vary greatly across countries. (Justifications are found on page 410.)
Assume that \(\log A\left(0\right)_{i}=a+\varepsilon_{i}\). Thus, \[\log y_{i}^{*} = a+gt^{*}+\dfrac{\alpha}{1-\alpha}\log s_{i}-\dfrac{\alpha}{1-\alpha}\log\left(n_{i}+g+\delta\right)+\varepsilon_{i}.\]
MRW assume that \(s_{i}\) and \(n_{i}\) are independent of \(\varepsilon_{i}\). (Economic justifications are given on page 411.)
Is there an econometric justification for imposing independence?
Cross-sectional dataset
Variables: y85 (real GDP per working-age person in 1985), inv (share of investment in GDP, in percent), pop (growth rate of the working-age population, in percent)
Three samples of countries: 98 non-oil countries, 75 countries with better quality data (intermediate sample), 22 OECD countries with populations greater than 1 million
Assume that \(g+\delta=0.05\).
options(digits=3) # Learn to present to the appropriate precision
require(haven) # Need this package to load Stata datasets
## Loading required package: haven
MRW <- read_dta("./MRW.dta") # Load Stata dataset
# Generate new variables
MRW$ly85 <- log(MRW$y85)
MRW$linv <- log(MRW$inv/100)
MRW$lpop <- log(MRW$pop/100 + 0.05)
MRW.TableI.nonoil <- lm(ly85 ~ linv + lpop, data = subset(MRW, MRW$n==1)) # Apply OLS
coef.TableI.nonoil <- coefficients(MRW.TableI.nonoil) # Extract coefficients
coef.TableI.nonoil
## (Intercept) linv lpop
## 5.43 1.42 -1.99
set.seed(20220312)
n <- 1000
true_ability <- rnorm(n, 50, 10)
noise_1 <- rnorm(n, 0, 10)
noise_2 <- rnorm(n, 0, 10)
midterm <- true_ability + noise_1
final <- true_ability + noise_2
lm(final ~ midterm)
##
## Call:
## lm(formula = final ~ midterm)
##
## Coefficients:
## (Intercept) midterm
## 22.666 0.541
Let \(Y_{new}=c_1+c_2 Y\) and \(X_{1,new}=d_1+d_2 X_1\) (scalar). Let \(c_1\), \(c_2\neq 0\), \(d_1\), \(d_2\neq 0\) be constants.
You can always write \[Y_{new}=\beta_{0,new}^*+\beta_{1,new}^*X_{1,new}+u_{new}.\] But how are the new coefficients related to the original coefficients?
Start with \[\begin{aligned} Y &=\beta_{0}^*+\beta_{1}^*X_1+u \\ \frac{Y_{new}-c_1}{c_2}&= \beta_{0}^*+\beta_{1}^*\left(\frac{X_{1,new}-d_1}{d_2}\right)+u \\ Y_{new} &= \left(c_2\beta_0^*+c_1-c_2\frac{\beta_1^*d_1}{d_2}\right)+c_2\frac{\beta_1^*}{d_2}X_{1,new}+c_2 u.\end{aligned}\]
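A quick simulated check of this relationship (a sketch; the data and the constants \(c_1,c_2,d_1,d_2\) below are made up):
set.seed(20220312)
x1 <- rnorm(100)
y <- 1 + 2*x1 + rnorm(100)
c1 <- 3; c2 <- 2; d1 <- -1; d2 <- 0.5
b <- coef(lm(y ~ x1)) # original OLS coefficients
coef(lm(I(c1 + c2*y) ~ I(d1 + d2*x1))) # OLS after rescaling Y and X1
c(c2*b[1] + c1 - c2*b[2]*d1/d2, c2*b[2]/d2) # implied new coefficients: they match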
Now let us look at interaction terms and what they are useful for.
Let \(Y=\beta_0^*+\beta_1^* X_1+\beta_2^* X_2+\beta_3^* X_1X_2+u\). Assume \(X_1\) is a binary/dummy variable and \(X_2\) is continuous.
Consider the following comparisons where we look at the best linear prediction of \(Y\) given:
The last comparison is sometimes used to make causal statements in a differences-in-differences framework. Note that at the moment, we have a predictive comparison only.
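A small simulated illustration (a sketch; the data-generating numbers are made up). With a dummy \(X_1\) and a continuous \(X_2\), the coefficient on \(X_2\) is the slope of the fitted line for the \(X_1=0\) group, and the interaction coefficient is the difference in slopes between the \(X_1=1\) and \(X_1=0\) groups:
set.seed(20220312)
x1 <- rbinom(200, 1, 0.5) # dummy regressor
x2 <- rnorm(200) # continuous regressor
y <- 1 + 0.5*x1 + 2*x2 + 1.5*x1*x2 + rnorm(200)
coef(lm(y ~ x1*x2)) # x1:x2 is the difference in slopes between the two groups
coef(lm(y ~ x2, subset = (x1 == 0))) # matches the intercept and x2 coefficient above
coef(lm(y ~ x2, subset = (x1 == 1))) # slope matches x2 coefficient plus interaction coefficient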
How about powers like squares or cubes?
Let \(Y=\beta_0^*+\beta_1^* X_1+\beta_2^* X_1^2+u\). Assume \(X_1\) is continuous.
Compare the best linear prediction of \(Y\) when \(X_1=x+\Delta x\) against the best linear prediction of \(Y\) when \(X_1=x\). Then the difference is \[\beta_1^* \Delta x + \beta_2^* \left(2x\Delta x+ \left(\Delta x\right)^2\right).\]
On a per unit difference basis, we can write \[\dfrac{\beta_1^* \Delta x + \beta_2^* \left(2x\Delta x+ \left(\Delta x\right)^2\right)}{\Delta x} \overset{\Delta x\to 0}{\approx} \beta_1^*+2\beta_2^*x.\]
In the model with interactions or with powers, it might be a good idea to do some centering.
Consider once again \(Y=\beta_0^*+\beta_1^* X_1+\beta_2^* X_1^2+u\). Assume \(X_1\) is continuous. Define \(X_{1,new}=X_1-\mathbb{E}\left(X_1\right)\).
You can write \(Y=\beta_0^*+\beta_1^* X_1+\beta_2^* X_1^2+u\) as \[Y=\underbrace{\beta_0^*+\beta_1^* \mathbb{E}\left(X_1\right)+ \beta_2^* \left(\mathbb{E}\left(X_1\right)\right)^2}_{\beta_{0,new}^*} + \underbrace{\left(\beta_1^*+2\beta_2^*\mathbb{E}\left(X_1\right)\right)}_{\beta_{1,new}^*} X_{1,new} +\underbrace{\beta_2^*}_{\beta_{2,new}^*} X_{1,new}^2+u\]
What do you notice about \(\beta_{2,new}^*\)?
Try conducting a similar analysis if you have interaction terms.
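A simulated illustration of the last two points (a sketch; the numbers are made up, and the sample mean stands in for \(\mathbb{E}\left(X_1\right)\)):
set.seed(20220312)
x1 <- rnorm(500, mean = 2)
y <- 1 + x1 - 0.5*x1^2 + rnorm(500)
coef(lm(y ~ x1 + I(x1^2))) # original parameterization
x1.c <- x1 - mean(x1) # center at the sample mean
coef(lm(y ~ x1.c + I(x1.c^2))) # squared-term coefficient unchanged; linear term is now beta1 + 2*beta2*mean(x1)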
Setup: \[\underset{\left(n\times1\right)}{Y}=\left(\begin{array}{c} Y_{1}\\ \vdots\\ Y_{n} \end{array}\right),\underset{\left(n\times\left(k+1\right)\right)}{\boldsymbol{\mathrm{X}}}=\left(\begin{array}{c} X_{1}^{\prime}\\ \vdots\\ X_{n}^{\prime} \end{array}\right)=\left(\begin{array}{cccc} 1 & {X_{11}} & \cdots & {X_{k1}}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & {X_{1n}} & \cdots & {X_{kn}} \end{array}\right),\underset{\left(\left(k+1\right)\times1\right)}{\beta}=\left(\begin{array}{c} \beta_{0}\\ \vdots\\ \beta_{k} \end{array}\right)\]
Least squares objective function: \[\begin{eqnarray}\min_{\beta}\dfrac{1}{n}\left(Y-\boldsymbol{\mathrm{X}}\beta\right)^{\prime}\left(Y-\boldsymbol{\mathrm{X}}\beta\right) &=& \min_{\beta}\dfrac{1}{n}\sum_{t=1}^{n}\left(Y_{t}-X_{t}^{\prime}\beta\right)^{2} \\ &=& \min_{\beta}\dfrac{1}{n}\sum_{t=1}^{n}\left(Y_{t}-\beta_{0}-\beta_{1}X_{1t}-\cdots-\beta_{k}X_{kt}\right)^{2}\end{eqnarray}\]
Solution to the minimization problem (the least squares minimizer or OLS estimator): \[\widehat{\beta}=\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\left(\boldsymbol{\mathrm{X}}^{\prime}Y\right)=\left({\displaystyle \frac{1}{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}X_{t}^{\prime}\right)^{-1}\left({\displaystyle \frac{1}{n}}{\displaystyle \sum_{t=1}^{n}}X_{t}Y_{t}\right)\]
To obtain the LS minimizer, use calculus or matrix calculus, depending on your level of comfort. But avoid this, if possible. Use orthogonality!
The system of equations or the first-order conditions of the minimization problem are given by \[\begin{eqnarray}\dfrac{1}{n}\sum_{t=1}^{n}2\left(Y_{t}-\widehat{\beta}_{0}-\widehat{\beta}_{1}X_{1t}-\cdots-\widehat{\beta}_{k}X_{kt}\right)\left(-1\right) &=& 0 \\ \vdots &=& \vdots\\ \dfrac{1}{n}\sum_{t=1}^{n}2\left(Y_{t}-\widehat{\beta}_{0}-\widehat{\beta}_{1}X_{1t}-\cdots-\widehat{\beta}_{k}X_{kt}\right)\left(-X_{kt}\right) &=& 0\end{eqnarray}\]
See how it is related to orthogonality from before?
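A quick numerical check of the matrix formula against lm(), reusing the MRW regression fitted earlier (a sketch):
X <- model.matrix(MRW.TableI.nonoil) # design matrix including the constant
Y <- model.response(model.frame(MRW.TableI.nonoil))
solve(t(X) %*% X) %*% (t(X) %*% Y) # matches coefficients(MRW.TableI.nonoil)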
The quantity \(\widehat{\beta}^{\prime}\boldsymbol{\mathrm{X}}^{\prime}M_{\iota}\boldsymbol{\mathrm{X}}\widehat{\beta}=\left\Vert M_{\iota}\boldsymbol{\mathrm{X}}\widehat{\beta}\right\Vert ^{2}\) is called the explained sum of squares (SSE).
Therefore, the centered R-squared, denoted by \(R_{c}^{2}\) or simply \(R^{2}\), is defined as \[R_{c}^{2}=\dfrac{\widehat{\beta}^{\prime}\boldsymbol{\mathrm{X}}^{\prime}M_{\iota}\boldsymbol{\mathrm{X}}\widehat{\beta}}{Y^{\prime}M_{\iota}Y}=\dfrac{\left\Vert M_{\iota}\boldsymbol{\boldsymbol{\mathrm{X}}}\widehat{\beta}\right\Vert ^{2}}{\left\Vert M_{\iota}Y\right\Vert ^{2}}=1-\dfrac{\left\Vert e\right\Vert ^{2}}{\left\Vert M_{\iota}Y\right\Vert ^{2}}\] Do you understand now why it is centered?
Note that \(R^{2}\) never decreases (and typically increases) when you put more columns in \(\mathbf{X}\).
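A quick check of these expressions with the MRW regression (a sketch):
e <- residuals(MRW.TableI.nonoil)
Y <- model.response(model.frame(MRW.TableI.nonoil))
Yc <- Y - mean(Y) # M_iota Y: deviations from the sample mean
c(1 - sum(e^2)/sum(Yc^2), summary(MRW.TableI.nonoil)$r.squared) # both give the centered R-squared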
(Stachurski) Here is a series of properties of the R-squared.
Show that the uncentered R-squared is not affected by changing \(Y\) to \(\alpha Y\), where \(\alpha\neq 0\) is a scalar. Start with \[R_{uc,\alpha}^{2}\overset{?}{=}\left\Vert P_{\mathbf{X}}\left(\alpha Y\right)\right\Vert ^{2}/\left\Vert \left(\alpha Y\right)\right\Vert ^{2}\overset{?}{=}\cdots\overset{?}{=}R_{uc}^{2}.\]
Show that the uncentered R-squared is affected by changing \(Y\) to \(Y+\alpha\iota\), where \(\iota\) is a vector of ones and \(\alpha\neq0\) is a scalar. In particular, let \(\mathbf{X}\) contain a column of ones, then we have (justify each step) \[R_{uc,\alpha}^{2}\overset{?}{=}\dfrac{\left\Vert P_{\mathbf{X}}\left(Y+\alpha\iota\right)\right\Vert ^{2}}{\left\Vert Y+\alpha\iota\right\Vert ^{2}}\overset{?}{=}\dfrac{\left\Vert P_{\mathbf{X}}Y+\alpha P_{\mathbf{X}}\iota\right\Vert ^{2}}{\left\Vert Y+\alpha\iota\right\Vert ^{2}}\overset{?}{=}\dfrac{\left\Vert P_{\mathbf{X}}Y+\alpha\iota\right\Vert ^{2}}{\left\Vert Y+\alpha\iota\right\Vert ^{2}}\overset{?}{=}\dfrac{\left\Vert P_{\mathbf{X}}\left(Y/\alpha\right)+\iota\right\Vert ^{2}}{\left\Vert \left(Y/\alpha\right)+\iota\right\Vert ^{2}}.\]
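A quick numerical check of both properties (a sketch with simulated data and a design matrix containing a column of ones):
set.seed(20220312)
x <- rnorm(30)
y <- 1 + x + rnorm(30)
X <- cbind(1, x)
P <- X %*% solve(t(X) %*% X) %*% t(X) # projection matrix onto the column space of X
r2.uc <- function(v) sum((P %*% v)^2)/sum(v^2) # uncentered R-squared
c(r2.uc(y), r2.uc(3*y)) # unchanged when Y is rescaled
c(r2.uc(y), r2.uc(y + 5)) # changes when a constant is added to Y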
Setup: \(Y\), \(\boldsymbol{\mathrm{X}}\), \(\beta\) same as before, now add \(\underset{\left(J\times\left(k+1\right)\right)}{R}\) and \(\underset{\left(J\times1\right)}{r}\) both nonstochastic, with \(\mathsf{rank}\left(R\right)=J\leq k+1\).
Use when? When there are linear equality constraints on \(\beta\). User must specify \(R\) and \(r\) in advance.
Restricted least squares (RLS) objective function and constraints: \[\min_{\beta}\left(Y-\boldsymbol{\mathrm{X}}\beta\right)^{\prime}\left(Y-\boldsymbol{\mathrm{X}}\beta\right)\ \ s.t.\ R\beta=r\]
Lagrangian: \[\min_{\beta,\lambda}\left(Y-\boldsymbol{\mathrm{X}}\beta\right)^{\prime}\left(Y-\boldsymbol{\mathrm{X}}\beta\right)+2\lambda^{\prime}\left(r-R\beta\right)\]
Show that \[\widetilde{\beta}=\widehat{\beta}-{\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}}R^{\prime}\left[R\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}R^{\prime}\right]^{-1}\left(R\widehat{\beta}-r\right).\]
Intuitively, the SSR for LS should be no greater than the SSR for RLS, i.e., you should expect that \(\left\Vert e\right\Vert ^{2}\) would be no larger than \(\left\Vert \widetilde{e}\right\Vert ^{2}\).
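A sketch verifying the RLS formula with the MRW regression and the Solow restriction \(R=\left(0,1,1\right)\), \(r=0\) (the coefficients on linv and lpop sum to zero); the result coincides with the restricted regression on ldiff fitted further below:
X <- model.matrix(MRW.TableI.nonoil)
Y <- model.response(model.frame(MRW.TableI.nonoil))
XtXinv <- solve(t(X) %*% X)
beta.hat <- XtXinv %*% (t(X) %*% Y) # unrestricted OLS
R <- matrix(c(0, 1, 1), nrow = 1)
r <- 0
beta.tilde <- beta.hat - XtXinv %*% t(R) %*% solve(R %*% XtXinv %*% t(R)) %*% (R %*% beta.hat - r)
c(beta.tilde) # linv and lpop coefficients now sum to zero
c(sum((Y - X %*% beta.hat)^2), sum((Y - X %*% beta.tilde)^2)) # unrestricted SSR is no larger than restricted SSR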
There are textbook discussions of the linear regression model where the regressors in \(\boldsymbol{\mathrm{X}}\) are treated directly as constants.
The analysis is very similar to the version where you condition on \(\boldsymbol{\mathrm{X}}\), especially for the finite-sample theory. The notation is likely easier to handle.
However, the asymptotic theory is a bit different.
##
## Call:
## lm(formula = ly85 ~ linv + lpop, data = subset(MRW, MRW$n ==
## 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7914 -0.3937 0.0412 0.4337 1.5805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.430 1.584 3.43 0.00090 ***
## linv 1.424 0.143 9.95 < 2e-16 ***
## lpop -1.990 0.563 -3.53 0.00064 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.689 on 95 degrees of freedom
## Multiple R-squared: 0.601, Adjusted R-squared: 0.592
## F-statistic: 71.5 on 2 and 95 DF, p-value: <2e-16
\[\mathsf{Var}\left(\varepsilon|\mathbf{X}\right)=\begin{pmatrix}\mathsf{Var}\left(\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Cov}\left(\varepsilon_{1},\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Cov}\left(\varepsilon_{1},\varepsilon_{n}|\mathbf{X}\right)\\ \mathsf{Cov}\left(\varepsilon_{2},\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Var}\left(\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Cov}\left(\varepsilon_{2},\varepsilon_{n}|\mathbf{X}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \mathsf{Cov}\left(\varepsilon_{n},\varepsilon_{1}|\mathbf{X}\right) & \mathsf{Cov}\left(\varepsilon_{n},\varepsilon_{2}|\mathbf{X}\right) & \ldots & \mathsf{Var}\left(\varepsilon_{n}|\mathbf{X}\right) \end{pmatrix}.\]
Assume that the errors are conditionally non-autocorrelated but conditionally heteroscedastic. Show that conditional on \(\mathbf{X}\), \(\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\boldsymbol{\mathrm{X}}^{\prime}\mathsf{diag}\{\varepsilon^2_1,\ldots, \varepsilon^2_n\} \boldsymbol{\mathrm{X}}\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\) is an unbiased estimator of \(\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\boldsymbol{\mathrm{X}}^{\prime}\mathsf{Var}\left(\varepsilon|\boldsymbol{\mathrm{X}}\right)\boldsymbol{\mathrm{X}}\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\).
Let \(e\) be the residual vector from the least squares fit. Impose Assumptions 3.1 to 3.4. Find \(\mathbb{E}\left(e|\mathbf{X}\right)\) and \(\mathsf{Var}\left(e|\mathbf{X}\right)\).
It might seem strange to find \(\mathsf{Var}\left(e|\mathbf{X}\right)\) under Assumption 3.4. Will the expected value of \(\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\boldsymbol{\mathrm{X}}^{\prime}\mathsf{diag}\{e^2_1,\ldots, e^2_n\} \boldsymbol{\mathrm{X}}\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\) be equal to \(\sigma^2\left(\boldsymbol{\mathrm{X}}^{\prime}\boldsymbol{\mathrm{X}}\right)^{-1}\), conditional on \(\mathbf{X}\)?
This is based on Exercises 3.8 and 3.9 of the main textbook.
If there is imperfect multicollinearity, then the variances of the estimators of the individual coefficients are going to be large, leading to imprecision.
This is not a defect, but a correct representation of the reality of empirical practice. Standard errors should reflect the consequences of imperfect multicollinearity.
Note that there is no “fixing” of this imperfect multicollinearity. In fact, there are cases where having imperfect multicollinearity could be a good thing.
It is possible that linear combinations of individual coefficients could be estimated more precisely under imperfect multicollinearity.
Once again, by linearity:\[\left(\begin{array}{c} Y_{1}\\ Y_{2}\\ \vdots\\ Y_{n} \end{array}\right)=\beta_0^o\left(\begin{array}{c} 1\\ 1\\ \vdots\\ 1 \end{array}\right)+\beta_1^o\left(\begin{array}{c} X_{11}\\ X_{12}\\ \vdots\\ X_{1n} \end{array}\right)+\left(\begin{array}{c} \varepsilon_{1}\\ \varepsilon_{2}\\ \vdots\\ \varepsilon_{n} \end{array}\right)\]
Reparameterize to \[\left(\begin{array}{c} Y_{1}\\ Y_{2}\\ \vdots\\ Y_{n} \end{array}\right)=\left(\beta_0^o+\beta_1^o \overline{X}_1\right)\left(\begin{array}{c} 1\\ 1\\ \vdots\\ 1 \end{array}\right)+\beta_1^o\left(\begin{array}{c} X_{11}-\overline{X}_1\\ X_{12}-\overline{X}_1\\ \vdots\\ X_{1n}-\overline{X}_1 \end{array}\right)+\left(\begin{array}{c} \varepsilon_{1}\\ \varepsilon_{2}\\ \vdots\\ \varepsilon_{n} \end{array}\right)\]
Consider the following orthogonal transformation \[A=\left(\begin{array}{cccc} \dfrac{1}{\sqrt{n}} & \dfrac{1}{\sqrt{n}} & \cdots & \dfrac{1}{\sqrt{n}}\\ \dfrac{X_{11}-\overline{X}_{1}}{\sqrt{\sum\left(X_{1t}-\overline{X}_{1}\right)^{2}}} & \dfrac{X_{12}-\overline{X}_{1}}{\sqrt{\sum\left(X_{1t}-\overline{X}_{1}\right)^{2}}} & \cdots & \dfrac{X_{1n}-\overline{X}_{1}}{\sqrt{\sum\left(X_{1t}-\overline{X}_{1}\right)^{2}}}\\ a_{31} & a_{32} & \cdots & a_{3n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right).\]
This transformation exists and is justified by the Gram-Schmidt orthogonalization procedure (full QR version).
Because \(A\) is an orthogonal matrix, \[\begin{eqnarray}\sum\varepsilon_{t}^{2} &=& \sum\nu_{t}^{2}\\ \sum\left(Y_{t}-\beta_{0}^{o}-\beta_{1}^{o}X_{1t}\right)^{2} &=& \left[\sqrt{n}\overline{Y}-\sqrt{n}\left(\beta_{0}^{o}+\beta_{1}^{o}\overline{X}_{1}\right)\right]^{2}\\ && +\left[\dfrac{\sum\left(X_{1t}-\overline{X}_{1}\right)Y_t}{\sqrt{\sum\left(X_{1t}-\overline{X}_{1}\right)^{2}}}-\beta_{1}^{o}\sqrt{\sum\left(X_{1t}-\overline{X}_{1}\right)^{2}}\right]^{2}\\ && +\nu_{3}^{2}+\cdots+\nu_{n}^{2}\end{eqnarray}\]
Observe that \[\sum\left(Y_{t}-\widehat{\beta}_{0}-\widehat{\beta}_{1}X_{1t}\right)^{2} = \nu_{3}^{2}+\cdots+\nu_{n}^{2}.\] Therefore, the minimized SSR depends on only \(n-2\) remaining pieces of the errors after orthogonal transformation.
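A numerical illustration of this degrees-of-freedom argument (a sketch with simulated data): qr.Q() with complete = TRUE returns an orthogonal matrix whose first two columns span the column space of \(\left(\iota,X_1\right)\), so its transpose plays the role of \(A\).
set.seed(20220312)
n.sim <- 20
x1 <- rnorm(n.sim)
y <- 1 + 2*x1 + rnorm(n.sim)
Q <- qr.Q(qr(cbind(1, x1)), complete = TRUE) # n x n orthogonal matrix
nu <- t(Q) %*% y # coordinates of Y in the new basis
c(sum(residuals(lm(y ~ x1))^2), sum(nu[3:n.sim]^2)) # minimized SSR equals nu_3^2 + ... + nu_n^2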
The covariance matrix (along with the standard errors) was calculated under conditional homoscedasticity because MRW imposed it.
Do you think Assumption 3.4 is applicable?
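The covariance matrix and standard errors displayed below were presumably produced by commands along these lines (the object cov.TableI.nonoil is reused in the testing code further below):
cov.TableI.nonoil <- vcov(MRW.TableI.nonoil) # estimated covariance matrix under conditional homoscedasticity
cov.TableI.nonoil
sqrt(diag(cov.TableI.nonoil)) # standard errors, matching the summary() output above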
## (Intercept) linv lpop
## (Intercept) 2.5087 0.0983 0.8799
## linv 0.0983 0.0205 0.0229
## lpop 0.8799 0.0229 0.3174
## (Intercept) linv lpop
## 1.584 0.143 0.563
To construct an exact finite-sample \(100\left(1-\alpha\right)\%\) confidence set for \(R\beta^{o}\) under normality, you have to provide \(R\) and then find the \(\left(1-\alpha\right)\) quantile of the \(F_{J,n-\left(k+1\right)}\) distribution, call it \(c_{1-\alpha}\). The confidence set will have the form \[\left(R\widehat{\beta}-R\beta^{o}\right)^{\prime}\left(Rs^{2}\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}R^{\prime}\right)^{-1}\left(R\widehat{\beta}-R\beta^{o}\right)/J\leq c_{1-\alpha}.\]
Geometrically, this confidence set would look like a shaded ellipsoid. This shape has implications for constructing joint confidence sets compared to combining separate confidence intervals.
To test the hypothesis \(R\beta^{o}=r\) at the \(100\alpha\%\) significance level, you have to provide \(R\) and \(r\) and compute what some call an \(F\)-statistic:\[F=\left(R\widehat{\beta}-r\right)^{\prime}\left(Rs^{2}\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}R^{\prime}\right)^{-1}\left(R\widehat{\beta}-r\right)/J\] and then calculate either a critical value from \(F_{J,n-\left(k+1\right)}\) or a one-sided \(p\)-value \(\Pr\left(F_{J,n-\left(k+1\right)}\geq F^{act}|H_0\right)\), where \(F^{act}\) is the actual value of the statistic computed from the data.
Just like in confidence sets, the test is really a joint test. The results from a joint test are not necessarily the same as testing each of the \(J\) hypotheses one at a time.
The test statistic reduces to a \(t\)-statistic for the special case where \(J=1\).
R.mat <- c(0, 1, 1) # R matrix for testing Solow hypothesis
# Calculate F statistic for testing Solow hypothesis
test.stat <- t(R.mat %*% coef.TableI.nonoil) %*% solve(R.mat %*% cov.TableI.nonoil %*% R.mat) %*%
R.mat %*% coef.TableI.nonoil
# Test statistic, p-value, critical value
c(test.stat, 1-pf(test.stat, 1, 95), qf(0.95, 1, 95))
## [1] 0.834 0.363 3.941
# Generate new variable for restricted regression
MRW$ldiff <- MRW$linv - MRW$lpop
# Apply OLS to restricted regression
MRW.TableI.restricted.nonoil <- lm(ly85 ~ ldiff, data = subset(MRW, MRW$n==1))
# Compute test using SSR comparisons
anova(MRW.TableI.nonoil, MRW.TableI.restricted.nonoil)
## Analysis of Variance Table
##
## Model 1: ly85 ~ linv + lpop
## Model 2: ly85 ~ ldiff
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 95 45.1
## 2 96 45.5 -1 -0.396 0.83 0.36
# Apply OLS to reparameterized model
MRW.TableI.repar.nonoil <- lm(ly85 ~ ldiff + lpop, data = subset(MRW, MRW$n==1))
summary(MRW.TableI.repar.nonoil)[[4]] # Focusing on the summary of the coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.430 1.584 3.428 9.00e-04
## ldiff 1.424 0.143 9.951 2.10e-16
## lpop -0.566 0.619 -0.913 3.63e-01
est.beta1 <- coef(MRW.TableI.restricted.nonoil)[[2]] # slope on ldiff in the restricted regression
implied.alpha <- est.beta1/(1+est.beta1) # implied capital share: alpha = beta1/(1 + beta1)
implied.alpha
## [1] 0.598
# Delta method: gradient of beta1/(1 + beta1) with respect to (beta0, beta1) is (0, 1/(1 + beta1)^2)
est.var <- c(0, 1/(1+est.beta1)^2) %*% vcov(MRW.TableI.restricted.nonoil) %*% c(0, 1/(1+est.beta1)^2)
delta.method.se <- sqrt(est.var) # delta-method standard error for implied.alpha
delta.method.se
## [,1]
## [1,] 0.0201
Derive the corresponding \(t\)-distribution results for \(J=1\).
Show that \[F=\dfrac{\left(R\widehat{\beta}-r\right)^{\prime}\left(R\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}R^{\prime}\right)^{-1}\left(R\widehat{\beta}-r\right)/J}{s^{2}}=\dfrac{\left(\widetilde{e}^{\prime}\widetilde{e}-e^{\prime}e\right)/J}{e^{\prime}e/\left(n-\left(k+1\right)\right)}.\]
Show that you can also express the previous version of the \(F\)-statistic in terms of R-squared, but again this is of limited use. Do you know why? Why would the SSR version be preferable to the R-squared version of the \(F\)-statistic?
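A quick numerical check of the SSR form in the first exercise, using the MRW fits from before (deviance() returns the residual sum of squares of an lm fit):
SSR.u <- deviance(MRW.TableI.nonoil) # unrestricted SSR
SSR.r <- deviance(MRW.TableI.restricted.nonoil) # restricted SSR
((SSR.r - SSR.u)/1)/(SSR.u/95) # matches the F statistic computed earlier (about 0.83)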
Consider the following data generating process shown in class: \(\left\{ \left(Y_{t},X_{1t}\right)\right\} _{t=1}^{n}\) are IID draws from \[\begin{eqnarray*} Y_{t} &=& -1+2X_{1t}+\varepsilon_{t} \\ X_{1t} &\sim & Bin\left(1,0.3\right) \\ \varepsilon_{t}|X_{1t}=0 &\sim & N\left(0,1\right) \\ \varepsilon_{t}|X_{1t}=1 &\sim & N\left(0,1\right)\end{eqnarray*}\]
Which of Assumptions 3.1, 3.2, 3.4, 3.5 are satisfied?
You should try modifying the code to the case where \(N\left(0,1\right)\) is changed to \(N\left(0,4\right)\). Answer the previous question again and modify the code accordingly.
Try increasing or decreasing \(n\).
What about changing \(Bin\left(1,0.3\right)\) to \(Bin\left(1,0.8\right)\)?
Introduce storage for \(s^2\) and examine its center and distribution. (A sketch appears after the Monte Carlo output below.)
set.seed(20220312)
n <- 50
# "True" beta values
beta0.o <- -1
beta1.o <- 2
reps <- 10^4
# Storage for OLS estimates (2 entries per replication)
beta.store <- matrix(NA, nrow=reps, ncol=2)
# Storage for robust covariance matrix (4 entries per replication, 2x2 matrix)
rob.store <- matrix(NA, nrow=reps, ncol=4)
# Storage for non-robust covariance matrix (4 entries per replication, 2x2 matrix)
nonrob.store <- matrix(NA, nrow=reps, ncol=4)
# Monte Carlo loop
for (i in 1:reps)
{
X.t <- rbinom(n, 1, 0.3) # Generate X
eps.t <- (rnorm(n, 0, 1))*(X.t == 1)+(rnorm(n, 0, 1))*(X.t == 0) # Generate epsilon
Y.t <- beta0.o + beta1.o*X.t + eps.t # Generate Y
matXX <- t(cbind(1, X.t)) %*% cbind(1, X.t) # X'X matrix
beta.hat <- solve(matXX) %*% (t(cbind(1, X.t)) %*% Y.t) # OLS
# robust cov matrix
bread <- matXX
resid <- Y.t - cbind(1, X.t) %*% beta.hat
meat <- (t(cbind(1, X.t)) %*% diag(c(resid^2))) %*% cbind(1, X.t)
est.rob <- (solve(bread) %*% meat) %*% solve(bread)
s.sq <- 1/(n-2) * sum(resid^2) # estimator for sigma2 under cond. homoscedasticity
est.nonrob <- s.sq * solve(matXX) # nonrobust cov matrix
beta.store[i,] <- c(beta.hat)
rob.store[i,] <- c(est.rob)
nonrob.store[i,] <- c(est.nonrob)
}
## [1] -1 2
## [1] 0.172 0.313
## [1] 0.166 0.301
## [1] 0.169 0.313
# Test statistic for null beta0 = -1 using nonrobust cov matrix
t.ratios.beta0 <- (beta.store[,1]-beta0.o)/sqrt(nonrob.store[,1])
# Test statistic for null beta0 = -1 using robust cov matrix
t.ratios.beta0.rob <- (beta.store[,1]-beta0.o)/sqrt(rob.store[,1])
# Test statistic for null beta1 = 2 using nonrobust cov matrix
t.ratios.beta1 <- (beta.store[,2]-beta1.o)/sqrt(nonrob.store[,4])
# Test statistic for null beta1 = 2 using robust cov matrix
t.ratios.beta1.rob <- (beta.store[,2]-beta1.o)/sqrt(rob.store[,4])
# Empirical rejection rate alpha = 0.05
mean(abs(t.ratios.beta0)>qnorm(0.975))
## [1] 0.0592
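The remaining empirical rejection rates below were presumably computed analogously (the exact commands are not shown here; this is an assumption):
mean(abs(t.ratios.beta0.rob)>qnorm(0.975)) # beta0, robust covariance matrix
mean(abs(t.ratios.beta1)>qnorm(0.975)) # beta1, nonrobust covariance matrix
mean(abs(t.ratios.beta1.rob)>qnorm(0.975)) # beta1, robust covariance matrix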
## [1] 0.0542
## [1] 0.0653
## [1] 0.0677
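For the exercise on storing \(s^2\), a minimal self-contained sketch (reusing n, reps, beta0.o, and beta1.o from above; one possible way, not code from the original):
s.sq.store <- rep(NA, reps) # storage for s^2, one entry per replication
for (i in 1:reps)
{
X.t <- rbinom(n, 1, 0.3) # Generate X
eps.t <- (rnorm(n, 0, 1))*(X.t == 1)+(rnorm(n, 0, 1))*(X.t == 0) # Generate epsilon
Y.t <- beta0.o + beta1.o*X.t + eps.t # Generate Y
resid <- residuals(lm(Y.t ~ X.t)) # residuals from the least squares fit
s.sq.store[i] <- sum(resid^2)/(n - 2) # s^2 under conditional homoscedasticity
}
c(mean(s.sq.store), var(s.sq.store)) # center should be close to the true error variance of 1
hist(s.sq.store) # with normal errors, a scaled chi-squared shape with n - 2 degrees of freedom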