Andrew Pua
February 2022
The course mainly serves two purposes:
Research: Not just for the upcoming thesis-writing phase; the aim is for you to be a good consumer and producer of research.
Certification: The course is part of your core courses, so your degree carries some useful information about your skill level.
Can it be good for business decision-making?
Keep an open mind. The course is not just mathematics.
Pay attention or things will pass you by. You should not multi-task.
Ask questions immediately and participate in class.
Memorization can help, but doing bits of memorization over an entire semester is better than cramming it all in a day (or less!) before the exams.
Do the exercises immediately, even if you are not told to and even if there are no solutions.
As much as possible, I follow the main textbook in terms of its overall structure, including the notation. But I may jump from one place to another.
I jump from one place to another with a purpose in mind. I put references to the main textbook in the slides.
I want to give you the context and the connections to past knowledge rather than just the methods/computer commands. I want you to be able to rebuild this knowledge if you lose it.
Almost no homework to submit, but there are activities that are graded for completion.
Write down the expression needed to calculate \(\mathbb{E}\left(X^{2}\right)\).
One of the four words in the sentence “I SEE THE MOUSE” will be selected at random.
The task is to predict the number of letters in the selected word; denote this number by \(Y\).
What would be your prediction rule in order to make your expected loss as small as possible?
How much is the smallest expected loss?
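For concreteness, here is a minimal R sketch (not part of the lecture code) for this example, assuming squared-error loss as in the problems below and that each of the four words is equally likely, so that mean() over the four values gives population moments.
y <- c(1, 3, 3, 5) # number of letters in I, SEE, THE, MOUSE
mean(y) # optimal constant prediction E(Y) = 3
mean((y - mean(y))^2) # smallest expected loss Var(Y) = 2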
What if you have other information in the form of another random variable \(X_{1}\)?
Consider a prediction rule of the form \(\beta_{0}+\beta_{1}X_{1}\).
As a result, we have \[\beta_{0}^{*} = \mathbb{E}\left(Y\right)-\beta_{1}^{*}\mathbb{E}\left(X_{1}\right),\qquad\beta_{1}^{*}=\dfrac{\mathsf{Cov}\left(X_{1},Y\right)}{\mathsf{Var}\left(X_{1}\right)}.\]
The condition \(\mathsf{Var}\left(X_{1}\right)>0\) rules out point-mass (degenerate) distributions for \(X_{1}\).
What is the minimized value of the objective function?
What happens when \(\beta_1^*=0\) is known in advance? In that case, the problem reduces to the first one and \(\beta_0^*=\mathbb{E}\left(Y\right)\).
Return to I SEE THE MOUSE. The next task is to predict the number of letters in the selected word if you have information about the number of E’s in the word (call this \(X_1\)).
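As a sketch (not part of the lecture code), the coefficients of the best linear predictor for this example follow directly from the formulas above, again assuming each word is equally likely:
y <- c(1, 3, 3, 5) # number of letters in I, SEE, THE, MOUSE
x1 <- c(0, 2, 1, 1) # number of E's in each word
beta1 <- (mean(x1 * y) - mean(x1) * mean(y)) / (mean(x1^2) - mean(x1)^2) # Cov(X1, Y) / Var(X1)
beta0 <- mean(y) - beta1 * mean(x1) # E(Y) - beta1 * E(X1)
c(beta0, beta1) # 2 and 1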
There is a common structure to the optimization problems you have seen so far.
Most optimization problems in econometrics share this common structure, which is connected to orthogonal projection, a linear algebra concept you may have encountered before.
Define the inner product to be \[\left\langle Y,X_{1}\right\rangle =\mathbb{E}\left(X_{1}Y\right).\]
Therefore, we could rewrite the problems as \[\min_{\beta_{0}}\left\Vert Y-\beta_{0}\right\Vert ^{2},\qquad\min_{\beta_{0},\beta_{1}}\left\Vert Y-\beta_{0}-\beta_{1}X_{1}\right\Vert ^{2}.\]
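In this notation, the first-order conditions for the second problem say that the prediction error is orthogonal to the constant and to \(X_{1}\): \[\left\langle Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1},1\right\rangle =0,\qquad\left\langle Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1},X_{1}\right\rangle =0,\] that is, \(\mathbb{E}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right)=0\) and \(\mathbb{E}\left[X_{1}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right)\right]=0\).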
Yet another way to interpret the first-order conditions is by looking at them as systems of linear equations. \[\underbrace{\left(\begin{array}{cc} 1 & \mathbb{E}\left(X_1\right)\\ \mathbb{E}\left(X_1\right) & \mathbb{E}\left(X^{2}_1\right) \end{array}\right)}_Q\left(\begin{array}{c} \beta_{0}^{*}\\ \beta_{1}^{*} \end{array}\right) = \left(\begin{array}{c} \mathbb{E}\left(Y\right)\\ \mathbb{E}\left(X_1Y\right) \end{array}\right)\]
The matrix \(Q\) is important, and you will see it frequently.
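To make this concrete (a sketch, not from the slides), the system can be set up and solved numerically for the I SEE THE MOUSE example, assuming each word is equally likely; it reproduces the coefficients found earlier.
y <- c(1, 3, 3, 5) # number of letters in I, SEE, THE, MOUSE
x1 <- c(0, 2, 1, 1) # number of E's in each word
Q <- matrix(c(1, mean(x1), mean(x1), mean(x1^2)), nrow = 2) # the moment matrix Q
rhs <- c(mean(y), mean(x1 * y)) # right-hand side: E(Y) and E(X1 Y)
solve(Q, rhs) # beta0* = 2, beta1* = 1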
set.seed(20220221) # Change this to generate different results
coefs <- matrix(NA, nrow=10^4, ncol=2) # Storage
for(i in 1:10^4)
{
source <- matrix(c(1,3,3,5,0,2,1,1), ncol = 2) # joint distribution
data <- source[sample(nrow(source), size=40, replace = TRUE),] # IID sampling
temp <- lm(data[, 1] ~ data[, 2]) # least squares
coefs[i, ] <- summary(temp)[[4]][, 1] # store coefficients
}
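As a quick check (not part of the original code), averaging the stored estimates across the \(10^{4}\) replications should give values close to the population coefficients of this joint distribution, \(\beta_{0}^{*}=2\) and \(\beta_{1}^{*}=1\):
colMeans(coefs) # Monte Carlo averages of the intercept and slope, roughly 2 and 1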
Plugging in estimated versions of \(\mu_{1}\) and \(\mu_{Y}\) has no effect asymptotically: \[\begin{eqnarray}\widehat{\mu}_{11}={\displaystyle \dfrac{1}{n}\sum_{t=1}^{n}}\left(X_{1t}-\overline{X}_{1}\right)\left(Y_{t}-\overline{Y}\right) \overset{p}{\rightarrow} \mathsf{Cov}\left(X_{1},Y\right)=\mu_{11}\\ \widehat{\mu}_{20}={\displaystyle \dfrac{1}{n}\sum_{t=1}^{n}}\left(X_{1t}-\overline{X}_{1}\right)^{2} \overset{p}{\rightarrow} \mathsf{Var}\left(X_{1}\right)=\mu_{20} \end{eqnarray}\]
You can show that \(\widehat{\beta}_{1}\overset{p}{\rightarrow}\beta_{1}^{*}\) and \(\widehat{\beta}_{0}\overset{p}{\rightarrow}\beta_{0}^{*}\), as \(n\to\infty\).
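The key step is that the least-squares estimators are continuous functions of the sample moments, so the continuous mapping theorem (together with \(\mu_{20}=\mathsf{Var}\left(X_{1}\right)>0\)) gives \[\widehat{\beta}_{1}=\dfrac{\widehat{\mu}_{11}}{\widehat{\mu}_{20}}\overset{p}{\rightarrow}\dfrac{\mu_{11}}{\mu_{20}}=\beta_{1}^{*},\qquad\widehat{\beta}_{0}=\overline{Y}-\widehat{\beta}_{1}\overline{X}_{1}\overset{p}{\rightarrow}\mathbb{E}\left(Y\right)-\beta_{1}^{*}\mathbb{E}\left(X_{1}\right)=\beta_{0}^{*}.\]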
This is a perfect opportunity to apply the asymptotic tools you have learned before, specifically Lemmas 4.2, 4.6 to 4.9 of the main textbook.
The argument requires IID sampling. This is essentially what you will see in Chapter 4 of the main textbook. Moving beyond IID sampling is the subject of Chapters 5 and 6.
Note that \(\phi^2\) depends on unknown quantities and that the theoretical standard error of \(\widehat{\beta}_{1}\) based on asymptotic theory is given by \[\mathsf{se}\left(\widehat{\beta}_{1}\right)=\dfrac{1}{\sqrt{n}}\sqrt{\dfrac{\mathsf{Var}\left[\left(X_{1t}-\mu_{1}\right)u_{t}\right]}{\left[\mathsf{Var}\left(X_{1t}\right)\right]^{2}}}.\]
Contributions by Eicker and Huber in the 1960s and by White in 1980 showed that it is possible to consistently estimate the standard error of \(\widehat{\beta}_{1}\) as \[\widehat{\mathsf{se}}\left(\widehat{\beta}_{1}\right)=\sqrt{\dfrac{\displaystyle\sum_{t=1}^{n}\left(X_{1t}-\overline{X}_{1}\right)^{2}\widehat{u}_{t}^{2}}{\left(\displaystyle\sum_{t=1}^{n}\left(X_{1t}-\overline{X}_{1}\right)^{2}\right)^{2}}}.\]This estimate of the standard error is valid for large samples.
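As an illustration (a minimal sketch, not part of the lecture code), this standard error can be computed by hand from one simulated sample, using the same joint distribution as in the simulation below. Note that summary() for lm objects reports the classical standard errors, so the robust version has to be computed separately.
set.seed(20220221)
source <- matrix(c(1,3,3,5,0,2,1,1), ncol = 2) # joint distribution
data <- source[sample(nrow(source), size = 40, replace = TRUE), ] # IID sampling
fit <- lm(data[, 1] ~ data[, 2]) # least squares
uhat <- resid(fit) # residuals
x1dev <- data[, 2] - mean(data[, 2]) # X1 in deviation-from-mean form
sqrt(sum(x1dev^2 * uhat^2) / sum(x1dev^2)^2) # Eicker-Huber-White standard error of the slope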
set.seed(20220221) # Change this to generate different results
coefs <- matrix(NA, nrow=10^4, ncol=2) # Storage for coefficients
ses <- matrix(NA, nrow=10^4, ncol=2) # Storage for standard errors
for(i in 1:10^4)
{
source <- matrix(c(1,3,3,5,0,2,1,1), ncol = 2) # joint distribution
data <- source[sample(nrow(source), size=40, replace = TRUE),] # IID sampling
temp <- lm(data[, 1] ~ data[, 2]) # least squares
coefs[i, ] <- summary(temp)[[4]][, 1] # store coefficients
ses[i, ] <- summary(temp)[[4]][,2] # store standard errors
}
c(mean(ses[, 1])/sd(coefs[, 1]), mean(ses[, 2])/sd(coefs[, 2])) # SE/SD ratio
## [1] 1.048279 1.093472
## [1] 1.114528 1.205512
## [1] 1.141887 1.225075