When a categorical variable has N categories (say, the 5 oceans) and you include dummy variables for only N-1 of them in your regression, what does that mean for the left-out category (assume you left out, say, the Atlantic Ocean)?

The estimates for the included oceans should be interpreted as deviations in the intercept relative to the omitted category (Atlantic). The intercept itself is the value for the Atlantic.
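As a minimal sketch of this interpretation, the example below regresses a made-up outcome on an intercept plus four ocean dummies, leaving out the Atlantic. The data values and the variable names (`category`, `y`, `deviations`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data: two observations per ocean, values invented for illustration.
oceans = ["Atlantic", "Pacific", "Indian", "Arctic", "Southern"]
category = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])  # index into `oceans`
y = np.array([10.0, 12.0, 20.0, 22.0, 15.0, 17.0, 5.0, 7.0, 30.0, 32.0])

# Design matrix: intercept plus dummies for every ocean except Atlantic (index 0).
dummies = (category[:, None] == np.arange(1, 5)[None, :]).astype(float)
X = np.column_stack([np.ones(len(y)), dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, deviations = coef[0], coef[1:]

# The intercept equals the Atlantic mean; each dummy coefficient equals
# that ocean's mean minus the Atlantic mean.
atlantic_mean = y[category == 0].mean()
pacific_mean = y[category == 1].mean()
print(intercept, atlantic_mean)
print(deviations[0], pacific_mean - atlantic_mean)
```

With only dummies in the regression, the fitted values are the group means, so the Pacific coefficient is simply the gap between the Pacific and Atlantic means.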

What if no category is omitted from the regression (say, you include dummies for all 5 oceans)? What happens if all are included?

If all are included along with the intercept, there would be perfect multicollinearity (the dummy variable trap). One of the dummy variables, or the intercept, would have to be dropped.
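The collinearity can be checked numerically. The sketch below, using hypothetical category codes, compares the column rank of three design matrices: intercept plus all five dummies, intercept plus four dummies, and five dummies with no intercept.

```python
import numpy as np

# Hypothetical category codes 0..4 standing in for the five oceans.
category = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
all_dummies = (category[:, None] == np.arange(5)[None, :]).astype(float)
intercept = np.ones((len(category), 1))

X_trap = np.hstack([intercept, all_dummies])        # 6 columns: intercept + 5 dummies
X_drop_one = np.hstack([intercept, all_dummies[:, 1:]])  # 5 columns: one dummy dropped
X_no_const = all_dummies                            # 5 columns: intercept dropped

# The five dummies sum to the intercept column, so X_trap is rank deficient.
rank_trap = np.linalg.matrix_rank(X_trap)
rank_drop_one = np.linalg.matrix_rank(X_drop_one)
rank_no_const = np.linalg.matrix_rank(X_no_const)
print(X_trap.shape[1], rank_trap)
print(X_drop_one.shape[1], rank_drop_one)
print(X_no_const.shape[1], rank_no_const)
```

Only the first matrix has fewer independent columns than columns, which is why dropping either one dummy or the intercept resolves the problem.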

A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county’s crime rates on the (per capita) size of the county’s police force.

A) Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?

A regression of crime on police force will suffer from omitted variable bias because there are many determinants of crime at the county level (economic, demographic and others), and some of these are bound to be related to decisions about the size of the force. For example, it might be that crime is influenced not by the number of police, but by how effective they are at their job. A variable like the arrest rate (arrests per reported crime) is likely to be related to the number of police, as well as to the crime rate, so it should be included in the regression.
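The mechanism can be illustrated with a small simulation. This is a sketch on simulated data only, not county data: `z` stands in for an omitted determinant, `x` for the regressor of interest, and the coefficients 0.8, 1.0 and 2.0 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (all values are assumptions made for illustration).
z = rng.normal(size=n)                      # omitted determinant of y
x = 0.8 * z + rng.normal(size=n)            # regressor of interest, correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # true coefficient on x is 1.0

def ols(y, *cols):
    """Return OLS coefficients for y on an intercept and the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short_slope = ols(y, x)[1]     # z omitted: slope on x absorbs part of z's effect
long_slope = ols(y, x, z)[1]   # z included: slope on x is close to 1.0
print(short_slope, long_slope)
```

Because `z` affects `y` and is correlated with `x`, leaving it out pushes the estimated slope on `x` away from its true value, while including it removes that bias.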

Note that if crime also directly affects the size of the police force (a two-way link between crime and police), there is a simultaneity bias that cannot be fixed by including ‘omitted’ variables. Such a situation requires instrumental variables (IV/2SLS) estimation.

Let Y be a random variable drawn from a probability density function with mean mu and variance sigma squared. Given a random sample from this population, the sample mean is Ybar. A) Is Ybar a random variable? Why or why not?

B) What are the mean and variance of Ybar?

C) Write the appropriate statistic to test the null hypothesis that mu = 1 against the alternative that mu does not equal 1, assuming that sigma squared is not known. If n is sufficiently large, what distribution should you use to compute the p-value of this test, and why?

D) Suppose the estimated p-value for the test on mu above is .048. What do you infer about the null hypothesis?

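C) The test statistic is t = (Ybar - 1) / (s / sqrt(n)), where s is the sample standard deviation. If n is large, then by the Central Limit Theorem, the t-statistic is approximately distributed as a standard normal random variable under the null hypothesis. This distribution can be used to compute the p-value (the probability that a standard normal random variable exceeds the observed statistic in absolute value).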
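The calculation can be sketched in a few lines. The function name `t_test_mean` and its arguments are hypothetical, and the p-value uses the large-sample normal approximation, so it is only appropriate when n is large.

```python
import math

def t_test_mean(sample, mu0):
    """Return (t, p) for H0: mean == mu0 against a two-sided alternative.

    Uses the sample variance (sigma squared unknown) and a standard normal
    approximation for the p-value, which assumes a large sample.
    """
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)  # sample variance
    se = math.sqrt(s2 / n)                               # standard error of Ybar
    t = (ybar - mu0) / se
    # Two-sided p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

Calling `t_test_mean(sample, 1.0)` on a list of observations gives the statistic and p-value for the null hypothesis mu = 1. The null is rejected at a chosen significance level when the p-value falls below that level.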

D) If the desired significance level (probability of Type I error) is 5%, we can reject the null hypothesis that mu = 1, because the p-value of .048 is below .05.

…