STAT 3300 Homework #5 Due Friday, 05/22/2020
Note: Answer these questions on a separate piece of paper. In the top right corner, include your name, SMU ID, and course number. Please include a title for the assignment so that it is clear to the graders. If you will miss class on the day the assignment is due, submit it before class in order to receive credit.
Question 1 (12 points total, 4 points each) Test the null hypothesis that the slope is zero versus the two-sided alternative in each of the following settings using the α = 0.05 significance level. To receive full credit, show the test statistic, report the p-value, and state your decision (reject or fail to reject).
a) n = 20, ŷ = 28.5 + 1.4x, and SE_b1 = 0.65
b) n = 30, ŷ = 30.8 + 2.1x, and SE_b1 = 1.05
c) n = 100, ŷ = 29.3 + 2.1x, and SE_b1 = 1.05
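As a reminder of the mechanics, the slope test can be sketched in R as below. The numbers are hypothetical and are not taken from any of the settings above; the variable names (n, b1, se_b1) are chosen only for illustration. The sketch assumes the usual simple linear regression result that the test statistic has n - 2 degrees of freedom.

```r
# Hypothetical summary values (NOT one of the homework settings)
n     <- 25    # sample size
b1    <- 0.9   # estimated slope
se_b1 <- 0.4   # standard error of the slope

t_stat  <- b1 / se_b1                      # test statistic for H0: slope = 0
df      <- n - 2                           # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

alpha    <- 0.05
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

t_stat
p_value
decision
```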
Question 2 (12 points total, 4 points each) For each of the settings in the previous exercise, find the 95% confidence interval for the slope and explain what the interval means.
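A confidence interval for the slope has the form b1 ± t* × SE_b1, where t* is the critical value from the t distribution with n - 2 degrees of freedom. A minimal sketch in R, again using hypothetical numbers rather than the homework settings:

```r
# Hypothetical summary values (NOT one of the homework settings)
n     <- 25    # sample size
b1    <- 0.9   # estimated slope
se_b1 <- 0.4   # standard error of the slope

conf_level <- 0.95
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 2)   # critical value t*

ci <- b1 + c(-1, 1) * t_crit * se_b1                 # lower and upper limits
ci
```

If the interval excludes zero, the two-sided test at the matching significance level rejects the null hypothesis of zero slope.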
Question 3 (16 points total) The Storm Prediction Center of the National Oceanic and Atmospheric Administration maintains a database of tornadoes, floods, and other weather phenomena. The file EX10-19TWISTER.csv includes the annual number of tornadoes in the United States between 1953 and 2014. Use R to carry out the analysis and answer the following:
a) (5 points) Make a plot of the total number of tornadoes by year. Does a linear trend over years appear reasonable? Are there any outliers or unusual patterns? Explain your answer.
b) (5 points) Run the simple linear regression and report the least-squares regression line.
c) (3 points) A friend of yours thinks you made a mistake fitting the model because b0 is a large negative value. Explain to him why this is not a mistake.
d) (3 points) Obtain the residuals and plot them versus year. Are there any unusual patterns or cases that you did not discuss in part a? If so, comment on them.
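The general workflow for parts a through d might look like the following sketch. It runs on simulated data so that it is self-contained; the column names year and count are assumptions, and the columns in EX10-19TWISTER.csv may be named differently (check with names() after reading the file).

```r
# Simulated stand-in data; with the real file you would instead use
# something like: dat <- read.csv("EX10-19TWISTER.csv")
set.seed(1)
dat <- data.frame(year = 1953:2014)
dat$count <- 500 + 10 * (dat$year - 1953) + rnorm(nrow(dat), sd = 100)

# Scatterplot of the response against year
plot(dat$year, dat$count, xlab = "Year", ylab = "Count")

# Least-squares fit; coef() returns the intercept b0 and the slope b1
fit <- lm(count ~ year, data = dat)
coef(fit)
abline(fit)   # add the fitted line to the scatterplot

# Residuals plotted against year, with a reference line at zero
res <- resid(fit)
plot(dat$year, res, xlab = "Year", ylab = "Residual")
abline(h = 0, lty = 2)
```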
Question 4 (16 points total) Refer to the previous exercise. Let's proceed with inference. Continue using R.
a) (5 points) Do these data support a linear trend in the number of tornadoes? Justify your answer.
b) (5 points) Construct a 95% confidence interval for the average annual increase in the number of tornadoes. Explain how this interval can be used to justify your response in part a.
c) (3 points) What is the predicted number of tornadoes in 2015?
d) (3 points) Provide an interval that should contain the actual count of tornadoes in 2015, 95% of the time.
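The inference steps in this question map onto a few standard R functions: summary() for the slope test, confint() for the confidence interval, and predict() for the point prediction and prediction interval. The sketch below uses simulated data and the assumed column names year and count, so its output does not answer the questions above.

```r
# Simulated stand-in data (column names are assumptions)
set.seed(1)
dat <- data.frame(year = 1953:2014)
dat$count <- 500 + 10 * (dat$year - 1953) + rnorm(nrow(dat), sd = 100)

fit <- lm(count ~ year, data = dat)

# Estimate, standard error, t statistic, and p-value for the slope
slope_row <- summary(fit)$coefficients["year", ]
slope_row

# 95% confidence interval for the slope
slope_ci <- confint(fit, "year", level = 0.95)
slope_ci

# Point prediction and 95% prediction interval for a new year
new_year <- data.frame(year = 2015)
pred <- predict(fit, newdata = new_year, interval = "prediction", level = 0.95)
pred
```

Note that interval = "prediction" gives an interval for an individual future observation, while interval = "confidence" would give an interval for the mean response at that year.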
Question 5 (18 points total) The file EX10-38TUIT.csv contains the undergraduate tuition in 2008 and 2014 for 33 public universities. Use R to carry out the analysis.
a) (5 points) Plot the data with the 2008 tuition on the x axis and describe the relationship. Are there any outliers or unusual values? Does a linear relationship between the tuition in 2008 and 2014 seem reasonable?
b) (5 points) Run the simple linear regression and give the least-squares regression line.
c) (3 points) Obtain the residuals and plot them versus the 2008 tuition. Is there anything unusual in the plot?
d) (5 points) The five California schools appear to follow the same linear trend as the other schools, but have higher-than-predicted in-state tuition in 2014. Assume that this jump is particular to this state and remove these five cases and refit the model. How do the parameter estimates change?
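Removing a group of cases and refitting, as in part d, can be done with subset(). The sketch below uses simulated data; the column names state, y2008, and y2014 and the "CA" label are assumptions, so check how the California schools are actually identified in EX10-38TUIT.csv before adapting it.

```r
# Simulated stand-in data: 33 schools, 5 of them flagged as "CA"
set.seed(2)
n <- 33
dat <- data.frame(
  state = c(rep("CA", 5), rep("Other", n - 5)),
  y2008 = round(runif(n, 4000, 12000))
)
dat$y2014 <- 1000 + 1.2 * dat$y2008 + rnorm(n, sd = 500)
# Give the flagged schools an extra upward shift (made-up amount)
dat$y2014[dat$state == "CA"] <- dat$y2014[dat$state == "CA"] + 3000

# Fit with all cases, then with the flagged cases removed
fit_all <- lm(y2014 ~ y2008, data = dat)
fit_sub <- lm(y2014 ~ y2008, data = subset(dat, state != "CA"))

# Side-by-side comparison of the intercept and slope estimates
rbind(all = coef(fit_all), without_CA = coef(fit_sub))
```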
Question 6 (17 points total) Refer to the previous exercise. We'll now move forward with inference using the model with the five California schools removed from the dataset.
a) (2 points) Give the null and alternative hypotheses for examining whether there is a linear relationship between 2008 and 2014 tuition amounts.
b) (5 points) Write down the test statistic and p-value for the hypotheses stated in part a. State your conclusions.
c) (5 points) Construct a 95% confidence interval for the slope. What does this interval tell you about the annual percent increase in tuition between 2008 and 2014?
d) (2 points) What percent of the variability in 2014 tuition is explained by a linear regression model using the 2008 tuition?
e) (3 points) Explain why inference on β0 is not of interest for this problem.
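For part d, the proportion of variability explained is the R-squared value reported by summary(). In simple linear regression it equals the squared correlation between the two variables. A minimal sketch on simulated data (the variables x and y are hypothetical stand-ins for the 2008 and 2014 tuition):

```r
# Simulated stand-in data for 28 schools
set.seed(3)
x <- runif(28, 4000, 12000)
y <- 1000 + 1.2 * x + rnorm(28, sd = 500)

fit <- lm(y ~ x)

r2 <- summary(fit)$r.squared   # proportion of variability explained
100 * r2                       # expressed as a percent

# In simple linear regression, R-squared equals the squared correlation
all.equal(r2, cor(x, y)^2)
```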
Question 7 (9 points total, 3 points each) Refer to the previous two exercises, again making inference using the model with the five California schools removed from the dataset.
a) (3 points) Suppose the tuition at Skinflint U was $8,800 in 2008. What is the predicted tuition in 2014?
b) (3 points) Suppose the tuition at I.O.U. was $15,700 in 2008. What is the predicted tuition in 2014?
c) (3 points) Discuss the appropriateness of using the fitted equation to predict tuition for each of these universities.