# 5.8 Exercises

1. The data below (data set fancy) concern the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

| Month | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 |
|:------|---------:|---------:|---------:|---------:|---------:|---------:|----------:|
| Jan | 1664.81 | 2499.81 | 4717.02 | 5921.10 | 4826.64 | 7615.03 | 10243.24 |
| Feb | 2397.53 | 5198.24 | 5702.63 | 5814.58 | 6470.23 | 9849.69 | 11266.88 |
| Mar | 2840.71 | 7225.14 | 9957.58 | 12421.25 | 9638.77 | 14558.40 | 21826.84 |
| Apr | 3547.29 | 4806.03 | 5304.78 | 6369.77 | 8821.17 | 11587.33 | 17357.33 |
| May | 3752.96 | 5900.88 | 6492.43 | 7609.12 | 8722.37 | 9332.56 | 15997.79 |
| Jun | 3714.74 | 4951.34 | 6630.80 | 7224.75 | 10209.48 | 13082.09 | 18601.53 |
| Jul | 4349.61 | 6179.12 | 7349.62 | 8121.22 | 11276.55 | 16732.78 | 26155.15 |
| Aug | 3566.34 | 4752.15 | 8176.62 | 7979.25 | 12552.22 | 19888.61 | 28586.52 |
| Sep | 5021.82 | 5496.43 | 8573.17 | 8093.06 | 11637.39 | 23933.38 | 30505.41 |
| Oct | 6423.48 | 5835.10 | 9690.50 | 8476.70 | 13606.89 | 25391.35 | 30821.33 |
| Nov | 7600.60 | 12600.08 | 15151.84 | 17914.66 | 21822.11 | 36024.80 | 46634.38 |
| Dec | 19756.21 | 28541.72 | 34061.01 | 30114.41 | 45060.69 | 80721.71 | 104660.67 |

1. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
2. Explain why it is necessary to take logarithms of these data before fitting a model.
3. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
4. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
5. Produce boxplots of the residuals for each month. Do these reveal any problems with the model?
6. What do the values of the coefficients tell you about each variable?
8. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
9. Transform your predictions and intervals to obtain predictions and intervals for the raw data.
10. How could you improve these predictions by modifying the model?
2. The data below (data set texasgas) show the demand for natural gas and the price of natural gas for 20 towns in Texas in 1969.

| City | Average price $P$ (cents per thousand cubic feet) | Consumption per customer $C$ (thousand cubic feet) |
|:---------------|----:|----:|
| Amarillo | 30 | 134 |
| Borger | 31 | 112 |
| Dalhart | 37 | 136 |
| Shamrock | 42 | 109 |
| Royalty | 43 | 105 |
| Texarkana | 45 | 87 |
| Corpus Christi | 50 | 56 |
| Palestine | 54 | 43 |
| Marshall | 54 | 77 |
| Iowa Park | 57 | 35 |
| Palo Pinto | 58 | 65 |
| Millsap | 58 | 56 |
| Memphis | 60 | 58 |
| Granger | 73 | 55 |
| Llano | 88 | 49 |
| Brownsville | 89 | 39 |
| Mercedes | 92 | 36 |
| Karnes City | 97 | 46 |
| Mathis | 100 | 40 |
| La Pryor | 102 | 42 |

1. Produce a scatterplot of consumption against price. The relationship is clearly not linear. Three possible nonlinear models for the data are given below:

\begin{align*}
C_i &= \exp(a + bP_i + e_i) \\
C_i &= \left\{\begin{array}{ll} a_1 + b_1P_i + e_i & \text{when } P_i \le 60 \\ a_2 + b_2P_i + e_i & \text{when } P_i > 60; \end{array}\right. \\
C_i &= a + b_{1}P_i + b_{2}P_i^{2} + e_i.
\end{align*}

The second model divides the data into two sections, depending on whether the price is above or below 60 cents per 1,000 cubic feet.

2. Can you explain why the slope of the fitted line should change with $P$?
3. Fit the three models and find the coefficients and the residual variance in each case.

For the second model, the parameters $a_1$, $a_2$, $b_1$, $b_2$ can be estimated by simply fitting a regression with four regressors but no constant: (i) a dummy taking value 1 when $P\le60$ and 0 otherwise; (ii) $\text{P1} = P$ when $P\le60$ and 0 otherwise; (iii) a dummy taking value 0 when $P\le60$ and 1 otherwise; (iv) $\text{P2}=P$ when $P>60$ and 0 otherwise.
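
The construction described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the values are typed in from the table in this exercise rather than loaded from a package, and the names `price`, `cons`, `D1` and `D2` are hypothetical (only `P1` and `P2` come from the text).

```r
# Data copied by hand from the texasgas table in this exercise.
price <- c(30, 31, 37, 42, 43, 45, 50, 54, 54, 57,
           58, 58, 60, 73, 88, 89, 92, 97, 100, 102)
cons  <- c(134, 112, 136, 109, 105, 87, 56, 43, 77, 35,
           65, 56, 58, 55, 49, 39, 36, 46, 40, 42)

# The four regressors; D1 and D2 are hypothetical names for the dummies.
D1 <- as.numeric(price <= 60)  # (i)   1 when P <= 60, 0 otherwise
P1 <- price * D1               # (ii)  P when P <= 60, 0 otherwise
D2 <- 1 - D1                   # (iii) 0 when P <= 60, 1 otherwise
P2 <- price * D2               # (iv)  P when P > 60, 0 otherwise

# "0 +" drops the constant, so D1 and D2 carry the two intercepts.
fit <- lm(cons ~ 0 + D1 + P1 + D2 + P2)
coef(fit)  # estimates of a1, b1, a2, b2, in that order
```

Because the two sets of regressors are never nonzero for the same town, the coefficient estimates should match those from fitting two separate straight lines, one to each price range. The single regression differs in that it pools the residuals into one variance estimate.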

4. For each model, find the values of $R^2$ and AIC, and produce a residual plot. Comment on the adequacy of the three models.
5. For prices 40, 60, 80, 100, and 120 cents per 1,000 cubic feet, compute the forecasted per capita demand using the best model of the three above.
6. Compute 95% prediction intervals. Make a graph of these prediction intervals and discuss their interpretation.
7. What is the correlation between $P$ and $P^{2}$? Does this suggest any general problem to be considered in dealing with polynomial regressions, especially those of higher order?