5.8 Exercises

  1. The data below (data set fancy) concern the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

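    | Month | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 |
    |-------|---------:|---------:|---------:|---------:|---------:|---------:|----------:|
    | Jan | 1664.81 | 2499.81 | 4717.02 | 5921.10 | 4826.64 | 7615.03 | 10243.24 |
    | Feb | 2397.53 | 5198.24 | 5702.63 | 5814.58 | 6470.23 | 9849.69 | 11266.88 |
    | Mar | 2840.71 | 7225.14 | 9957.58 | 12421.25 | 9638.77 | 14558.40 | 21826.84 |
    | Apr | 3547.29 | 4806.03 | 5304.78 | 6369.77 | 8821.17 | 11587.33 | 17357.33 |
    | May | 3752.96 | 5900.88 | 6492.43 | 7609.12 | 8722.37 | 9332.56 | 15997.79 |
    | Jun | 3714.74 | 4951.34 | 6630.80 | 7224.75 | 10209.48 | 13082.09 | 18601.53 |
    | Jul | 4349.61 | 6179.12 | 7349.62 | 8121.22 | 11276.55 | 16732.78 | 26155.15 |
    | Aug | 3566.34 | 4752.15 | 8176.62 | 7979.25 | 12552.22 | 19888.61 | 28586.52 |
    | Sep | 5021.82 | 5496.43 | 8573.17 | 8093.06 | 11637.39 | 23933.38 | 30505.41 |
    | Oct | 6423.48 | 5835.10 | 9690.50 | 8476.70 | 13606.89 | 25391.35 | 30821.33 |
    | Nov | 7600.60 | 12600.08 | 15151.84 | 17914.66 | 21822.11 | 36024.80 | 46634.38 |
    | Dec | 19756.21 | 28541.72 | 34061.01 | 30114.41 | 45060.69 | 80721.71 | 104660.67 |
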
    1. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
    2. Explain why it is necessary to take logarithms of these data before fitting a model.
    3. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
    4. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
    5. Produce boxplots of the residuals for each month. Do these reveal any problems with the model?
    6. What do the values of the coefficients tell you about each variable?
    7. What does the Durbin-Watson statistic tell you about your model?
    8. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
    9. Transform your predictions and intervals to obtain predictions and intervals for the raw data.
    10. How could you improve these predictions by modifying the model?
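
Two mechanical steps in Exercise 1 can be sketched in code: building the "surfing festival" dummy from part 3 and back-transforming log-scale forecasts as in part 9. The exercise itself asks for R; the Python below only illustrates the bookkeeping. The log-scale forecast and interval values are hypothetical placeholders, not estimates from the fancy data, and the variable names are chosen for illustration.

```python
import math

# Monthly observations run from January 1987 to December 1993.
years = range(1987, 1994)
months = range(1, 13)

# Surfing festival dummy: 1 in March from 1988 onwards, 0 otherwise
# (the festival has been held every March since 1988).
festival = [1 if (m == 3 and y >= 1988) else 0 for y in years for m in months]

# Hypothetical forecast and 95% prediction interval on the log scale.
# These numbers are illustrative only; they are NOT fitted values.
log_point, log_lower, log_upper = 9.5, 9.1, 9.9

# exp() is monotone increasing, so the interval end points map directly
# from the log scale to the original sales scale.
point, lower, upper = (math.exp(v) for v in (log_point, log_lower, log_upper))

print("festival months flagged:", sum(festival))
print("back-transformed forecast:", round(point, 2))
print("back-transformed interval:", round(lower, 2), "to", round(upper, 2))
```

Note that the back-transformed interval is no longer symmetric around the point forecast, and that exponentiating a log-scale point forecast gives the median rather than the mean of the forecast distribution when the log-scale errors are normal.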
  2. The data below (data set texasgas) show the demand for natural gas and the price of natural gas for 20 towns in Texas in 1969.

    | City | Average price $P$ (cents per thousand cubic feet) | Consumption per customer $C$ (thousand cubic feet) |
    |----------------|----:|----:|
    | Amarillo | 30 | 134 |
    | Borger | 31 | 112 |
    | Dalhart | 37 | 136 |
    | Shamrock | 42 | 109 |
    | Royalty | 43 | 105 |
    | Texarkana | 45 | 87 |
    | Corpus Christi | 50 | 56 |
    | Palestine | 54 | 43 |
    | Marshall | 54 | 77 |
    | Iowa Park | 57 | 35 |
    | Palo Pinto | 58 | 65 |
    | Millsap | 58 | 56 |
    | Memphis | 60 | 58 |
    | Granger | 73 | 55 |
    | Llano | 88 | 49 |
    | Brownsville | 89 | 39 |
    | Mercedes | 92 | 36 |
    | Karnes City | 97 | 46 |
    | Mathis | 100 | 40 |
    | La Pryor | 102 | 42 |

    1. Produce a scatterplot of consumption against price. The data are clearly not linear. Three possible nonlinear models for the data are given below:

      \begin{align*}
        C_i &= \exp(a + bP_i + e_i) \\
        C_i &= \left\{\begin{array}{ll} a_1 + b_1P_i + e_i & \mbox{when $P_i \le 60$} \\ a_2 + b_2P_i + e_i & \mbox{when $P_i > 60$;} \end{array}\right. \\
        C_i &= a + b_{1}P_i + b_{2}P_i^{2} + e_i.
      \end{align*}

      The second model divides the data into two sections, depending on whether the price is above or below 60 cents per 1,000 cubic feet.

    2. Can you explain why the slope of the fitted line should change with $P$?
    3. Fit the three models and find the coefficients and the residual variance in each case.

      For the second model, the parameters $a_1$, $a_2$, $b_1$, $b_2$ can be estimated by simply fitting a regression with four regressors but no constant: (i) a dummy taking value 1 when $P\le60$ and 0 otherwise; (ii) $\text{P1} = P$ when $P\le60$ and 0 otherwise; (iii) a dummy taking value 0 when $P\le60$ and 1 otherwise; (iv) $\text{P2}=P$ when $P>60$ and 0 otherwise.

    4. For each model, find the values of $R^2$ and AIC, and produce a residual plot. Comment on the adequacy of the three models.
    5. For prices 40, 60, 80, 100, and 120 cents per 1,000 cubic feet, compute the forecasted per capita demand using the best model of the three above.
    6. Compute 95% prediction intervals. Make a graph of these prediction intervals and discuss their interpretation.
    7. What is the correlation between $P$ and $P^{2}$? Does this suggest any general problem to be considered in dealing with polynomial regressions---especially of higher orders?
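
The four regressors described in part 3 for the piecewise model can be built mechanically from the price column. The following is a minimal Python sketch of that construction only (the exercise itself is meant to be done in R, and no model is fitted here). The function name `piecewise_regressors` and the tuple layout are assumptions made for illustration; the prices are copied from the table above.

```python
# Prices P (cents per thousand cubic feet) from the texasgas table, in table order.
prices = [30, 31, 37, 42, 43, 45, 50, 54, 54, 57,
          58, 58, 60, 73, 88, 89, 92, 97, 100, 102]

THRESHOLD = 60  # break point stated in the exercise


def piecewise_regressors(p, threshold=THRESHOLD):
    """Return (d1, P1, d2, P2) for a single price p.

    d1 = 1 when p <= threshold, else 0   (regressor i)
    P1 = p when p <= threshold, else 0   (regressor ii)
    d2 = 1 when p >  threshold, else 0   (regressor iii)
    P2 = p when p >  threshold, else 0   (regressor iv)
    """
    low = 1 if p <= threshold else 0
    high = 1 - low
    return (low, low * p, high, high * p)


# One row per town; this matrix is used WITHOUT an additional constant column.
design = [piecewise_regressors(p) for p in prices]

for p, row in zip(prices, design):
    print(p, row)
```

Because the two dummies always sum to 1, adding a separate constant would make the regressors perfectly collinear, which is why the regression is fitted with no constant. The coefficients on the two dummies then play the role of $a_1$ and $a_2$, and those on P1 and P2 the role of $b_1$ and $b_2$.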