Gas Sales Data

Read in the data

As always, remember to set your working directory. Read in the data, and take a look at the data.

library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
Gas <- read.csv("RetailSales2018_Clean.csv")
str(Gas)
## 'data.frame':    318 obs. of  5 variables:
##  $ Year    : int  1992 1992 1992 1992 1992 1992 1992 1992 1992 1992 ...
##  $ Month   : chr  "JAN" "FEB" "MAR" "APR" ...
##  $ Sales   : num  147182 147013 159653 163606 170089 ...
##  $ S_Factor: num  0.897 0.895 0.975 0.993 1.027 ...
##  $ CPI     : num  138 139 139 140 140 ...
head(Gas)
##   Year Month    Sales S_Factor   CPI
## 1 1992   JAN 147182.5    0.897 138.1
## 2 1992   FEB 147012.7    0.895 138.6
## 3 1992   MAR 159653.3    0.975 139.3
## 4 1992   APR 163605.7    0.993 139.5
## 5 1992   MAY 170088.7    1.027 139.7
## 6 1992   JUN 168921.7    1.017 140.2
tail(Gas)
##     Year Month    Sales S_Factor     CPI
## 313 2018   JAN 443814.7    0.902 247.867
## 314 2018   FEB 433426.4    0.880 248.991
## 315 2018   MAR 509967.2    1.028 249.554
## 316 2018   APR 485331.6    0.975 250.546
## 317 2018   MAY 531574.4    1.054 251.588
## 318 2018   JUN 514895.6    1.016 251.989
Gas_Sales_full <- ts(Gas$Sales, start = 1992, frequency = 12)

We are using the data from the year of 1992 to 1999 to build a model.

Data Visualization

Make a time series plot and seasonal plot.

Gas_Sales <- window(Gas_Sales_full,end = c(1999, 12))

autoplot(Gas_Sales, main = "Gas Sales Amount", xlab = "Year", ylab = "Sales Amount")

From the time series plot, there is an obviously increasing trend, so we should choose a model that can include a trend term.

ggseasonplot(Gas_Sales, main = "Seasonal Plot of Gas Sales for 1992-1999", xlab = "Month", ylab = "Sales Amount")

From the seasonal plot, we can find that there is a seasonal pattern. For each year, there is a peak of gas sales on December, and on the February, the sales amount is usually the lowest of the whole year. We should choose a model that can include a seasonal component.

Model Fitting

We choose to use a Holt-Winters Model, which can include both the trend term and the seasonal component. We fit both an additive Holt-Winters model (AAA) (Y = L + T + S + E) and a Holt-Winters model with multiplicative seasonal factor (MAM), Y = (L + T) * S * E

# Additive Model
fit_add <- ets(Gas_Sales,model="AAA",damped=FALSE)
# Multiplicative Model
fit_mult <- ets(Gas_Sales,model="MAM",damped=FALSE)

Plot of components from each model

We can make a plot of components from each model.

plot(fit_add)

plot(fit_mult)

Compare the performance of different models

To make a 1-step ahead prediction for Jan 2000, we can use forecast() function and specify the forecast horizon h = 1. The function will give us both the point estimation and the preidciton intervals.

frc1 <- forecast(fit_add,h = 1)
frc1
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2000       245049.5 239827.4 250271.5 237063.1 253035.9

If we want to make a prediction for the whole year of 2000, and compare the performance of the AAA model and MAM model. We can set h = 12, and it will give us the predictions with h = 1 all the way to h = 12.

frc_2000_add <- forecast(fit_add, h = 12)
frc_2000_add
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2000       245049.5 239827.4 250271.5 237063.1 253035.9
## Feb 2000       246457.3 241093.3 251821.3 238253.8 254660.9
## Mar 2000       273532.9 267974.0 279091.9 265031.2 282034.6
## Apr 2000       274647.9 268836.9 280458.8 265760.8 283535.0
## May 2000       286878.9 280757.1 293000.7 277516.4 296241.4
## Jun 2000       285772.8 279281.2 292264.3 275844.8 295700.7
## Jul 2000       285385.6 278467.1 292304.1 274804.7 295966.5
## Aug 2000       291207.6 283807.4 298607.8 279890.0 302525.2
## Sep 2000       282373.6 274440.2 290307.1 270240.5 294506.8
## Oct 2000       291450.7 282935.6 299965.8 278428.0 304473.4
## Nov 2000       294011.2 284869.4 303152.9 280030.1 307992.2
## Dec 2000       335479.9 325669.6 345290.3 320476.3 350483.6
frc_2000_mult <- forecast(fit_mult, h = 12)
frc_2000_mult
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2000       229782.8 224755.3 234810.3 222093.9 237471.7
## Feb 2000       228925.7 223915.1 233936.4 221262.6 236588.9
## Mar 2000       259484.8 253803.1 265166.4 250795.4 268174.1
## Apr 2000       260252.0 254551.4 265952.7 251533.6 268970.4
## May 2000       272442.1 266472.1 278412.0 263311.8 281572.3
## Jun 2000       269538.6 263630.1 275447.2 260502.3 278575.0
## Jul 2000       267970.7 262094.3 273847.1 258983.5 276957.9
## Aug 2000       273462.0 267462.9 279461.1 264287.2 282636.9
## Sep 2000       260097.6 254389.6 265805.7 251367.9 268827.4
## Oct 2000       269303.0 263390.7 275215.3 260260.9 278345.1
## Nov 2000       271579.2 265614.6 277543.7 262457.2 280701.2
## Dec 2000       323193.3 316092.5 330294.1 312333.6 334053.0

We can comparing the performance based on out-of-sample RMSE.

gas_sales_2000 <- window(Gas_Sales_full, start = c(2000, 1), end = c(2000, 12))

## Calculate the prediction errors
Prediction_Error_add <- gas_sales_2000 - frc_2000_add$mean
Prediction_Error_mult <- gas_sales_2000 - frc_2000_mult$mean

## Calculate the MSE
MSE_add <- mean(Prediction_Error_add^2 )
MSE_mult <- mean(Prediction_Error_mult^2 )


RMSE_add <- sqrt(MSE_add)
RMSE_mult <- sqrt(MSE_mult)

c(RMSE_add, RMSE_mult)
## [1] 11766.45 10862.81

The multiplicative model has a lower out-of-sample RMSE, so the multiplicative Holt-Winters Model is better.