Read in the data
As always, remember to set your working directory. Read in the data, and take a look at the data.
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
Gas <- read.csv("RetailSales2018_Clean.csv")
str(Gas)
## 'data.frame': 318 obs. of 5 variables:
## $ Year : int 1992 1992 1992 1992 1992 1992 1992 1992 1992 1992 ...
## $ Month : chr "JAN" "FEB" "MAR" "APR" ...
## $ Sales : num 147182 147013 159653 163606 170089 ...
## $ S_Factor: num 0.897 0.895 0.975 0.993 1.027 ...
## $ CPI : num 138 139 139 140 140 ...
head(Gas)
## Year Month Sales S_Factor CPI
## 1 1992 JAN 147182.5 0.897 138.1
## 2 1992 FEB 147012.7 0.895 138.6
## 3 1992 MAR 159653.3 0.975 139.3
## 4 1992 APR 163605.7 0.993 139.5
## 5 1992 MAY 170088.7 1.027 139.7
## 6 1992 JUN 168921.7 1.017 140.2
tail(Gas)
## Year Month Sales S_Factor CPI
## 313 2018 JAN 443814.7 0.902 247.867
## 314 2018 FEB 433426.4 0.880 248.991
## 315 2018 MAR 509967.2 1.028 249.554
## 316 2018 APR 485331.6 0.975 250.546
## 317 2018 MAY 531574.4 1.054 251.588
## 318 2018 JUN 514895.6 1.016 251.989
Gas_Sales_full <- ts(Gas$Sales, start = 1992, frequency = 12)
We are using the data from the year of 1992 to 1999 to build a model.
Data Visualization
Make a time series plot and seasonal plot.
Gas_Sales <- window(Gas_Sales_full,end = c(1999, 12))
autoplot(Gas_Sales, main = "Gas Sales Amount", xlab = "Year", ylab = "Sales Amount")
From the time series plot, there is an obviously increasing trend, so we should choose a model that can include a trend term.
ggseasonplot(Gas_Sales, main = "Seasonal Plot of Gas Sales for 1992-1999", xlab = "Month", ylab = "Sales Amount")
From the seasonal plot, we can find that there is a seasonal pattern. For each year, there is a peak of gas sales on December, and on the February, the sales amount is usually the lowest of the whole year. We should choose a model that can include a seasonal component.
Model Fitting
We choose to use a Holt-Winters Model, which can include both the trend term and the seasonal component. We fit both an additive Holt-Winters model (AAA) (Y = L + T + S + E) and a Holt-Winters model with multiplicative seasonal factor (MAM), Y = (L + T) * S * E
# Additive Model
fit_add <- ets(Gas_Sales,model="AAA",damped=FALSE)
# Multiplicative Model
fit_mult <- ets(Gas_Sales,model="MAM",damped=FALSE)
Plot of components from each model
We can make a plot of components from each model.
plot(fit_add)
plot(fit_mult)
Compare the performance of different models
To make a 1-step ahead prediction for Jan 2000, we can use forecast() function and specify the forecast horizon h = 1. The function will give us both the point estimation and the preidciton intervals.
frc1 <- forecast(fit_add,h = 1)
frc1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2000 245049.5 239827.4 250271.5 237063.1 253035.9
If we want to make a prediction for the whole year of 2000, and compare the performance of the AAA model and MAM model. We can set h = 12, and it will give us the predictions with h = 1 all the way to h = 12.
frc_2000_add <- forecast(fit_add, h = 12)
frc_2000_add
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2000 245049.5 239827.4 250271.5 237063.1 253035.9
## Feb 2000 246457.3 241093.3 251821.3 238253.8 254660.9
## Mar 2000 273532.9 267974.0 279091.9 265031.2 282034.6
## Apr 2000 274647.9 268836.9 280458.8 265760.8 283535.0
## May 2000 286878.9 280757.1 293000.7 277516.4 296241.4
## Jun 2000 285772.8 279281.2 292264.3 275844.8 295700.7
## Jul 2000 285385.6 278467.1 292304.1 274804.7 295966.5
## Aug 2000 291207.6 283807.4 298607.8 279890.0 302525.2
## Sep 2000 282373.6 274440.2 290307.1 270240.5 294506.8
## Oct 2000 291450.7 282935.6 299965.8 278428.0 304473.4
## Nov 2000 294011.2 284869.4 303152.9 280030.1 307992.2
## Dec 2000 335479.9 325669.6 345290.3 320476.3 350483.6
frc_2000_mult <- forecast(fit_mult, h = 12)
frc_2000_mult
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2000 229782.8 224755.3 234810.3 222093.9 237471.7
## Feb 2000 228925.7 223915.1 233936.4 221262.6 236588.9
## Mar 2000 259484.8 253803.1 265166.4 250795.4 268174.1
## Apr 2000 260252.0 254551.4 265952.7 251533.6 268970.4
## May 2000 272442.1 266472.1 278412.0 263311.8 281572.3
## Jun 2000 269538.6 263630.1 275447.2 260502.3 278575.0
## Jul 2000 267970.7 262094.3 273847.1 258983.5 276957.9
## Aug 2000 273462.0 267462.9 279461.1 264287.2 282636.9
## Sep 2000 260097.6 254389.6 265805.7 251367.9 268827.4
## Oct 2000 269303.0 263390.7 275215.3 260260.9 278345.1
## Nov 2000 271579.2 265614.6 277543.7 262457.2 280701.2
## Dec 2000 323193.3 316092.5 330294.1 312333.6 334053.0
We can comparing the performance based on out-of-sample RMSE.
gas_sales_2000 <- window(Gas_Sales_full, start = c(2000, 1), end = c(2000, 12))
## Calculate the prediction errors
Prediction_Error_add <- gas_sales_2000 - frc_2000_add$mean
Prediction_Error_mult <- gas_sales_2000 - frc_2000_mult$mean
## Calculate the MSE
MSE_add <- mean(Prediction_Error_add^2 )
MSE_mult <- mean(Prediction_Error_mult^2 )
RMSE_add <- sqrt(MSE_add)
RMSE_mult <- sqrt(MSE_mult)
c(RMSE_add, RMSE_mult)
## [1] 11766.45 10862.81
The multiplicative model has a lower out-of-sample RMSE, so the multiplicative Holt-Winters Model is better.