library(forecast)
library(tseries)
WFJ_Sales <- read.csv("WFJ_Sales_Full.csv")
head(WFJ_Sales)
## Obs WFJ.Sales
## 1 1 23056.13
## 2 2 24817.32
## 3 3 24299.55
## 4 4 23242.47
## 5 5 22862.30
## 6 6 22862.68
Netflix <- read.csv("Netflix.csv")
head(Netflix)
## Year Quarter Total.Revenues
## 1 2000 1 5.174
## 2 2000 2 7.147
## 3 2000 3 10.182
## 4 2000 4 13.391
## 5 2001 1 17.057
## 6 2001 2 18.359
WFJ.ts <- ts(WFJ_Sales$WFJ.Sales,start = 1)
WFJ.ts
## Time Series:
## Start = 1
## End = 62
## Frequency = 1
## [1] 23056.13 24817.32 24299.55 23242.47 22862.30 22862.68 23391.02 22469.45
## [9] 22241.38 24366.54 29456.55 31293.51 38713.47 35749.31 39768.38 32418.75
## [17] 37502.86 31473.98 35625.09 33159.21 34305.66 33630.57 32899.71 34426.16
## [25] 33777.48 34848.68 30986.25 33321.09 34002.68 35417.16 33822.06 32723.20
## [33] 34925.05 33459.65 30998.87 31286.16 35030.34 34259.51 35001.00 36040.21
## [41] 36056.22 31397.49 32187.02 30321.83 34588.43 38878.64 37165.95 37111.35
## [49] 39021.16 40737.19 42358.35 51914.08 35403.96 30554.71 30420.91 30972.32
## [57] 32336.01 28194.00 29202.85 28155.16 28404.09 34128.21
Netflix.ts <- ts(Netflix$Total.Revenues,start = c(2000, 1), frequency = 4)
Netflix.ts
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 5.174 7.147 10.182 13.391
## 2001 17.057 18.359 18.878 21.618
## 2002 30.527 36.360 40.731 45.188
## 2003 55.669 63.187 72.202 81.190
## 2004 99.820 119.710 140.410 140.660
## 2005 152.450 164.030 172.740 193.000
## 2006 224.130 239.350 255.950 277.230
## 2007 305.320 303.690 293.970 302.360
## 2008 326.180 337.610 341.270 359.940
## 2009 394.100 408.590 423.120 444.490
## 2010 493.670 519.820 553.220 595.920
## 2011 718.560 788.610 821.840 875.580
## 2012 869.790 889.160 905.090 945.240
## 2013 1023.960 1069.370 1106.000 1175.230
## 2014 1270.090 1340.410 1409.430 1484.730
## 2015 1573.130 1644.690 1738.360 1823.330
Null Hypothesis: the data tested is not stationary.
Alternative Hypothesis: the data tested is stationary.
adf.test(WFJ.ts)
##
## Augmented Dickey-Fuller Test
##
## data: WFJ.ts
## Dickey-Fuller = -2.4767, Lag order = 3, p-value = 0.3819
## alternative hypothesis: stationary
We get the p-value = 0.38, which is greater than 0.05. We can not reject the null hypothesis, so the WFJ sales data is nonstationary.
We can use diff() function to conduct differencing transformation. We can use argument “differences = 1” to specify the order of difference equal to 1.
adf.test(diff(WFJ.ts,differences = 1))
## Warning in adf.test(diff(WFJ.ts, differences = 1)): p-value smaller than printed
## p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(WFJ.ts, differences = 1)
## Dickey-Fuller = -4.5788, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
The p-value is 0.01, which is lower than 0.05. So we can reject the null hypothesis, and believe that the 1-order differenced WFJ sales data is stationary.
adf.test(Netflix.ts)
## Warning in adf.test(Netflix.ts): p-value greater than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Netflix.ts
## Dickey-Fuller = 1.6159, Lag order = 3, p-value = 0.99
## alternative hypothesis: stationary
We get the p-value = 0.99, which is greater than 0.05. We can not reject the null hypothesis, so the Netflix Revenue data is nonstationary.
adf.test(diff(Netflix.ts))
##
## Augmented Dickey-Fuller Test
##
## data: diff(Netflix.ts)
## Dickey-Fuller = -2.8511, Lag order = 3, p-value = 0.2303
## alternative hypothesis: stationary
We get the p-value = 0.2303, which is greater than 0.05. The 1-order differenced Netflix Revenue data is still nonstationary.
adf.test(diff(Netflix.ts, differences = 2))
## Warning in adf.test(diff(Netflix.ts, differences = 2)): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff(Netflix.ts, differences = 2)
## Dickey-Fuller = -5.3063, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
We get the p-value = 0.01, which is lower than 0.05. After 2-order differencing, Netflix data is stationary.
WFJ_arima <- auto.arima(WFJ.ts)
WFJ_arima
## Series: WFJ.ts
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.2720
## s.e. 0.1248
##
## sigma^2 = 12937009: log likelihood = -585.55
## AIC=1175.09 AICc=1175.3 BIC=1179.31
For WFJ sales data, the best ARIMA mode is ARIMA(1,1,0). The autoregressive part is of order 1 and the integrated is of order 1. It means after 1-order differencing, the WFJ sales data follows an AR(1) model.
Netflix_arima <- auto.arima(Netflix.ts)
Netflix_arima
## Series: Netflix.ts
## ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.4079 -0.2580
## s.e. 0.1258 0.1273
##
## sigma^2 = 363: log likelihood = -269.93
## AIC=545.86 AICc=546.28 BIC=552.24
For Netflix revenue data, the best ARIMA mode is ARIMA(0,2,2). The moving average part is of order 2 and the integrated is of order 2. It means after 2-order differencing, the Netflix revenue data follows an MA(2) model.
Suppose we have fitted both ARIMA(2,1,2) and ARIMA(0,2,2) for Netflix revenue data. We can use AIC() function to get the AIC of the fitted model, and BIC() to get the BIC of the fitted model.
model1 <- arima(Netflix.ts, order = c(2, 1, 2))
model2 <- arima(Netflix.ts, order = c(0, 2, 2))
AIC(model1)
## [1] 560.3769
AIC(model2)
## [1] 545.8628
BIC(model1)
## [1] 571.0925
BIC(model2)
## [1] 552.2442
Compared to Model 1, Model 2 has a lower AIC and BIC, so model 2 is better than model 1.