WFJ Sales and Netflix data

library(forecast)
library(tseries)
WFJ_Sales <- read.csv("WFJ_Sales_Full.csv")
head(WFJ_Sales)

##   Obs WFJ.Sales
## 1   1  23056.13
## 2   2  24817.32
## 3   3  24299.55
## 4   4  23242.47
## 5   5  22862.30
## 6   6  22862.68

Netflix <- read.csv("Netflix.csv")
head(Netflix)

##   Year Quarter Total.Revenues
## 1 2000       1          5.174
## 2 2000       2          7.147
## 3 2000       3         10.182
## 4 2000       4         13.391
## 5 2001       1         17.057
## 6 2001       2         18.359

WFJ.ts <- ts(WFJ_Sales$WFJ.Sales,start = 1)
WFJ.ts

## Time Series:
## Start = 1 
## End = 62 
## Frequency = 1 
##  [1] 23056.13 24817.32 24299.55 23242.47 22862.30 22862.68 23391.02 22469.45
##  [9] 22241.38 24366.54 29456.55 31293.51 38713.47 35749.31 39768.38 32418.75
## [17] 37502.86 31473.98 35625.09 33159.21 34305.66 33630.57 32899.71 34426.16
## [25] 33777.48 34848.68 30986.25 33321.09 34002.68 35417.16 33822.06 32723.20
## [33] 34925.05 33459.65 30998.87 31286.16 35030.34 34259.51 35001.00 36040.21
## [41] 36056.22 31397.49 32187.02 30321.83 34588.43 38878.64 37165.95 37111.35
## [49] 39021.16 40737.19 42358.35 51914.08 35403.96 30554.71 30420.91 30972.32
## [57] 32336.01 28194.00 29202.85 28155.16 28404.09 34128.21

Netflix.ts <- ts(Netflix$Total.Revenues,start = c(2000, 1), frequency = 4)
Netflix.ts

##          Qtr1     Qtr2     Qtr3     Qtr4
## 2000    5.174    7.147   10.182   13.391
## 2001   17.057   18.359   18.878   21.618
## 2002   30.527   36.360   40.731   45.188
## 2003   55.669   63.187   72.202   81.190
## 2004   99.820  119.710  140.410  140.660
## 2005  152.450  164.030  172.740  193.000
## 2006  224.130  239.350  255.950  277.230
## 2007  305.320  303.690  293.970  302.360
## 2008  326.180  337.610  341.270  359.940
## 2009  394.100  408.590  423.120  444.490
## 2010  493.670  519.820  553.220  595.920
## 2011  718.560  788.610  821.840  875.580
## 2012  869.790  889.160  905.090  945.240
## 2013 1023.960 1069.370 1106.000 1175.230
## 2014 1270.090 1340.410 1409.430 1484.730
## 2015 1573.130 1644.690 1738.360 1823.330

Stationarity Test

Stationarity test for WFJ Sales data

Null Hypothesis: the data tested is not stationary.
Alternative Hypothesis: the data tested is stationary.

adf.test(WFJ.ts)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  WFJ.ts
## Dickey-Fuller = -2.4767, Lag order = 3, p-value = 0.3819
## alternative hypothesis: stationary

We get the p-value = 0.38, which is greater than 0.05. We can not reject the null hypothesis, so the WFJ sales data is nonstationary.

We can use diff() function to conduct differencing transformation. We can use argument “differences = 1” to specify the order of difference equal to 1.

adf.test(diff(WFJ.ts,differences = 1))

## Warning in adf.test(diff(WFJ.ts, differences = 1)): p-value smaller than printed
## p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(WFJ.ts, differences = 1)
## Dickey-Fuller = -4.5788, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

The p-value is 0.01, which is lower than 0.05. So we can reject the null hypothesis, and believe that the 1-order differenced WFJ sales data is stationary.

Sationarity test for Netflix Revenue data

adf.test(Netflix.ts)

## Warning in adf.test(Netflix.ts): p-value greater than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Netflix.ts
## Dickey-Fuller = 1.6159, Lag order = 3, p-value = 0.99
## alternative hypothesis: stationary

We get the p-value = 0.99, which is greater than 0.05. We can not reject the null hypothesis, so the Netflix Revenue data is nonstationary.

adf.test(diff(Netflix.ts))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(Netflix.ts)
## Dickey-Fuller = -2.8511, Lag order = 3, p-value = 0.2303
## alternative hypothesis: stationary

We get the p-value = 0.2303, which is greater than 0.05. The 1-order differenced Netflix Revenue data is still nonstationary.

adf.test(diff(Netflix.ts, differences = 2))

## Warning in adf.test(diff(Netflix.ts, differences = 2)): p-value smaller than
## printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(Netflix.ts, differences = 2)
## Dickey-Fuller = -5.3063, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

We get the p-value = 0.01, which is lower than 0.05. After 2-order differencing, Netflix data is stationary.

Fit ARIMA model for Netflix data

Use auto.arima() function to fit ARIMA model.

WFJ_arima <- auto.arima(WFJ.ts)
WFJ_arima

## Series: WFJ.ts 
## ARIMA(1,1,0) 
## 
## Coefficients:
##           ar1
##       -0.2720
## s.e.   0.1248
## 
## sigma^2 = 12937009:  log likelihood = -585.55
## AIC=1175.09   AICc=1175.3   BIC=1179.31

For WFJ sales data, the best ARIMA mode is ARIMA(1,1,0). The autoregressive part is of order 1 and the integrated is of order 1. It means after 1-order differencing, the WFJ sales data follows an AR(1) model.

Netflix_arima <- auto.arima(Netflix.ts)
Netflix_arima

## Series: Netflix.ts 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4079  -0.2580
## s.e.   0.1258   0.1273
## 
## sigma^2 = 363:  log likelihood = -269.93
## AIC=545.86   AICc=546.28   BIC=552.24

For Netflix revenue data, the best ARIMA mode is ARIMA(0,2,2). The moving average part is of order 2 and the integrated is of order 2. It means after 2-order differencing, the Netflix revenue data follows an MA(2) model.

Compare models based on AIC/BIC

Suppose we have fitted both ARIMA(2,1,2) and ARIMA(0,2,2) for Netflix revenue data. We can use AIC() function to get the AIC of the fitted model, and BIC() to get the BIC of the fitted model.

model1 <- arima(Netflix.ts, order = c(2, 1, 2))
model2 <- arima(Netflix.ts, order = c(0, 2, 2))

AIC(model1)

## [1] 560.3769

AIC(model2)

## [1] 545.8628

BIC(model1)

## [1] 571.0925

BIC(model2)

## [1] 552.2442

Compared to Model 1, Model 2 has a lower AIC and BIC, so model 2 is better than model 1.

Lab Week 10