Module 2 Basic Tools For Forecasting

Time series plots

miles <- read.csv("Revenue_miles_2.csv")
str(miles)
## 'data.frame':    192 obs. of  3 variables:
##  $ DATE            : chr  "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ...
##  $ Miles.Thousands.: int  49843099 49931931 61478163 58981617 61223861 65601574 67898320 67028338 56441629 58834210 ...
##  $ Miles..Billions.: num  4.98 4.99 6.15 5.9 6.12 ...
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
miles.ts <- ts(miles$Miles..Billions.,start = c(2000,1),end = c(2004,12),frequency = 12)
autoplot(miles.ts,xlab = "Years", ylab = "Miles (Billions)", main = "RPM vs Years")

seasonplot(miles.ts, col = c(1:5), 
           xlab = "Month", ylab = "Miles (Billions)", main = "RPM vs Month")

ggseasonplot(miles.ts, col = c(1:5),
           xlab = "Month", ylab = "Miles (Billions)", main = "RPM vs Month")
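
Note that the `end` argument in the `ts()` call above has a real effect: the data frame has 192 monthly rows, but ending the series at December 2004 keeps only the first 60 of them, which matches the five colours passed to the seasonal plots. The sketch below shows how `start`, `end`, and `frequency` interact. It uses a made-up placeholder vector `x` rather than the RPM data, and the object names are illustrative only.

```r
# Hypothetical monthly series: 192 placeholder values standing in for real data
x <- seq_len(192)

# Without `end`, all 192 observations are kept (Jan 2000 to Dec 2015)
full.ts <- ts(x, start = c(2000, 1), frequency = 12)

# With `end`, ts() keeps only the observations up to that date
short.ts <- ts(x, start = c(2000, 1), end = c(2004, 12), frequency = 12)

length(full.ts)    # 192
length(short.ts)   # 60, i.e. 5 years x 12 months
end(full.ts)       # 2015 12
end(short.ts)      # 2004 12

# window() extracts a sub-period from an existing series without rebuilding it
year.2003 <- window(full.ts, start = c(2003, 1), end = c(2003, 12))
length(year.2003)  # 12
```

If the whole sample is wanted in the plots, dropping `end` (or setting it to the last month in the file) would be one way to get it.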

Summarize the Data

Suppose the sales of a popular book over a seven-week period are as follows:

sales.df <- data.frame(Week=1:7, Sales=c(15,10,12,16,9,8,14))
sales.df
##   Week Sales
## 1    1    15
## 2    2    10
## 3    3    12
## 4    4    16
## 5    5     9
## 6    6     8
## 7    7    14

Measures of Location

-Mean / Average

sales.df$Sales
## [1] 15 10 12 16  9  8 14
mean(sales.df$Sales)
## [1] 12

-Order Statistics

We can sort the vector and then pick out the order statistic we want. For example, the 4th order statistic (the 4th smallest value, which for seven observations is also the 4th largest) is:

sort(sales.df$Sales, decreasing = FALSE)
## [1]  8  9 10 12 14 15 16
sort(sales.df$Sales, decreasing = FALSE)[4]
## [1] 12
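
With seven values the 4th smallest and 4th largest coincide, so the direction of the sort is easy to overlook. As a small sketch, here is the same idea for k = 2, where the two differ. The sales figures are retyped from the table above, and `k`, `kth.smallest`, and `kth.largest` are illustrative names.

```r
# Sales figures from the seven-week example
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)
k <- 2

# k-th smallest: sort ascending (the default) and take element k
kth.smallest <- sort(sales)[k]                     # 9

# k-th largest: sort descending and take element k
kth.largest <- sort(sales, decreasing = TRUE)[k]   # 15

# Equivalent: the k-th largest is the (n - k + 1)-th smallest
sort(sales)[n - k + 1]                             # 15
```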

-Median

median(sales.df$Sales)
## [1] 12

-Summary

The summary() function returns the minimum, the 1st quartile, the median, the mean, the 3rd quartile, and the maximum.

summary(sales.df$Sales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.0     9.5    12.0    12.0    14.5    16.0
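
The quartiles reported by summary() can also be requested directly. The sketch below uses quantile() with its default interpolation rule (type 7); other `type` values can give slightly different quartiles on small samples. The sales figures are retyped from the table above.

```r
# Sales figures from the seven-week example
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Quartiles with the default interpolation (type = 7)
q <- quantile(sales, probs = c(0.25, 0.5, 0.75))
q            # 25% = 9.5, 50% = 12, 75% = 14.5

# Interquartile range: 3rd quartile minus 1st quartile
IQR(sales)   # 5
```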

Measures of Variation

-Range

max(sales.df$Sales) - min(sales.df$Sales)
## [1] 8

-Measures of Dispersion

  • Mean absolute deviation
sales.df$Sales - mean(sales.df$Sales)
## [1]  3 -2  0  4 -3 -4  2
sales.abs <- abs(sales.df$Sales - mean(sales.df$Sales))
n <- length(sales.df$Sales)
MAD <- sum(sales.abs)/n
MAD
## [1] 2.571429
  • Variance (sample variance, which divides by n - 1)
var(sales.df$Sales)
## [1] 9.666667
  • Standard Deviation
sd(sales.df$Sales)
## [1] 3.109126
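
To make the link between these measures explicit, the sketch below recomputes the variance and standard deviation by hand from the deviations and compares them with var() and sd(). It also shows that base R's mad() is a different quantity (the median absolute deviation) from the mean absolute deviation computed above. The names `dev`, `var.manual`, `sd.manual`, and `mean.abs.dev` are illustrative.

```r
# Sales figures from the seven-week example
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)
dev <- sales - mean(sales)           # deviations from the mean

# Sample variance: sum of squared deviations divided by n - 1
var.manual <- sum(dev^2) / (n - 1)   # 58 / 6 = 9.666667
sd.manual <- sqrt(var.manual)        # 3.109126

all.equal(var.manual, var(sales))    # TRUE
all.equal(sd.manual, sd(sales))      # TRUE

# Mean absolute deviation divides by n, not n - 1
mean.abs.dev <- mean(abs(dev))       # 18 / 7 = 2.571429

# Caution: mad() is the MEDIAN absolute deviation, scaled by 1.4826 by default
mad(sales)                           # 4.4478
```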

-Assessing Variability

Z-score: standardized score, which measures the relative location of the observation in a dataset.

(sales.df$Sales - mean(sales.df$Sales))/sd(sales.df$Sales)
## [1]  0.9649013 -0.6432675  0.0000000  1.2865350 -0.9649013 -1.2865350  0.6432675
scale(sales.df$Sales, center = TRUE, scale = TRUE)
##            [,1]
## [1,]  0.9649013
## [2,] -0.6432675
## [3,]  0.0000000
## [4,]  1.2865350
## [5,] -0.9649013
## [6,] -1.2865350
## [7,]  0.6432675
## attr(,"scaled:center")
## [1] 12
## attr(,"scaled:scale")
## [1] 3.109126
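
Because scale() returns a one-column matrix with attributes, it can be convenient to convert the result back to a plain vector. The sketch below does that and checks two properties of Z-scores: they have mean 0 and standard deviation 1. The names `z.manual` and `z.scale` are illustrative.

```r
# Sales figures from the seven-week example
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Z-scores computed directly from the definition
z.manual <- (sales - mean(sales)) / sd(sales)

# Same values from scale(), with the matrix shape and attributes dropped
z.scale <- as.numeric(scale(sales))

all.equal(z.manual, z.scale)   # TRUE

# Standardized values have mean 0 (up to rounding error) and sd 1
mean(z.scale)
sd(z.scale)
```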

As a rough guide, you can treat the Z-scores as following a standard normal distribution. Under that assumption:

  • The probability of |Z| > 1 is about 0.32.
  • The probability of |Z| > 2 is about 0.046.
  • The probability of |Z| > 3 is about 0.0027.

You can also use the pnorm() function to calculate the probability that Z is less than some value. For example, the probability that a Z-score is less than 1.6 is:

pnorm(1.6)
## [1] 0.9452007

By symmetry of the normal distribution, the probability that the absolute value of the Z-score is greater than 1.6 is twice the upper-tail probability:

2 * (1 - pnorm(1.6))
## [1] 0.1095986
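
The same formula reproduces the three rule-of-thumb probabilities listed earlier. The sketch below wraps it in a small helper; `two.sided.tail` is a hypothetical name introduced here, not a base R function.

```r
# Hypothetical helper: P(|Z| > z) for a standard normal variable
two.sided.tail <- function(z) {
  2 * (1 - pnorm(abs(z)))
}

# Check the rule-of-thumb values for |Z| > 1, 2, 3
p <- two.sided.tail(c(1, 2, 3))
round(p, 4)                          # 0.3173 0.0455 0.0027

# Equivalent form using the upper tail directly
2 * pnorm(1.6, lower.tail = FALSE)   # 0.1095986
```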

Exercise:

What is the probability that the absolute value of the Z-score is greater than 2.5?

Measure of Linear Relationships

Load in the German forecasts dataset.

forecast <- read.csv("German_forecasts.csv")
str(forecast)
## 'data.frame':    25 obs. of  9 variables:
##  $ Institutions: chr  "Bundesbank" "Commerzbank" "Deka" "Deutsche Bank" ...
##  $ GDP         : num  0.4 0.5 0.7 0.3 0.9 0.4 1.2 1 1.1 0.6 ...
##  $ Privcons    : num  1 1.3 1.1 0.6 1.1 0.9 1.2 1.1 1.2 1 ...
##  $ GFCF        : num  -0.1 0.1 -0.3 1.1 0.9 0.1 1.9 1.9 2.6 0.5 ...
##  $ Exports     : num  1.9 2.8 3.3 3.2 4.2 3 4.1 3.8 5.5 2.9 ...
##  $ Imports     : num  3 4.1 3.3 4.2 4.6 3.8 4.1 4.6 5 4.1 ...
##  $ Govsurp     : num  -0.75 -0.5 -0.3 -0.5 0 -0.7 -0.3 -0.2 0 -0.4 ...
##  $ Consprix    : num  1.5 1.9 1.9 1.7 2.8 2.1 2 2.1 2 2 ...
##  $ Unemp       : num  7.2 7.1 6.9 7 7 7.1 6.6 6.8 7 7 ...
head(forecast)
##    Institutions GDP Privcons GFCF Exports Imports Govsurp Consprix Unemp
## 1    Bundesbank 0.4      1.0 -0.1     1.9     3.0   -0.75      1.5   7.2
## 2   Commerzbank 0.5      1.3  0.1     2.8     4.1   -0.50      1.9   7.1
## 3          Deka 0.7      1.1 -0.3     3.3     3.3   -0.30      1.9   6.9
## 4 Deutsche Bank 0.3      0.6  1.1     3.2     4.2   -0.50      1.7   7.0
## 5           DIW 0.9      1.1  0.9     4.2     4.6    0.00      2.8   7.0
## 6       DZ Bank 0.4      0.9  0.1     3.0     3.8   -0.70      2.1   7.1

Example: German Forecasts

GDP - Gross Domestic Product
Privcons - Private Consumption
GFCF - Gross Fixed Capital Formation
Govsurp - Government Surplus
Consprix - Consumer Prices
Unemp - Unemployment Rate

Correlation

cor(forecast[,-1])
##                 GDP   Privcons       GFCF    Exports    Imports     Govsurp
## GDP       1.0000000 0.42973866  0.6012955  0.4593605  0.2092500  0.61963838
## Privcons  0.4297387 1.00000000  0.2946875  0.2618317  0.4693350  0.03339097
## GFCF      0.6012955 0.29468750  1.0000000  0.1924021  0.3709159  0.22259944
## Exports   0.4593605 0.26183166  0.1924021  1.0000000  0.6411323  0.51777441
## Imports   0.2092500 0.46933504  0.3709159  0.6411323  1.0000000  0.08463580
## Govsurp   0.6196384 0.03339097  0.2225994  0.5177744  0.0846358  1.00000000
## Consprix  0.3170663 0.19192464  0.3356997  0.2102851  0.3817395  0.17919491
## Unemp    -0.3633741 0.07337803 -0.2948008 -0.2729608 -0.0872142 -0.31655934
##            Consprix       Unemp
## GDP      0.31706635 -0.36337411
## Privcons 0.19192464  0.07337803
## GFCF     0.33569972 -0.29480075
## Exports  0.21028513 -0.27296080
## Imports  0.38173948 -0.08721420
## Govsurp  0.17919491 -0.31655934
## Consprix 1.00000000  0.06263273
## Unemp    0.06263273  1.00000000
round(cor(forecast[,-1]),digits = 3)
##             GDP Privcons   GFCF Exports Imports Govsurp Consprix  Unemp
## GDP       1.000    0.430  0.601   0.459   0.209   0.620    0.317 -0.363
## Privcons  0.430    1.000  0.295   0.262   0.469   0.033    0.192  0.073
## GFCF      0.601    0.295  1.000   0.192   0.371   0.223    0.336 -0.295
## Exports   0.459    0.262  0.192   1.000   0.641   0.518    0.210 -0.273
## Imports   0.209    0.469  0.371   0.641   1.000   0.085    0.382 -0.087
## Govsurp   0.620    0.033  0.223   0.518   0.085   1.000    0.179 -0.317
## Consprix  0.317    0.192  0.336   0.210   0.382   0.179    1.000  0.063
## Unemp    -0.363    0.073 -0.295  -0.273  -0.087  -0.317    0.063  1.000
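
Each entry of the matrix is a Pearson correlation: the covariance of two variables divided by the product of their standard deviations. The sketch below checks this on a small made-up data set. The vectors `x` and `y` are illustrative values, not taken from German_forecasts.csv.

```r
# Made-up illustration data (not from the German forecasts file)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson correlation from its definition
r.manual <- cov(x, y) / (sd(x) * sd(y))
r.manual                          # 0.8

all.equal(r.manual, cor(x, y))    # TRUE

# cor() on a data frame returns a matrix; index it by name to get one pair
df <- data.frame(x = x, y = y)
cor(df)["x", "y"]                 # 0.8
```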

Scatter Plot Matrices:

pairs(x = forecast[, c("GDP", "GFCF", "Govsurp", "Unemp")], pch = 19)

Exercise:

Check the correlation between GDP and GFCF.