miles <- read.csv("Revenue_miles_2.csv")
str(miles)
## 'data.frame': 192 obs. of 3 variables:
## $ DATE : chr "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ...
## $ Miles.Thousands.: int 49843099 49931931 61478163 58981617 61223861 65601574 67898320 67028338 56441629 58834210 ...
## $ Miles..Billions.: num 4.98 4.99 6.15 5.9 6.12 ...
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
miles.ts <- ts(miles$Miles..Billions., start = c(2000, 1), end = c(2004, 12), frequency = 12)
autoplot(miles.ts,xlab = "Years", ylab = "Miles (Billions)", main = "RPM vs Years")
seasonplot(miles.ts, col = 1:5,
           xlab = "Month", ylab = "Miles (Billions)", main = "RPM vs Month")
ggseasonplot(miles.ts, col = 1:5,
             xlab = "Month", ylab = "Miles (Billions)", main = "RPM vs Month")
Suppose the sales of a popular book over a seven-week period are as follows:
sales.df <- data.frame(Week=1:7, Sales=c(15,10,12,16,9,8,14))
sales.df
## Week Sales
## 1 1 15
## 2 2 10
## 3 3 12
## 4 4 16
## 5 5 9
## 6 6 8
## 7 7 14
sales.df$Sales
## [1] 15 10 12 16 9 8 14
mean(sales.df$Sales)
## [1] 12
We can sort the vector and then select the order statistic we want. For example, to get the 4th-smallest value (with 7 observations, this is also the median):
sort(sales.df$Sales,decreasing = F)
## [1] 8 9 10 12 14 15 16
sort(sales.df$Sales,decreasing = F)[4]
## [1] 12
median(sales.df$Sales)
## [1] 12
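Sorting in decreasing order gives the k-th largest value directly, and quantile() generalizes the idea of an order statistic. A quick sketch with the same sales vector:

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)
sort(sales, decreasing = TRUE)[4]  # 4th largest: 12
quantile(sales, probs = 0.5)       # the 50% quantile equals the median: 12
```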
The summary() function gives the minimum, 1st quartile, median, mean, 3rd quartile, and maximum.
summary(sales.df$Sales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.0 9.5 12.0 12.0 14.5 16.0
max(sales.df$Sales) - min(sales.df$Sales)
## [1] 8
sales.df$Sales - mean(sales.df$Sales)
## [1] 3 -2 0 4 -3 -4 2
sales.abs <- abs(sales.df$Sales - mean(sales.df$Sales))
n <- length(sales.df$Sales)
MAD <- sum(sales.abs)/n
MAD
## [1] 2.571429
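The sum-then-divide computation above can be written in one line with mean(). Note that base R's mad() computes a different statistic (the scaled *median* absolute deviation), so the two should not be confused:

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)
mean(abs(sales - mean(sales)))  # mean absolute deviation: 2.571429

# Base R's mad() is the median absolute deviation (scaled by 1.4826 by
# default), a different measure of spread:
mad(sales)
```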
var(sales.df$Sales)
## [1] 9.666667
sd(sales.df$Sales)
## [1] 3.109126
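Both var() and sd() use the sample (n - 1) denominator; the sketch below verifies each by hand:

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)
sum((sales - mean(sales))^2) / (n - 1)  # sample variance: 9.666667
sqrt(var(sales))                        # equals sd(): 3.109126
```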
Z-score: a standardized score that measures an observation's relative position within a dataset.
(sales.df$Sales - mean(sales.df$Sales))/sd(sales.df$Sales)
## [1] 0.9649013 -0.6432675 0.0000000 1.2865350 -0.9649013 -1.2865350 0.6432675
scale(sales.df$Sales,center = T,scale = T)
## [,1]
## [1,] 0.9649013
## [2,] -0.6432675
## [3,] 0.0000000
## [4,] 1.2865350
## [5,] -0.9649013
## [6,] -1.2865350
## [7,] 0.6432675
## attr(,"scaled:center")
## [1] 12
## attr(,"scaled:scale")
## [1] 3.109126
Roughly, you can treat the Z-scores as following a standard normal distribution.
You can also use the pnorm() function to calculate the probability that Z is less than some value. For example, the probability that the Z-score is less than 1.6 is:
pnorm(1.6)
## [1] 0.9452007
The probability that the absolute value of the Z-score is greater than 1.6 is:
2 * (1 - pnorm(1.6))
## [1] 0.1095986
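The complement of the two-tailed probability gives the chance that Z falls within ±1.6; both expressions below are equivalent by the symmetry of the normal distribution:

```r
pnorm(1.6) - pnorm(-1.6)  # P(-1.6 < Z < 1.6): 0.8904014
1 - 2 * (1 - pnorm(1.6))  # same value by symmetry
```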
Exercise:
What is the probability that the Z-score's absolute value is greater than 2.5?
Load in the German forecasts dataset.
forecast <- read.csv("German_forecasts.csv") # note: this name masks forecast::forecast()
str(forecast)
## 'data.frame': 25 obs. of 9 variables:
## $ Institutions: chr "Bundesbank" "Commerzbank" "Deka" "Deutsche Bank" ...
## $ GDP : num 0.4 0.5 0.7 0.3 0.9 0.4 1.2 1 1.1 0.6 ...
## $ Privcons : num 1 1.3 1.1 0.6 1.1 0.9 1.2 1.1 1.2 1 ...
## $ GFCF : num -0.1 0.1 -0.3 1.1 0.9 0.1 1.9 1.9 2.6 0.5 ...
## $ Exports : num 1.9 2.8 3.3 3.2 4.2 3 4.1 3.8 5.5 2.9 ...
## $ Imports : num 3 4.1 3.3 4.2 4.6 3.8 4.1 4.6 5 4.1 ...
## $ Govsurp : num -0.75 -0.5 -0.3 -0.5 0 -0.7 -0.3 -0.2 0 -0.4 ...
## $ Consprix : num 1.5 1.9 1.9 1.7 2.8 2.1 2 2.1 2 2 ...
## $ Unemp : num 7.2 7.1 6.9 7 7 7.1 6.6 6.8 7 7 ...
head(forecast)
## Institutions GDP Privcons GFCF Exports Imports Govsurp Consprix Unemp
## 1 Bundesbank 0.4 1.0 -0.1 1.9 3.0 -0.75 1.5 7.2
## 2 Commerzbank 0.5 1.3 0.1 2.8 4.1 -0.50 1.9 7.1
## 3 Deka 0.7 1.1 -0.3 3.3 3.3 -0.30 1.9 6.9
## 4 Deutsche Bank 0.3 0.6 1.1 3.2 4.2 -0.50 1.7 7.0
## 5 DIW 0.9 1.1 0.9 4.2 4.6 0.00 2.8 7.0
## 6 DZ Bank 0.4 0.9 0.1 3.0 3.8 -0.70 2.1 7.1
GDP - Gross Domestic Product
Privcons - Private Consumption
GFCF - Gross Fixed Capital Formation
Govsurp - Government Surplus
Consprix - Consumer Prices
Unemp - Unemployment Rate
cor(forecast[,-1])
## GDP Privcons GFCF Exports Imports Govsurp
## GDP 1.0000000 0.42973866 0.6012955 0.4593605 0.2092500 0.61963838
## Privcons 0.4297387 1.00000000 0.2946875 0.2618317 0.4693350 0.03339097
## GFCF 0.6012955 0.29468750 1.0000000 0.1924021 0.3709159 0.22259944
## Exports 0.4593605 0.26183166 0.1924021 1.0000000 0.6411323 0.51777441
## Imports 0.2092500 0.46933504 0.3709159 0.6411323 1.0000000 0.08463580
## Govsurp 0.6196384 0.03339097 0.2225994 0.5177744 0.0846358 1.00000000
## Consprix 0.3170663 0.19192464 0.3356997 0.2102851 0.3817395 0.17919491
## Unemp -0.3633741 0.07337803 -0.2948008 -0.2729608 -0.0872142 -0.31655934
## Consprix Unemp
## GDP 0.31706635 -0.36337411
## Privcons 0.19192464 0.07337803
## GFCF 0.33569972 -0.29480075
## Exports 0.21028513 -0.27296080
## Imports 0.38173948 -0.08721420
## Govsurp 0.17919491 -0.31655934
## Consprix 1.00000000 0.06263273
## Unemp 0.06263273 1.00000000
round(cor(forecast[,-1]),digits = 3)
## GDP Privcons GFCF Exports Imports Govsurp Consprix Unemp
## GDP 1.000 0.430 0.601 0.459 0.209 0.620 0.317 -0.363
## Privcons 0.430 1.000 0.295 0.262 0.469 0.033 0.192 0.073
## GFCF 0.601 0.295 1.000 0.192 0.371 0.223 0.336 -0.295
## Exports 0.459 0.262 0.192 1.000 0.641 0.518 0.210 -0.273
## Imports 0.209 0.469 0.371 0.641 1.000 0.085 0.382 -0.087
## Govsurp 0.620 0.033 0.223 0.518 0.085 1.000 0.179 -0.317
## Consprix 0.317 0.192 0.336 0.210 0.382 0.179 1.000 0.063
## Unemp -0.363 0.073 -0.295 -0.273 -0.087 -0.317 0.063 1.000
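cor() also works on a single pair of vectors, and cor.test() reports whether a correlation differs significantly from zero. A sketch on synthetic data (in case German_forecasts.csv is not at hand; the vectors x and y are made up for illustration):

```r
set.seed(42)
x <- rnorm(25)                      # stand-in for one forecast column
y <- 0.6 * x + rnorm(25, sd = 0.8)  # correlated stand-in for another

cor(x, y)               # a single pairwise correlation
cor.test(x, y)$p.value  # tests H0: true correlation is zero
```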
Scatter Plot Matrices:
pairs(x = forecast[, c("GDP", "GFCF", "Govsurp", "Unemp")], pch = 19)
Exercise:
Check the correlation between GDP and GFCF.