Download R at https://cran.r-project.org/
Windows users: click “base”, then download the latest version for Windows: http://cran.r-project.org/bin/windows/base/
Mac users (Intel): https://cran.r-project.org/bin/macosx/base/R-4.2.0.pkg
Mac users (M1): https://cran.r-project.org/bin/macosx/big-sur-arm64/base/R-4.2.0-arm64.pkg
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
It is open-source software. Choose the version for your operating system at https://www.rstudio.com/products/rstudio/download/
When you open RStudio, it looks like this:
If you would like a dark background, you can change the editor theme under Tools > Global Options… > Appearance > Editor theme and pick a dark theme such as Monokai.
As you can see from the interface above, three panels are shown. The left-hand panel is the Console, where you run commands and see their output. The upper-right panel shows the data you have loaded, the variables you have defined, and other objects. The lower-right panel holds the Files, Plots, Packages, and other tabs.
In general, you will want to write code in the editor rather than directly in the console. Click the green plus icon in the top-left corner and select R Script: the left panel splits into two parts, and a new script editor window appears.
Now it’s time to get your hands dirty: follow the instructions below and practice the fundamental R commands.
Always set the working directory before you begin coding.
The working directory is the folder from which you load data and in which you save output and code.
#You can use getwd() to check your current working directory.
getwd()
## [1] "/Users/woshiamie/Desktop/UC/Teaching/BANA4090_23Summer/Module 1"
You can set your working directory with the setwd("path") command, by clicking Session > Set Working Directory > Choose Directory…, or by pressing Ctrl + Shift + H and choosing the folder you want to use as the working directory.
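As a quick sketch, the snippet below switches the working directory to R's temporary folder (tempdir() is used only because it exists on every machine), checks the change, and then restores the original directory. In practice you would pass the path to your own course folder instead.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# setwd() changes the working directory (and invisibly returns the previous one)
setwd(tempdir())
getwd()   # now points at R's temporary folder

# Restore the original working directory
setwd(old_wd)
```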
Type the following code in the editor and run it line by line. To run a line of code, move the cursor to that line and press Ctrl + Enter (Cmd + Enter on a Mac). To run multiple lines, highlight them and use the same shortcut.
The operators "<-" and "=" are both assignment operators: they assign the value on the right-hand side to the object on the left-hand side.
x <- 88
y = 0.8
x * y + x / y
## [1] 180.4
Try these additional functions:
log(x); exp(x/y); sin(x); cos(y); sqrt(y)
\(E=mc^2\) is the equation in German-born physicist Albert Einstein’s theory of special relativity expressing that mass and energy are the same physical entity and can be changed into each other. How much energy can 1 kg of matter (any matter) be turned into?
* \(m = 1\,\text{kg}\): mass of matter
* \(c = 3 \times 10^8\,\text{m s}^{-1}\): speed of light
* \(E = ?\)
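One way to set up this calculation in R, using the values listed above. R writes \(3 \times 10^8\) in scientific notation as 3e8, and the speed of light is stored as c_light (a name chosen here) to avoid confusion with the c() function used later.

```r
# Mass of matter in kilograms
m <- 1
# Speed of light in metres per second (3 x 10^8), in scientific notation
c_light <- 3e8
# Energy in joules (kg m^2 s^-2)
E <- m * c_light^2
E
```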
Logical operations return TRUE (abbreviated T) or FALSE (abbreviated F).
x == y
## [1] FALSE
x > y
## [1] TRUE
2 >= 2
## [1] TRUE
2 > 2
## [1] FALSE
And: &
Or: |
All: all()
Any: any()
TRUE & TRUE
## [1] TRUE
TRUE & FALSE
## [1] FALSE
FALSE & FALSE
## [1] FALSE
TRUE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
all(c(T,T,T,T,F))
## [1] FALSE
all(c(T,T,T,T,T))
## [1] TRUE
any(c(T,T,T,T,F))
## [1] TRUE
any(c(F,F,F,F,F))
## [1] FALSE
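& and | also work element by element on vectors of the same length, which is what makes them useful for filtering data; all() and any() then collapse such a vector into a single TRUE or FALSE. A small sketch with made-up values:

```r
# Hypothetical example values
v <- c(2, 5, 8, 11)

# Elementwise comparisons return a logical vector of the same length
v > 3               # FALSE TRUE TRUE TRUE
v < 10              # TRUE TRUE TRUE FALSE

# & combines the two logical vectors element by element
in_range <- v > 3 & v < 10
in_range            # FALSE TRUE TRUE FALSE

# all() / any() summarize a logical vector into a single value
all(in_range)       # FALSE
any(in_range)       # TRUE

# sum() counts the TRUE values (TRUE counts as 1, FALSE as 0)
sum(in_range)       # 2
```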
You can use install.packages("package_name") to install a package. After installation, use the library() function to attach the package you want to use.
install.packages("readxl")
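A common pattern is to install a package only when it is missing and then attach it. The helper name ensure_package() below is hypothetical (it is not part of base R); requireNamespace(), install.packages(), and library() are standard functions.

```r
# Hypothetical helper: install a package only if it is not yet installed, then attach it
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (instead of raising an error) when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that pkg holds the package name as a string
  library(pkg, character.only = TRUE)
}
```

For example, ensure_package("readxl") would install readxl on the first run and simply attach it on later runs.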
This section covers four common data structures in R: vectors, matrices, data frames, and lists.
A vector is the simplest type of data structure in R. Simply put, a vector is a sequence of data elements of the same basic type.
To assign a sequence of numbers (a vector) to a variable, put the numbers inside the c() (combine) function, separated by commas. For example, we can create a new variable called “p” that contains the numbers 0.1, 0.3, 0.5 and 0.7:
# Define numerical vector p
p <- c(0.1, 0.3, 0.5, 0.7)
# Define character vector pp
pp <- c("IBM", "Microsoft", "Apple", "AMD")
Note that you can put a # at the beginning of a line to write comments in your code.
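Because every element of a vector must share one basic type, c() silently converts (coerces) mixed inputs to the most general type among them. A quick check:

```r
# Numbers, text and logicals mixed together are all converted to character
mixed <- c(1, "IBM", TRUE)
mixed          # "1" "IBM" "TRUE"
class(mixed)   # "character"

# Logicals combined with numbers become numeric (TRUE -> 1, FALSE -> 0)
nums <- c(1, TRUE, FALSE)
nums           # 1 1 0
class(nums)    # "numeric"
```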
# Average
mean(p)
## [1] 0.4
# Standard deviation
sd(p)
## [1] 0.2581989
# Median
median(p)
## [1] 0.4
# Maximum
max(p)
## [1] 0.7
# Minimum
min(p)
## [1] 0.1
# Summary statistics
summary(p)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.10 0.25 0.40 0.40 0.55 0.70
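To see what sd() computes, the sketch below rebuilds the sample standard deviation from its definition, \(s = \sqrt{\sum_i (p_i - \bar{p})^2 / (n - 1)}\), which divides by \(n - 1\) rather than \(n\).

```r
p <- c(0.1, 0.3, 0.5, 0.7)

# Squared deviations from the mean, summed
n  <- length(p)
ss <- sum((p - mean(p))^2)

# The sample standard deviation divides by n - 1
s_manual <- sqrt(ss / (n - 1))
s_manual                    # 0.2581989
all.equal(s_manual, sd(p))  # TRUE
```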
# Define vector p1
p1 <- c(1,3,5,7)
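Arithmetic operations on vectors are elementwise. Two vectors should have the same length (if they do not, R recycles the shorter one); a single number is applied to every element.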
p + p1
## [1] 1.1 3.3 5.5 7.7
p * p1
## [1] 0.1 0.9 2.5 4.9
p + 2
## [1] 2.1 2.3 2.5 2.7
p / 10
## [1] 0.01 0.03 0.05 0.07
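When vector lengths differ, R recycles the shorter operand; this is how p + 2 works, with the single value 2 reused for every element. A quick check with made-up vectors:

```r
# The length-2 vector is recycled as 10, 20, 10, 20
c(1, 2, 3, 4) + c(10, 20)       # 11 22 13 24

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
c(1, 2, 3, 4) + c(10, 20, 30)   # 11 22 33 14, with a warning
```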
Combining vectors with c() gives a single, longer vector (not a nested one).
p2 <- c(p, p1)
p2
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
To extract the second entry from vector p2:
p2[2]
## [1] 0.3
To extract all elements greater than 5 from vector p2:
p2[p2 > 5]
## [1] 7
To extract all elements greater than 5 and less than 7 from vector p2:
p2[p2 > 5 & p2 < 7]
## numeric(0)
# there is no such element
To order the vector p2 from smallest to largest:
p2[order(p2)]
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
sort(p2)
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
# Number of elements in p
length(p)
## [1] 4
# All but the first element in p
p[-1]
## [1] 0.3 0.5 0.7
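A few more indexing patterns on the same p2 vector: which() returns the positions (not the values) that satisfy a condition, a vector of indices selects several elements at once, and order(..., decreasing = TRUE) sorts from largest to smallest.

```r
p  <- c(0.1, 0.3, 0.5, 0.7)
p1 <- c(1, 3, 5, 7)
p2 <- c(p, p1)

# Positions of the elements greater than 1
which(p2 > 1)                    # 6 7 8

# Select several elements by position
p2[c(1, 3, 5)]                   # 0.1 0.5 1.0

# Drop the first two elements
p2[-(1:2)]                       # 0.5 0.7 1.0 3.0 5.0 7.0

# Order from largest to smallest
p2[order(p2, decreasing = TRUE)] # 7.0 5.0 3.0 1.0 0.7 0.5 0.3 0.1
```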
A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged in a fixed number of rows and columns.
Create a matrix using the matrix() function. This function creates a matrix from a given vector.
# Define a matrix X
X <- matrix(data = p2, nrow = 2, ncol = 4)
X
## [,1] [,2] [,3] [,4]
## [1,] 0.1 0.5 1 5
## [2,] 0.3 0.7 3 7
class(X)
## [1] "matrix" "array"
When calling an R function, you may omit the argument names and simply pass the inputs in the order the function expects:
X <- matrix(p2, 2, 4)
By default, matrix() fills the matrix with the values of the vector column by column; to fill it row by row instead, add the argument byrow = T (or byrow = TRUE).
X <- matrix(data = p2, nrow = 2, ncol = 4, byrow = T )
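To compare the two filling orders side by side, this sketch builds the same 2 x 4 matrix both ways from p2 and prints the first row of each.

```r
p2 <- c(0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7)

# Filled column by column (the default)
X_col <- matrix(p2, nrow = 2, ncol = 4)
# Filled row by row
X_row <- matrix(p2, nrow = 2, ncol = 4, byrow = TRUE)

X_row[1, ]   # 0.1 0.3 0.5 0.7
X_row[2, ]   # 1 3 5 7
X_col[1, ]   # 0.1 0.5 1.0 5.0
```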
Create a matrix using cbind() and rbind() functions
cbind() combines vectors or matrices column by column into a matrix; rbind() combines them row by row.
X <- cbind(p, p1)
X
## p p1
## [1,] 0.1 1
## [2,] 0.3 3
## [3,] 0.5 5
## [4,] 0.7 7
X <- rbind(p, p1)
Dimensions
dim(X)
## [1] 2 4
Elementwise operations for matrices
X + 1
## [,1] [,2] [,3] [,4]
## p 1.1 1.3 1.5 1.7
## p1 2.0 4.0 6.0 8.0
X * 2
## [,1] [,2] [,3] [,4]
## p 0.2 0.6 1 1.4
## p1 2.0 6.0 10 14.0
Subsetting, indexing
# The element at the intersection of the second row and the second column
X[2,2]
## p1
## 3
# The first row of matrix X
X[1, ]
## [1] 0.1 0.3 0.5 0.7
X[, 2:3]
## [,1] [,2]
## p 0.3 0.5
## p1 3.0 5.0
X[1, c(1,3)]
## [1] 0.1 0.5
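Rows created with rbind() are named after the input vectors, so they can also be selected by name. A logical condition inside the brackets returns the matching values as a plain vector, read column by column. A short sketch:

```r
p  <- c(0.1, 0.3, 0.5, 0.7)
p1 <- c(1, 3, 5, 7)
X  <- rbind(p, p1)   # rows are named "p" and "p1"

# Select a row by name instead of by number
X["p1", ]            # 1 3 5 7

# Values greater than 1, collected in column-major order
X[X > 1]             # 3 5 7
```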
Transpose and Multiplication
# Transpose
t(X)
## p p1
## [1,] 0.1 1
## [2,] 0.3 3
## [3,] 0.5 5
## [4,] 0.7 7
# Matrix multiplication
t(X) %*% X
## [,1] [,2] [,3] [,4]
## [1,] 1.01 3.03 5.05 7.07
## [2,] 3.03 9.09 15.15 21.21
## [3,] 5.05 15.15 25.25 35.35
## [4,] 7.07 21.21 35.35 49.49
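Note the difference between * (elementwise) and %*% (matrix product): for %*%, the number of columns of the left matrix must equal the number of rows of the right one. The sketch also uses crossprod(), a base R shortcut for t(X) %*% X.

```r
p  <- c(0.1, 0.3, 0.5, 0.7)
p1 <- c(1, 3, 5, 7)
X  <- rbind(p, p1)             # 2 x 4

# (2 x 4) %*% (4 x 2) gives a 2 x 2 matrix of dot products between rows
XXt <- X %*% t(X)
XXt                            # [0.84 8.4; 8.4 84]

# (4 x 2) %*% (2 x 4) gives the 4 x 4 matrix printed above
XtX <- t(X) %*% X
all.equal(XtX, crossprod(X))   # TRUE

# With two matrices, * multiplies matching entries, so the dimensions must agree
X * X                          # squares every entry of X
```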
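A data frame is a table where each row represents an observation and each column represents a variable. Unlike a matrix, different columns can hold different data types (for example, numbers in one column and text in another). A data frame has column names (variable names) and row names.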
The function data.frame(X) converts the matrix X into a data frame.
df <- data.frame(X)
class(df)
## [1] "data.frame"
df
## X1 X2 X3 X4
## p 0.1 0.3 0.5 0.7
## p1 1.0 3.0 5.0 7.0
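A data frame can also be built directly from vectors, one column per vector. This sketch reuses p and pp from the vector section; the column names ticker and weight are chosen here for illustration. In R 4.0 and later, text columns stay character by default.

```r
# The vectors p and pp defined earlier
p  <- c(0.1, 0.3, 0.5, 0.7)
pp <- c("IBM", "Microsoft", "Apple", "AMD")

# Each argument becomes a named column; columns may have different types
stocks <- data.frame(ticker = pp, weight = p)
str(stocks)

# $ extracts one column as a vector
stocks$weight

# Filter rows with a logical condition and pick a column by name
stocks[stocks$weight > 0.2, "ticker"]   # "Microsoft" "Apple" "AMD"
```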
Use the read.table() or read.csv() function to import comma-, space-, or tab-delimited text files. You can also use the Import Dataset wizard in RStudio (File > Import Dataset…). The “readxl” package lets you read xls/xlsx files as well.
First, download the Dulles_2.csv file from Canvas and save it in your working directory.
mydata_csv <- read.csv("Dulles_2.csv", header = T)
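If Dulles_2.csv is not at hand, the sketch below writes a tiny CSV to a temporary file and reads it back. It also suggests where the column name Passengers..000s. in the output further down comes from: by default, read.csv() passes headers through make.names(), which replaces spaces and parentheses with dots. The header text "Passengers (000s)" is an assumption about the original file; the three rows are copied from the printed data.

```r
# Write a small CSV file with the assumed header to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year,Passengers (000s)",
             "1963,641",
             "1964,728",
             "1965,920"), tmp)

# header = TRUE treats the first line as column names
sample_csv <- read.csv(tmp, header = TRUE)
names(sample_csv)    # "Year" "Passengers..000s."

# check.names = FALSE keeps the header text exactly as written
names(read.csv(tmp, check.names = FALSE))   # "Year" "Passengers (000s)"
```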
A list is a container: you can put objects of different types (vectors, matrices, data frames, and so on) into a single list.
mylist <- list(myvector = p, mymatrix = X, mydata = mydata_csv)
mylist
## $myvector
## [1] 0.1 0.3 0.5 0.7
##
## $mymatrix
## [,1] [,2] [,3] [,4]
## p 0.1 0.3 0.5 0.7
## p1 1.0 3.0 5.0 7.0
##
## $mydata
## Year Passengers..000s.
## 1 1963 641
## 2 1964 728
## 3 1965 920
## 4 1966 1079
## 5 1967 1427
## 6 1968 1602
## 7 1969 1928
## 8 1970 1869
## 9 1971 1881
## 10 1972 1992
## 11 1973 2083
## 12 1974 2004
## 13 1975 2000
## 14 1976 2251
## 15 1977 2267
## 16 1978 2518
## 17 1979 2858
## 18 1980 2086
## 19 1981 1889
## 20 1982 2248
## 21 1983 2651
## 22 1984 3136
## 23 1985 4538
## 24 1986 8394
## 25 1987 9980
## 26 1988 8650
## 27 1989 9224
## 28 1990 9043
## 29 1991 9406
## 30 1992 9408
## 31 1993 8501
## 32 1994 8947
## 33 1995 9653
## 34 1996 10095
## 35 1997 10697
## 36 1998 12445
## 37 1999 16055
## 38 2000 15873
## 39 2001 14021
## 40 2002 13146
## 41 2003 12928
## 42 2004 18213
## 43 2005 22129
## 44 2006 17787
## 45 2007 18792
## 46 2008 17638
## 47 2009 16964
## 48 2010 17214
## 49 2011 16663
## 50 2012 15883
## 51 2013 14958
## 52 2014 14393
## 53 2015 14463
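Elements of a list are retrieved by name with $ or [[ ]]; single brackets [ ] return a smaller list rather than the element itself. A self-contained sketch with a small made-up list:

```r
# A small list holding objects of different types
mylist <- list(myvector = c(0.1, 0.3, 0.5, 0.7),
               mymatrix = matrix(1:4, nrow = 2),
               label    = "example")

# $ and [[ ]] return the element itself
mylist$myvector          # 0.1 0.3 0.5 0.7
mylist[["mymatrix"]]     # the 2 x 2 matrix

# Single brackets return a list containing the element
class(mylist["label"])   # "list"

names(mylist)            # "myvector" "mymatrix" "label"
length(mylist)           # 3
```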
#Store the data in a conveniently named variable
n.Passenger <- mydata_csv$Passengers..000s.
# Create a line chart
plot(n.Passenger, type = "l")
#Change to Time-Series Object
#install.packages("zoo")
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# Yearly data (frequency = 1) starting in 1963
Dulles <- ts(mydata_csv$Passengers..000s., start = 1963, frequency = 1)
head(Dulles)
## [1] 641 728 920 1079 1427 1602
#install.packages("forecast")
require("forecast")
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
plot(Dulles)
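Once the data are a ts object, R keeps track of the time index. This sketch uses only the first six values printed by head(Dulles): start(), end() and frequency() report the time attributes, and window() extracts a sub-period by year.

```r
# First six annual values of the Dulles series, starting in 1963
dulles_head <- ts(c(641, 728, 920, 1079, 1427, 1602), start = 1963, frequency = 1)

start(dulles_head)       # 1963 1
end(dulles_head)         # 1968 1
frequency(dulles_head)   # 1 (one observation per year)

# Values for 1965 through 1966 only
window(dulles_head, start = 1965, end = 1966)   # 920 1079
```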