Download R at https://cran.r-project.org/
Windows users: click base, then download the latest version for Windows. http://cran.r-project.org/bin/windows/base/
Mac users (Intel): https://cran.r-project.org/bin/macosx/base/R-4.2.0.pkg
Mac users (M1): https://cran.r-project.org/bin/macosx/big-sur-arm64/base/R-4.2.0-arm64.pkg
RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
It’s open-source software. Choose the download for your operating system: https://www.rstudio.com/products/rstudio/download/
When you open RStudio, it looks like this:
[Figure: RStudio]
If you would like to switch to a dark theme, click Tools > Global Options… > Appearance and pick a dark Editor theme (for example, Monokai).
[Figure: Settings]
[Figure: RStudio in Dark Mode]
As the interface above shows, there are three panels. The left-hand panel is the console, where you run commands and see their output. The upper-right panel shows the data you have loaded, the variables you have defined, and other objects. The lower-right panel holds the Files, Plots, Packages, and other tabs.
Generally, you will want to write code in the editor rather than directly in the console. Click the green plus icon in the top-left corner and select R Script; the left panel splits into two parts and a new script editor window appears.
Now it’s time to get your hands dirty: follow the instructions and practice the fundamental R commands.
Always set the working directory before you begin coding.
The working directory is the folder from which R loads data and in which it saves your output and code by default.
#You can use getwd() to check your current working directory.  
getwd()
## [1] "/Users/woshiamie/Desktop/UC/Teaching/BANA4090_23Summer/Module 1"
You can set your working directory with the setwd("path") command, by clicking Session > Set Working Directory > Choose Directory…, or by pressing Ctrl + Shift + H and choosing the folder you want to use as the working directory.
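As a minimal sketch of the getwd()/setwd() round trip, the snippet below switches to a different folder and back. Here tempdir() is only a stand-in for your own project folder, and old_wd is a name chosen for illustration.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() stands in for your own project folder (an assumption in this sketch)
setwd(tempdir())
getwd()

# List the files R can see in the new working directory
list.files()

# Switch back to where we started
setwd(old_wd)
```

In your own scripts you would replace tempdir() with the path to the folder that holds your data.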
Type the following code in the editor and run it line by line. To run a line of code, move the cursor to that line and press Ctrl + Enter (Cmd + Enter on Mac). To run multiple lines of code, highlight those lines and use the same shortcut.
“<-” and “=” are both assignment operators: each assigns the right-hand side value to the left-hand side object.
x <- 88
y = 0.8
x * y + x / y
## [1] 180.4
Try these additional functions:
log(x); exp(x/y); sin(x); cos(y); sqrt(y)
\(E=mc^2\) is the equation from the German-born physicist Albert Einstein’s theory of special relativity. It expresses the fact that mass and energy are the same physical entity and can be changed into each other. How much energy can 1 kg of matter (any matter) be turned into?
\(m = 1\ \text{kg}\): mass of matter
\(c = 3 \times 10^{8}\ \text{m s}^{-1}\): speed of light
\(E = ?\)
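One way to work this out in R is sketched below. It uses the rounded speed of light from the problem statement; the variable name c_light is an arbitrary choice that avoids reusing the name of the built-in c() function.

```r
m <- 1           # mass in kg
c_light <- 3e8   # speed of light in m/s (rounded value from the problem)

# Energy in joules
E <- m * c_light^2
E
## [1] 9e+16
```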
Logical operations return TRUE (T) or FALSE (F).
x == y
## [1] FALSE
x > y
## [1] TRUE
2 >= 2
## [1] TRUE
2 > 2
## [1] FALSE
And: &
Or: |
All: all()
Any: any()
TRUE & TRUE
## [1] TRUE
TRUE & FALSE
## [1] FALSE
FALSE & FALSE
## [1] FALSE
TRUE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
all(c(T,T,T,T,F))
## [1] FALSE
all(c(T,T,T,T,T))
## [1] TRUE
any(c(T,T,T,T,F))
## [1] TRUE
any(c(F,F,F,F,F))
## [1] FALSE
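The operators above are most useful when combined with comparisons. The sketch below reuses x and y from earlier and adds a made-up vector called scores to show that comparisons return one result per element.

```r
x <- 88
y <- 0.8

# Both conditions must hold
x > 50 & y < 1
## [1] TRUE

# At least one condition must hold
x < 50 | y < 1
## [1] TRUE

# Comparisons are vectorized: one TRUE/FALSE per element
scores <- c(45, 72, 88, 91)   # hypothetical values
passed <- scores >= 70
passed
## [1] FALSE  TRUE  TRUE  TRUE

any(scores < 50)
## [1] TRUE
all(passed)
## [1] FALSE
```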
You can use install.packages("Package_name") to install packages. After installation, use the library() function to attach the package you want to use.
install.packages("readxl")
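Installation is needed only once per computer, while library() is needed in every new session. A possible pattern, sketched here with the base function requireNamespace(), checks whether a package is available before attaching it; the variable name has_readxl is just for illustration.

```r
# TRUE if readxl is already installed, FALSE otherwise (nothing is attached yet)
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (has_readxl) {
  library(readxl)   # attach the package for this session
} else {
  message("readxl is not installed; run install.packages(\"readxl\") first")
}
```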
This section covers four types of data structures in R: vectors, matrices, data frames, and lists.
A vector is the simplest type of data structure in R. Simply put, a vector is a sequence of data elements of the same basic type.
To assign a vector of numbers to a variable, list the numbers inside the c() (combine) function, separated by commas. For example, we can create a new variable called “p” which will contain the numbers 0.1, 0.3, 0.5 and 0.7:
# Define numerical vector p
p <- c(0.1, 0.3, 0.5, 0.7)
# Define character vector pp
pp <- c("IBM", "Microsoft", "Apple", "AMD")
Note that you can put a # at the beginning of a line to write comments in your code.
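Because every element of a vector must share one basic type, R silently converts mixed input to a common type. The short sketch below checks the types of p and pp and shows what happens when numbers, text, and logicals are combined; the vector name mixed is made up for this example.

```r
p  <- c(0.1, 0.3, 0.5, 0.7)
pp <- c("IBM", "Microsoft", "Apple", "AMD")

class(p)
## [1] "numeric"
class(pp)
## [1] "character"

# Mixing types: everything is converted to character
mixed <- c(1, "IBM", TRUE)
mixed
## [1] "1"    "IBM"  "TRUE"
class(mixed)
## [1] "character"
```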
# Average
mean(p)
## [1] 0.4
# Standard deviation
sd(p)
## [1] 0.2581989
# Median
median(p)
## [1] 0.4
# Maximum
max(p)
## [1] 0.7
# Minimum
min(p)
## [1] 0.1
# Summary statistics
summary(p)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.10    0.25    0.40    0.40    0.55    0.70
# Define vector p1
p1 <- c(1,3,5,7)
Elementwise operations (the vectors should be the same length; a single number is applied to every element).
p + p1
## [1] 1.1 3.3 5.5 7.7
p * p1
## [1] 0.1 0.9 2.5 4.9
p + 2
## [1] 2.1 2.3 2.5 2.7
p / 10
## [1] 0.01 0.03 0.05 0.07
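Operations such as p + 2 work because R recycles the shorter operand until it matches the longer one. The sketch below shows the same rule with a two-element vector; the name recycled is only for illustration. If the longer length is not a multiple of the shorter one, R still computes a result but issues a warning.

```r
p <- c(0.1, 0.3, 0.5, 0.7)

# c(1, 2) is reused as 1, 2, 1, 2 to match the four elements of p
recycled <- p + c(1, 2)
recycled
## [1] 1.1 2.3 1.5 2.7
```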
Combining several vectors with c() still produces a single vector.
p2 <- c(p, p1)
p2
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
To extract the second entry from vector p2:
p2[2]
## [1] 0.3
To extract all elements greater than 5 from vector p2:
p2[p2 > 5]
## [1] 7
To extract all elements greater than 5 and less than 7 from vector p2:
p2[p2 > 5 & p2 < 7]
## numeric(0)
# no element satisfies both conditions
To order the vector p2 from smallest to largest:
p2[order(p2)]
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
sort(p2)
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
length(p)
## [1] 4
# All but the first element in p
p[-1]
## [1] 0.3 0.5 0.7
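A few more indexing patterns follow the same rules as the examples above. This is a small sketch using the same p2 values: positions can be given as a vector, negative positions drop elements, and which() returns the positions where a condition is TRUE.

```r
p2 <- c(0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7)

# First and third elements
p2[c(1, 3)]
## [1] 0.1 0.5

# Everything except the first two elements
p2[-(1:2)]
## [1] 0.5 0.7 1.0 3.0 5.0 7.0

# Positions (not values) of elements greater than 1
which(p2 > 1)
## [1] 6 7 8

# Using >= and <= includes the boundary values
p2[p2 >= 5 & p2 <= 7]
## [1] 5 7
```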
A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.
Create a matrix using the matrix() function. This function creates a matrix from a given vector.
# Define a matrix X
X <- matrix(data = p2, nrow = 2, ncol = 4)
X
##      [,1] [,2] [,3] [,4]
## [1,]  0.1  0.5    1    5
## [2,]  0.3  0.7    3    7
class(X)
## [1] "matrix" "array"
When calling R functions, you may omit the argument names and simply list the inputs in the correct order.
X <- matrix(p2, 2, 4)
The default order to position the values of a vector in a matrix is by column, but you can specify it as by row using an additional argument byrow=T.
X <- matrix(data = p2, nrow = 2, ncol = 4, byrow = T )
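To see the difference between the two fill orders side by side, the sketch below builds both versions from the same p2 values. The names by_col and by_row are chosen only for this illustration.

```r
p2 <- c(0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7)

# Default: values fill the first column, then the second, and so on
by_col <- matrix(p2, nrow = 2, ncol = 4)

# byrow = TRUE: values fill the first row, then the second
by_row <- matrix(p2, nrow = 2, ncol = 4, byrow = TRUE)
by_row
##      [,1] [,2] [,3] [,4]
## [1,]  0.1  0.3  0.5  0.7
## [2,]  1.0  3.0  5.0  7.0
```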
Create a matrix using the cbind() and rbind() functions
cbind() combines vectors or matrices into a matrix by column; rbind() does the same by row.
X <- cbind(p, p1)
X
##        p p1
## [1,] 0.1  1
## [2,] 0.3  3
## [3,] 0.5  5
## [4,] 0.7  7
X <- rbind(p, p1)
Dimensions
dim(X)
## [1] 2 4
Elementwise operations for matrices
X + 1
##    [,1] [,2] [,3] [,4]
## p   1.1  1.3  1.5  1.7
## p1  2.0  4.0  6.0  8.0
X * 2
##    [,1] [,2] [,3] [,4]
## p   0.2  0.6    1  1.4
## p1  2.0  6.0   10 14.0
Subsetting, indexing
# The element at the intersection of the second row and the second column
X[2,2]
## p1 
##  3
# The first row of matrix X
X[1, ]
## [1] 0.1 0.3 0.5 0.7
X[, 2:3]
##    [,1] [,2]
## p   0.3  0.5
## p1  3.0  5.0
X[1, c(1,3)]
## [1] 0.1 0.5
Transpose and Multiplication
# Transpose
t(X)
##        p p1
## [1,] 0.1  1
## [2,] 0.3  3
## [3,] 0.5  5
## [4,] 0.7  7
# Matrix multiplication
t(X) %*% X
##      [,1]  [,2]  [,3]  [,4]
## [1,] 1.01  3.03  5.05  7.07
## [2,] 3.03  9.09 15.15 21.21
## [3,] 5.05 15.15 25.25 35.35
## [4,] 7.07 21.21 35.35 49.49
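Matrix multiplication requires the number of columns of the first matrix to equal the number of rows of the second, so the order of the factors matters. As a small sketch, multiplying in the opposite order gives a 2 x 2 result instead of a 4 x 4 one; the name G is arbitrary.

```r
p  <- c(0.1, 0.3, 0.5, 0.7)
p1 <- c(1, 3, 5, 7)
X  <- rbind(p, p1)   # 2 rows, 4 columns

# (2 x 4) %*% (4 x 2) gives a 2 x 2 matrix
G <- X %*% t(X)
dim(G)
## [1] 2 2
G
##       p   p1
## p  0.84  8.4
## p1 8.40 84.0
```

Note that X * X would instead multiply the entries element by element and keep the 2 x 4 shape.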
A data frame is a table where each row represents an observation and each column represents a variable. A data frame has column names (variable names) and row names.
The function data.frame(X) converts the matrix X into a data frame.
df <- data.frame(X)
class(df)
## [1] "data.frame"
df
##     X1  X2  X3  X4
## p  0.1 0.3 0.5 0.7
## p1 1.0 3.0 5.0 7.0
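Unlike a matrix, a data frame can hold columns of different types. The sketch below builds one directly from the vectors pp and p; the data frame name firms and the column names company and value are hypothetical, chosen only for this example.

```r
pp <- c("IBM", "Microsoft", "Apple", "AMD")
p  <- c(0.1, 0.3, 0.5, 0.7)

# One character column and one numeric column
firms <- data.frame(company = pp, value = p)

# Structure: column names, types, and first values
str(firms)

# Extract one column as a vector
firms$value
## [1] 0.1 0.3 0.5 0.7

# Keep only the rows that satisfy a condition
firms[firms$value > 0.4, ]
##   company value
## 3   Apple   0.5
## 4     AMD   0.7
```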
Use read.table() or read.csv() functions to import comma/space/tab delimited text files. You can also use the Import Dataset Wizard in RStudio (File > Import Dataset…). The package “readxl” allows you to read xls/xlsx files as well.
First, download the Dulles_2.csv file from Canvas and save it in your working directory.
mydata_csv <- read.csv("Dulles_2.csv", header = T)
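If you do not have the course file at hand, the following sketch writes a tiny CSV to a temporary file and reads it back, so you can see what read.csv() returns. The three data rows match the first rows of the Dulles data shown later; the header text "Passengers (000s)" is an assumption about how the original column is labeled, and csv_path and small are illustrative names.

```r
# Write a three-row CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,Passengers (000s)",
             "1963,641",
             "1964,728",
             "1965,920"),
           csv_path)

# header = TRUE treats the first line as column names
small <- read.csv(csv_path, header = TRUE)

# Spaces and parentheses in the header are replaced with dots
names(small)
## [1] "Year"              "Passengers..000s."

str(small)
```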
A list is a container. You can put different types of objects into a list.
mylist <- list(myvector = p, mymatrix = X, mydata = mydata_csv)
mylist
## $myvector
## [1] 0.1 0.3 0.5 0.7
## 
## $mymatrix
##    [,1] [,2] [,3] [,4]
## p   0.1  0.3  0.5  0.7
## p1  1.0  3.0  5.0  7.0
## 
## $mydata
##    Year Passengers..000s.
## 1  1963               641
## 2  1964               728
## 3  1965               920
## 4  1966              1079
## 5  1967              1427
## 6  1968              1602
## 7  1969              1928
## 8  1970              1869
## 9  1971              1881
## 10 1972              1992
## 11 1973              2083
## 12 1974              2004
## 13 1975              2000
## 14 1976              2251
## 15 1977              2267
## 16 1978              2518
## 17 1979              2858
## 18 1980              2086
## 19 1981              1889
## 20 1982              2248
## 21 1983              2651
## 22 1984              3136
## 23 1985              4538
## 24 1986              8394
## 25 1987              9980
## 26 1988              8650
## 27 1989              9224
## 28 1990              9043
## 29 1991              9406
## 30 1992              9408
## 31 1993              8501
## 32 1994              8947
## 33 1995              9653
## 34 1996             10095
## 35 1997             10697
## 36 1998             12445
## 37 1999             16055
## 38 2000             15873
## 39 2001             14021
## 40 2002             13146
## 41 2003             12928
## 42 2004             18213
## 43 2005             22129
## 44 2006             17787
## 45 2007             18792
## 46 2008             17638
## 47 2009             16964
## 48 2010             17214
## 49 2011             16663
## 50 2012             15883
## 51 2013             14958
## 52 2014             14393
## 53 2015             14463
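Printing a whole list can be long, so you will usually pull out one component at a time. The sketch below uses a smaller list to show the common access patterns; the element mylabel is made up for this example.

```r
p <- c(0.1, 0.3, 0.5, 0.7)
mylist <- list(myvector = p, mylabel = "Dulles")   # mylabel is hypothetical

# Names of the components
names(mylist)

# $ and [[ ]] both return the component itself
mylist$myvector
## [1] 0.1 0.3 0.5 0.7
mylist[["myvector"]][2]
## [1] 0.3

# Single brackets return a smaller list, not the component
class(mylist["myvector"])
## [1] "list"
class(mylist[["myvector"]])
## [1] "numeric"
```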
#Store the data in a conveniently named variable
n.Passenger <- mydata_csv$Passengers..000s.
#Create line chart
plot(n.Passenger, type = "l")
#Change to Time-Series Object
#install.packages("zoo")
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
Dulles <- ts(mydata_csv$Passengers..000s., start = 1963, frequency = 1)
head(Dulles)
## [1]  641  728  920 1079 1427 1602
#install.packages("forecast")
require("forecast")
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
plot(Dulles)
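Once the data are stored as a time-series object, base R can report the time span and extract sub-periods by year. The sketch below uses only the first six values of the series shown above; dulles_small and recent are illustrative names.

```r
# First six annual values from the Dulles series, starting in 1963
dulles_small <- ts(c(641, 728, 920, 1079, 1427, 1602),
                   start = 1963, frequency = 1)

# First and last time points (year, period within year)
start(dulles_small)
## [1] 1963    1
end(dulles_small)
## [1] 1968    1

# Keep only the observations from 1965 through 1967
recent <- window(dulles_small, start = 1965, end = 1967)
as.numeric(recent)
## [1]  920 1079 1427
```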