Download and Installation

RStudio

RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

RStudio is open-source software. Choose the version for your operating system to download: https://www.rstudio.com/products/rstudio/download/

When you open RStudio, it looks like this:

[Figure: RStudio]

To switch to a dark theme, go to Tools > Global Options… > Appearance and pick a dark entry under Editor theme (for example, Monokai).

[Figure: Settings]

[Figure: RStudio in Dark Mode]

As the interface above shows, three panels are visible. The left-hand panel is the console, where you run commands and see their output. The upper-right panel shows the data you have loaded, the variables you have defined, and other objects. The lower-right panel holds the Files, Plots, Packages, and related tabs.

Generally, you will want to write code in the editor rather than directly in the console. Click the green plus icon in the top-left corner and select R Script; the left panel splits into two parts and a new script editor window appears.

Now it’s time to get your hands dirty: follow the instructions below and practice the fundamental R commands.

Fundamental R Commands

Before Coding

-Set your working directory

  • Always set the working directory before you begin coding.

  • The working directory is the folder from which R loads data and in which it saves output and code by default.

#You can use getwd() to check your current working directory.  
getwd()
## [1] "/Users/woshiamie/Desktop/UC/Teaching/BANA4090_23Summer/Module 1"

You can set the working directory with the setwd("path") command, by clicking Session > Set Working Directory > Choose Directory…, or by pressing Ctrl + Shift + H and choosing the folder you want.
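A minimal sketch of checking, changing, and restoring the working directory. It uses tempdir() as a stand-in for a real project folder, which is an assumption made only so that the example runs on any machine.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# tempdir() is a stand-in here; replace it with the path to your own folder
setwd(tempdir())
getwd()

# Switch back to the original working directory
setwd(old_wd)
```

Saving the old path first makes it easy to undo the change when you are only experimenting.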

-Using R as a Calculator

Assign values to an object

Type the following code in the editor and run it line by line. To run a line of code, you can move the cursor to that line and use Ctrl + Enter (Cmd+Enter for Mac). If you want to run multiple lines of code, simply highlight those lines and use the same command.

Both "<-" and "=" are assignment operators: each assigns the value on the right-hand side to the object on the left-hand side.

x <- 88
y = 0.8
x * y + x / y
## [1] 180.4

Try these additional functions:

log(x); exp(x/y); sin(x); cos(y); sqrt(y)

[Exercise]

\(E=mc^2\) is the equation from Albert Einstein’s theory of special relativity expressing that mass and energy are the same physical entity and can be converted into each other. How much energy can 1 kg of matter (any matter) be turned into?

  • \(m = 1\ kg\): mass of matter

  • \(c = 3 \times 10^{8}\ m\,s^{-1}\): speed of light

  • \(E = ?\)
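As a hint for the exercise, here is a short sketch of how powers and scientific notation are written in R. The numbers are arbitrary and are not the values from the exercise.

```r
# Powers use the ^ operator
2^3
## [1] 8

# Scientific notation: 1.5e3 means 1.5 * 10^3
1.5e3
## [1] 1500

# Both notations describe the same number
1e3 == 10^3
## [1] TRUE
```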

-Logical Operations

Logical operations return TRUE (T) or FALSE (F).

x == y
## [1] FALSE
x > y
## [1] TRUE
2 >= 2
## [1] TRUE
2 > 2
## [1] FALSE

And: &
Or: |
All: all()
Any: any()

TRUE & TRUE
## [1] TRUE
TRUE & FALSE
## [1] FALSE
FALSE & FALSE
## [1] FALSE
TRUE | TRUE
## [1] TRUE
TRUE | FALSE
## [1] TRUE
FALSE | FALSE
## [1] FALSE
all(c(T,T,T,T,F))
## [1] FALSE
all(c(T,T,T,T,T))
## [1] TRUE
any(c(T,T,T,T,F))
## [1] TRUE
any(c(F,F,F,F,F))
## [1] FALSE
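The operators above also work on whole vectors, which is where all() and any() become useful. A small sketch, using a hypothetical vector of scores invented for illustration:

```r
# Hypothetical scores, used only for illustration
scores <- c(55, 72, 90)

# A comparison on a vector returns one TRUE/FALSE per element
passed <- scores >= 60
passed
## [1] FALSE  TRUE  TRUE

# & combines two logical vectors element by element
scores >= 60 & scores < 80
## [1] FALSE  TRUE FALSE

# all() and any() collapse a logical vector to a single value
all(passed)
## [1] FALSE
any(passed)
## [1] TRUE
```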

Install and Load Packages

You can use install.packages("Package_name") to install a package. After installation, use the library() function to attach the package you want to use.

install.packages("readxl")
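Installation is needed only once per machine, while library() is needed in every new R session. One possible pattern, sketched below, checks whether the package is available before attaching it; the variable names pkg and loaded are illustrative only.

```r
pkg <- "readxl"

# requireNamespace() returns TRUE if the package is installed, without attaching it
if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  loaded <- TRUE
} else {
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\") first.")
  loaded <- FALSE
}
```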

Data Structure

This tutorial covers four commonly used data structures in R:

  • Vector
  • Matrix
  • Data frame
  • List

-Vector

A vector is the simplest type of data structure in R. Simply put, a vector is a sequence of data elements of the same basic type.

To create a vector, pass its elements, separated by commas, to the c() (combine) function and assign the result to a variable. For example, we can create a new variable called p that contains the numbers 0.1, 0.3, 0.5 and 0.7:

# Define numerical vector p
p <- c(0.1, 0.3, 0.5, 0.7)
# Define character vector pp
pp <- c("IBM", "Microsoft", "Apple", "AMD")

Note that you can put a # at the beginning of a line to write comments in your code.
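Because every element of a vector must have the same basic type, R converts mixed input to a common type. A short sketch with illustrative variable names:

```r
num_vec <- c(0.1, 0.3, 0.5, 0.7)
chr_vec <- c("IBM", "Microsoft")

# class() reports the type of the elements
class(num_vec)
## [1] "numeric"
class(chr_vec)
## [1] "character"

# Mixing types forces a common type: here everything becomes character
mixed <- c(1, "a", TRUE)
mixed
## [1] "1"    "a"    "TRUE"
```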

- Calculations with a Numerical Vector

# Average
mean(p)
## [1] 0.4
# Standard deviation
sd(p)
## [1] 0.2581989
# Median
median(p)
## [1] 0.4
# Maximum
max(p)
## [1] 0.7
# Minimum
min(p)
## [1] 0.1
# Summary statistics
summary(p)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.10    0.25    0.40    0.40    0.55    0.70

- Calculations for Multiple Vectors

# Define vector p1
p1 <- c(1,3,5,7)

Elementwise operations. Vectors should be the same length; a single number, as in p + 2, is applied to every element.

p + p1
## [1] 1.1 3.3 5.5 7.7
p * p1
## [1] 0.1 0.9 2.5 4.9
p + 2
## [1] 2.1 2.3 2.5 2.7
p / 10
## [1] 0.01 0.03 0.05 0.07
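When the two vectors differ in length, R recycles the shorter one rather than stopping, which can hide mistakes. A small sketch with made-up vectors a and b:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# b is recycled to 10, 20, 10, 20 before the addition
a + b
## [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning, so it is safest to keep lengths equal unless recycling is intended.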

Combining several vectors with c() gives a single, flat vector.

p2 <- c(p, p1)
p2
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0

- Indexing

To extract the second entry from vector p2:

p2[2]
## [1] 0.3

To extract all elements greater than 5 from vector p2:

p2[p2 > 5]
## [1] 7

To extract all elements greater than 5 and less than 7 from vector p2:

p2[p2 > 5 & p2 < 7]
## numeric(0)
# no element satisfies both conditions, so the result is empty

To order the vector p2 from smallest to largest:

p2[order(p2)]
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
sort(p2)
## [1] 0.1 0.3 0.5 0.7 1.0 3.0 5.0 7.0
length(p)
## [1] 4
# All but the first element in p
p[-1]
## [1] 0.3 0.5 0.7
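A few related indexing helpers, sketched on a small unsorted vector v that is made up for this example:

```r
v <- c(0.5, 3, 0.1, 7, 1)

# which() returns the positions where the condition is TRUE
which(v > 1)
## [1] 2 4

# order() returns the positions that would sort the vector
order(v)
## [1] 3 1 5 2 4

# Sort from largest to smallest
sort(v, decreasing = TRUE)
## [1] 7.0 3.0 1.0 0.5 0.1

# Negative indices drop several elements at once
v[-c(1, 2)]
## [1] 0.1 7.0 1.0
```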

Matrix

A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.

Create a matrix using the matrix() function. This function creates a matrix from a given vector.

# Define a matrix X
X <- matrix(data = p2, nrow = 2, ncol = 4)
X
##      [,1] [,2] [,3] [,4]
## [1,]  0.1  0.5    1    5
## [2,]  0.3  0.7    3    7
class(X)
## [1] "matrix" "array"

When calling an R function, you may omit the argument names and simply list the inputs in the correct order.

X <- matrix(p2, 2, 4)

By default, matrix() fills the matrix with the vector's values column by column; to fill it row by row, add the argument byrow = T.

X <- matrix(data = p2, nrow = 2, ncol = 4, byrow = T )
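The difference between the two fill orders is easiest to see with the integers 1 to 6. This sketch uses its own variable names (by_col, by_row) so that it does not overwrite X:

```r
# Default: fill column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# byrow = TRUE: fill row by row
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```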

Create a matrix using cbind() and rbind() functions

cbind() combines vectors or matrices into a matrix by column; rbind() does the same by row.

X <- cbind(p, p1)
X
##        p p1
## [1,] 0.1  1
## [2,] 0.3  3
## [3,] 0.5  5
## [4,] 0.7  7
X <- rbind(p, p1)

- Matrix calculation

Dimensions

dim(X)
## [1] 2 4

Elementwise operations for matrices

X + 1
##    [,1] [,2] [,3] [,4]
## p   1.1  1.3  1.5  1.7
## p1  2.0  4.0  6.0  8.0
X * 2
##    [,1] [,2] [,3] [,4]
## p   0.2  0.6    1  1.4
## p1  2.0  6.0   10 14.0

Subsetting, indexing

# The element at the intersection of the second row and the second column
X[2,2]
## p1 
##  3
# The first row of matrix X
X[1, ]
## [1] 0.1 0.3 0.5 0.7
# The second and third columns of matrix X
X[, 2:3]
##    [,1] [,2]
## p   0.3  0.5
## p1  3.0  5.0
# The first and third elements of the first row
X[1, c(1,3)]
## [1] 0.1 0.5

Transpose and Multiplication

# Transpose
t(X)
##        p p1
## [1,] 0.1  1
## [2,] 0.3  3
## [3,] 0.5  5
## [4,] 0.7  7
# Matrix multiplication
t(X) %*% X
##      [,1]  [,2]  [,3]  [,4]
## [1,] 1.01  3.03  5.05  7.07
## [2,] 3.03  9.09 15.15 21.21
## [3,] 5.05 15.15 25.25 35.35
## [4,] 7.07 21.21 35.35 49.49
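Note that * and %*% are different operations: * multiplies matching elements, while %*% performs matrix multiplication. A small sketch on a 2 x 2 matrix A, defined only for this example:

```r
A <- matrix(1:4, nrow = 2)
A
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

# Elementwise product: each entry is multiplied by itself
A * A
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16

# Matrix product: rows of the first matrix times columns of the second
A %*% A
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
```

For %*%, the number of columns of the left matrix must equal the number of rows of the right matrix, which is why the example above multiplies t(X) by X.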

Data Frame

A data frame is a table where each row represents an observation and each column represents a variable. A data frame has column names (variable names) and row names.

- Convert a Matrix to a Data Frame

The function data.frame(X) converts the matrix X into a data frame.

df <- data.frame(X)
class(df)
## [1] "data.frame"
df
##     X1  X2  X3  X4
## p  0.1 0.3 0.5 0.7
## p1 1.0 3.0 5.0 7.0
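A data frame can also be built directly from vectors, and unlike a matrix its columns may have different types. In the sketch below, the data frame stocks and its column names are hypothetical and exist only for illustration.

```r
# Hypothetical values, used only for illustration
stocks <- data.frame(
  company = c("IBM", "Microsoft", "Apple", "AMD"),
  weight  = c(0.1, 0.3, 0.5, 0.7)
)

# Extract one column as a vector with $
stocks$weight
## [1] 0.1 0.3 0.5 0.7

# Number of rows (observations) and columns (variables)
nrow(stocks)
## [1] 4
ncol(stocks)
## [1] 2

# Select rows with a condition, as with vectors
stocks[stocks$weight > 0.4, ]
```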

- Read External Data Files (.txt and .csv files)

Use the read.table() or read.csv() functions to import comma-, space-, or tab-delimited text files. You can also use the Import Dataset wizard in RStudio (File > Import Dataset…). The readxl package lets you read xls/xlsx files as well.

First, download the Dulles_2.csv file from Canvas and save it in your working directory.

mydata_csv <- read.csv("Dulles_2.csv", header = T)
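If you do not have the course file at hand, the sketch below writes a tiny CSV to a temporary file and reads it back. The header text "Passengers (000s)" is an assumption about how the original file labels its column; the three data rows are the first three rows shown in the printed data in this tutorial.

```r
# Write a small CSV to a temporary file (stand-in for Dulles_2.csv)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("Year,Passengers (000s)",
             "1963,641",
             "1964,728",
             "1965,920"), tmp_csv)

# header = TRUE treats the first line as column names
demo_csv <- read.csv(tmp_csv, header = TRUE)

# read.csv() converts spaces and parentheses in names to dots
names(demo_csv)
## [1] "Year"              "Passengers..000s."

# str() gives a compact overview of the columns and their types
str(demo_csv)
```

This name conversion would explain column names such as Passengers..000s. in imported data.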

List

A list is a container. You can put different types of objects into a list.

mylist <- list(myvector = p, mymatrix = X, mydata = mydata_csv)
mylist
## $myvector
## [1] 0.1 0.3 0.5 0.7
## 
## $mymatrix
##    [,1] [,2] [,3] [,4]
## p   0.1  0.3  0.5  0.7
## p1  1.0  3.0  5.0  7.0
## 
## $mydata
##    Year Passengers..000s.
## 1  1963               641
## 2  1964               728
## 3  1965               920
## 4  1966              1079
## 5  1967              1427
## 6  1968              1602
## 7  1969              1928
## 8  1970              1869
## 9  1971              1881
## 10 1972              1992
## 11 1973              2083
## 12 1974              2004
## 13 1975              2000
## 14 1976              2251
## 15 1977              2267
## 16 1978              2518
## 17 1979              2858
## 18 1980              2086
## 19 1981              1889
## 20 1982              2248
## 21 1983              2651
## 22 1984              3136
## 23 1985              4538
## 24 1986              8394
## 25 1987              9980
## 26 1988              8650
## 27 1989              9224
## 28 1990              9043
## 29 1991              9406
## 30 1992              9408
## 31 1993              8501
## 32 1994              8947
## 33 1995              9653
## 34 1996             10095
## 35 1997             10697
## 36 1998             12445
## 37 1999             16055
## 38 2000             15873
## 39 2001             14021
## 40 2002             13146
## 41 2003             12928
## 42 2004             18213
## 43 2005             22129
## 44 2006             17787
## 45 2007             18792
## 46 2008             17638
## 47 2009             16964
## 48 2010             17214
## 49 2011             16663
## 50 2012             15883
## 51 2013             14958
## 52 2014             14393
## 53 2015             14463

Time Series Plotting

#Store the data in a conveniently named variable
n.Passenger <- mydata_csv$Passengers..000s.
#Create line chart
plot(n.Passenger, type = "l")

#Change to Time-Series Object
#install.packages("zoo")
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
Dulles <- ts(mydata_csv$Passengers..000s., start = 1963, frequency = 1)
head(Dulles)
## [1]  641  728  920 1079 1427 1602
#install.packages("forecast")
require("forecast")
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
plot(Dulles)
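What ts() adds to a plain numeric vector is a time index. The sketch below rebuilds a short series from the six values printed by head(Dulles) above, so it runs without the CSV file; the name first_six is illustrative only.

```r
# The first six observations, taken from the head(Dulles) output
first_six <- ts(c(641, 728, 920, 1079, 1427, 1602), start = 1963, frequency = 1)

# start() and end() report the first and last time points
start(first_six)
## [1] 1963    1
end(first_six)
## [1] 1968    1

# time() lists the time index attached to each observation
time(first_six)

# window() extracts a sub-period by time rather than by position
window(first_six, start = 1965, end = 1966)
```

Because the time index is stored with the data, plot() on a ts object labels the horizontal axis with years instead of observation numbers.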