BST430 Lecture 02-B

class: ur-title, center, middle, title-slide

.title[
# BST430 Lecture 02-B
]
.subtitle[
## Vectors, matrices and linear algebra
]
.author[
### Seong-Hwan Jun, based on the course by Andrew McDavid and Tanzy Love
]
.institute[
### U of Rochester
]
.date[
### 2021-09-27 (updated: 2025-09-18)
]

---

class: middle

## Vectors are everywhere

Your garden variety R object is a vector. Square brackets are used for isolating elements of a vector. This is often called __indexing__. Indexing begins at 1 in R, unlike many other languages that index from 0.

```r
x = 3 * 4
x
```

```
## [1] 12
```

```r
length(x)
```

```
## [1] 1
```

```r
x[2] = 100
x
```

```
## [1]  12 100
```

---

## R is very (probably too) forgiving with indexing.

```r
x[5] = 3
x
```

```
## [1]  12 100  NA  NA   3
```

```r
x[11]
```

```
## [1] NA
```

```r
x[0]
```

```
## numeric(0)
```

---

## Most functions are vectorized

When reading docs, look for arguments that can be vectors. For example, the mean and standard deviation of random normal variates can be provided as vectors.

```r
set.seed(2021)
rnorm(5, mean = 10^(1:5))
```

```
## [1]      9.88    100.55   1000.35  10000.36 100000.90
```

```r
rnorm(5, sd = 10^(1:5))
```

```
## [1]    -19.2     26.2    915.6    137.7 172996.3
```

`1:5` is shorthand for `c(1,2,3,4,5)`, and so on.  To generate more complicated sequences, see `seq(from, to, by, length.out)`.
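
As a quick illustration of those `seq()` arguments, here is a small sketch in base R. The variable names are made up for this example.

```r
# Sequence helpers in base R (variable names are illustrative only)
by_step   = seq(from = 0, to = 1, by = 0.25)       # fixed step size
by_length = seq(from = 0, to = 1, length.out = 3)  # fixed number of points
by_step
by_length
seq_len(4)                   # same as 1:4, but safe when the length is 0
seq_along(c("a", "b", "c"))  # one index per element
```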

---

## Vector arithmetic

Arithmetic operators apply to vectors in a "componentwise" fashion:

```r
x = c(7, 8, 10, 20)
y = c(-7, -8, -10, -20)
x + y
```

```
## [1] 0 0 0 0
```

```r
x * y
```

```
## [1]  -49  -64 -100 -400
```

---

We can also do componentwise comparisons with vectors:

```r
x > 9
```

```
## [1] FALSE FALSE  TRUE  TRUE
```

Logical operators also work elementwise:

```r
(x > 9) & (x < 20)
```

```
## [1] FALSE FALSE  TRUE FALSE
```

---

To compare whole vectors, it is best to use `identical()` or `all.equal()`:
.pull-left[
`identical()` and `==`

```r
x == -y
```

```
## [1] TRUE TRUE TRUE TRUE
```

```r
identical(x, -y)
```

```
## [1] TRUE
```

```r
u = c(0.5-0.3,0.3-0.1)
v = c(0.3-0.1,0.5-0.3)
identical(u,v)
```

```
## [1] FALSE
```

```r
identical(u,v[2:1])
```

```
## [1] TRUE
```
]
.pull-right[
`all.equal()` and `dplyr::near()` allow for machine representation error in floating point values.

```r
all.equal(u, v)
```

```
## [1] TRUE
```

```r
all.equal(u,v, tolerance = 0)
```

```
## [1] "Mean relative difference: 1.387779e-16"
```

```r
near(u,v)
```

```
## [1] TRUE TRUE
```

]
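
Because `all.equal()` returns a character description rather than `FALSE` when its arguments differ, it is usually wrapped in `isTRUE()` inside conditions. A minimal sketch, reusing the `u` and `v` values from this slide (`exact_match` and `close_match` are names invented here):

```r
u = c(0.5 - 0.3, 0.3 - 0.1)
v = c(0.3 - 0.1, 0.5 - 0.3)

exact_match = identical(u, v)          # exact comparison
close_match = isTRUE(all.equal(u, v))  # comparison within a small tolerance

if (close_match) {
  message("u and v agree up to floating point error")
}
```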

---

## Vectorization: awesome, yet dangerous

Vectorization can be awesome but dangerous if you exploit it by mistake and get no warning.  While we're on the topic of awesome, yet dangerous:

.alert[R also recycles] vectors, if they are not the necessary length.

You will get a warning when the longer length is not an integer multiple of the shorter one, but recycling is silent if it seems like you know what you're doing.

---

## Recycled with a warning

```r
(y = 1:3)
```

```
## [1] 1 2 3
```

```r
(z = 3:7)
```

```
## [1] 3 4 5 6 7
```

```r
y + z
```

```
## Warning in y + z: longer object length is not a multiple of
## shorter object length
```

```
## [1] 4 6 8 7 9
```

Hint: set `options(warn = 2)` to convert warnings to errors to catch this problem definitively.

---

## Recycled without warning

```r
(y = 1:10)
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

```r
(z = 3:7)
```

```
## [1] 3 4 5 6 7
```

```r
y + z
```

```
##  [1]  4  6  8 10 12  9 11 13 15 17
```

`1` is a vector, so this is also a form of recycling!

```r
z + 1
```

```
## [1] 4 5 6 7 8
```
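
Since this kind of recycling is silent, one defensive option is to check lengths explicitly before doing arithmetic. The helper below, `add_strict()`, is a hypothetical name used only for this sketch and is not part of base R.

```r
# Hypothetical helper: refuse to recycle unless one argument has length 1
add_strict = function(a, b) {
  stopifnot(length(a) == length(b) || length(a) == 1 || length(b) == 1)
  a + b
}

add_strict(3:7, 1)               # scalar recycling is still allowed
same_len = add_strict(1:5, 3:7)  # equal lengths are fine
same_len
# add_strict(1:10, 3:7)          # would stop with an error instead of recycling
```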

---

## Making vectors

The combine function `c()` is your go-to function for making vectors.
(OMG, I was taught this was called concatenate.)

```r
str(c("hello", "world"))
```

```
##  chr [1:2] "hello" "world"
```

```r
str(c(1:3, 100, 150))
```

```
##  num [1:5] 1 2 3 100 150
```
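
Because an atomic vector holds a single type, `c()` coerces mixed inputs to the most flexible type present. A small sketch of that behavior (the name `mixed` is just for illustration):

```r
class(c(TRUE, 2L))      # logical and integer     -> "integer"
class(c(1L, 2.5))       # integer and double      -> "numeric"
class(c(1, "a", TRUE))  # anything with character -> "character"
mixed = c(1, "a", TRUE)
mixed                   # every element is now a string
```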

---

Let's create some simple vectors for more demos below.

```r
n = 8
set.seed(1)
(w = round(rnorm(n), 2)) # numeric floating point
```

```
## [1] -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
(x = 1:n) # numeric integer
```

```
## [1] 1 2 3 4 5 6 7 8
```

```r
(y = LETTERS[1:n]) # character
```

```
## [1] "A" "B" "C" "D" "E" "F" "G" "H"
```

```r
(z = runif(n) > 0.3) # logical
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE
```

---

## Indexing a vector

Square brackets are used to index a vector. There is great flexibility in what one can put inside the square brackets.

Common ways to index a vector:

* __logical vector__: keep elements associated with TRUE's, ditch the FALSE's
* __vector of positive integers__: specifying the keepers
* __vector of negative integers__: specifying the losers
* __character vector__: naming the keepers

---

## Names

```r
w
```

```
## [1] -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
names(w) = letters[seq_along(w)]
w
```

```
##     a     b     c     d     e     f     g     h 
## -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
w[c('a', 'b', 'd')]
```

```
##     a     b     d 
## -0.63  0.18  1.60
```

---

## Boolean vectors

```r
w < 0
```

```
##     a     b     c     d     e     f     g     h 
##  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
```

```r
which(w < 0)
```

```
## a c f 
## 1 3 6
```

```r
w[w < 0]
```

```
##     a     c     f 
## -0.63 -0.84 -0.82
```

`which()` gives the indices of the elements of a Boolean vector that are `TRUE`, and excludes .alert[NA].
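
The treatment of `NA` is where `which()` and plain logical indexing differ. A short sketch with a made-up vector `v_na`:

```r
v_na = c(-1, NA, 2, -3)
by_logical = v_na[v_na < 0]         # an NA in the index yields an NA in the result
by_which   = v_na[which(v_na < 0)]  # which() drops the NA position
by_logical
by_which
```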

---

## Integer vectors

```r
seq(from = 1, to = length(w), by = 2)
```

```
## [1] 1 3 5 7
```

```r
w[seq(from = 1, to = length(w), by = 2)]
```

```
##     a     c     e     g 
## -0.63 -0.84  0.33  0.49
```

```r
w[-c(2, 5)]
```

```
##     a     c     d     f     g     h 
## -0.63 -0.84  1.60 -0.82  0.49  0.74
```

```r
w[c('c', 'a', 'f')]
```

```
##     c     a     f 
## -0.84 -0.63 -0.82
```

---
class: middle

.hand[Lists...again]

---

## Lists...again

We have seen lists before: the über-vector in R. A list has a length, like a vector, but there is no requirement that its elements be of the same type. In data analysis, you won't make lists very often, at least not consciously. But:

* data.frames are lists! They are a special case where each element is a vector, all having the same length.
* Many non-tidyverse functions return lists. You will want to extract goodies from them, such as the p-value for a hypothesis test or the estimated error variance in a regression model.

---

## Lists...again

Use `list()` instead of `c()` to combine things and you'll notice that the different flavors of the constituent parts are retained this time.

```r
## earlier: a = c("cabbage", pi, TRUE, 4.3)
(a = list("cabbage", pi, TRUE, c(4.3,3,2.1,10)))
```

```
## [[1]]
## [1] "cabbage"
## 
## [[2]]
## [1] 3.141593
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1]  4.3  3.0  2.1 10.0
```

---

## Names in lists

List components can also have names. You can create or change names after a list already exists or in the initial assignment.
.pull-left[

```r
names(a)
```

```
## NULL
```

```r
names(a) = c("veg", "dessert", "my_aim", "number")
a
```

```
## $veg
## [1] "cabbage"
## 
## $dessert
## [1] 3.141593
## 
## $my_aim
## [1] TRUE
## 
## $number
## [1]  4.3  3.0  2.1 10.0
```
]
.pull-right[

```r
a = list(veg = "cabbage", dessert = pi, my_aim = TRUE, numbers = c(4.3,10))
a
```

```
## $veg
## [1] "cabbage"
## 
## $dessert
## [1] 3.141593
## 
## $my_aim
## [1] TRUE
## 
## $numbers
## [1]  4.3 10.0
```
]

---

## Indexing lists

Indexing a list is similar to indexing a vector but it is necessarily more complex.  If you request more than one element, you should and will get a list back.  But if you request a single element:
*  Do you want a list of length 1 containing only that element? Use single square brackets, `[` and `]`. This is rarely desired...
*  Or do you want the element itself? Use a dollar sign `$`, or double square brackets, `[[` and `]]`.

The ["pepper shaker photos" in R for Data Science](https://r4ds.had.co.nz/vectors.html#lists-of-condiments) are a splendid visual explanation of the different ways to get stuff out of a list.

---

## More list indexing

```r
(a = list(veg = c("cabbage", "eggplant"),
           t_num = c(pi, exp(1), sqrt(2)),
           my_aim = TRUE,
           joe_num = 2:6))
```

```
## $veg
## [1] "cabbage"  "eggplant"
## 
## $t_num
## [1] 3.141593 2.718282 1.414214
## 
## $my_aim
## [1] TRUE
## 
## $joe_num
## [1] 2 3 4 5 6
```

A slightly more complicated list for demo purposes.

---

## Single, unlisted elements

```r
a[[2]] # index with a positive integer
```

```
## [1] 3.141593 2.718282 1.414214
```

```r
a$my_aim # use dollar sign and element name
```

```
## [1] TRUE
```

```r
a[["t_num"]] # index with length 1 character vector
```

```
## [1] 3.141593 2.718282 1.414214
```

---

## Single, unlisted elements

```r
i_want_this = "joe_num" # indexing with length 1 character object
a[[i_want_this]] # we get joe_num itself, a length 5 integer vector
```

```
## [1] 2 3 4 5 6
```

*  When the index is stored in a variable (an R object), use double brackets: `$` takes the name literally and does not evaluate the variable.
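
To see why double brackets matter when the name lives in a variable, compare the two forms side by side. This is a sketch with a trimmed-down copy of the list; `via_brackets` and `via_dollar` are names invented here.

```r
a = list(veg = c("cabbage", "eggplant"), joe_num = 2:6)
i_want_this = "joe_num"

via_brackets = a[[i_want_this]]  # evaluates the variable, then looks up "joe_num"
via_dollar   = a$i_want_this     # looks for a component literally named "i_want_this"
via_brackets
via_dollar                       # NULL, because no such component exists
```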

---

## Double bracket only for single elements

```r
a[[c("joe_num", "veg")]] 
```

```
## Error in a[[c("joe_num", "veg")]]: subscript out of bounds
```

We get an error if we try to extract more than one element with double brackets.
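
The "subscript out of bounds" message arises because `[[` with an index of length greater than one works recursively: each entry digs one level deeper into a nested list. A sketch with a small made-up list, `nested`:

```r
nested = list(outer = list(inner = 42), other = "x")

recursive = nested[[c("outer", "inner")]]    # go into "outer", then into "inner"
stepwise  = nested[["outer"]][["inner"]]     # the same lookup in two steps
recursive
```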

---

## More than one element

```r
names(a)
```

```
## [1] "veg"     "t_num"   "my_aim"  "joe_num"
```

```r
str(a[c("t_num", "veg")]) # returns list of length 2
```

```
## List of 2
##  $ t_num: num [1:3] 3.14 2.72 1.41
##  $ veg  : chr [1:2] "cabbage" "eggplant"
```

```r
str(a["veg"]) # returns list of length 1
```

```
## List of 1
##  $ veg: chr [1:2] "cabbage" "eggplant"
```

```r
length(a["veg"][[1]]) # the veg vector itself has length 2; the list a["veg"] has length 1
```

```
## [1] 2
```

With single brackets, the return value will always be a list, even if you only request 1 element.

---

## A useful list

```r
lmcars = lm(speed~dist, data=cars)

lmcars[[1]]
```

```
## (Intercept)        dist 
##   8.2839056   0.1655676
```

```r
summary(lmcars)$sigma
```

```
## [1] 3.155753
```

```r
names(summary(lmcars))
```

```
##  [1] "call"          "terms"         "residuals"    
##  [4] "coefficients"  "aliased"       "sigma"        
##  [7] "df"            "r.squared"     "adj.r.squared"
## [10] "fstatistic"    "cov.unscaled"
```
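
The same list-indexing tools pull out the other "goodies", such as a p-value. A minimal sketch using the built-in `cars` data; the names `coef_table`, `slope_p` and `sigma_sq` are invented for this example.

```r
lmcars = lm(speed ~ dist, data = cars)

coef_table = summary(lmcars)$coefficients  # a numeric matrix with dimnames
coef_table
slope_p  = coef_table["dist", "Pr(>|t|)"]  # p-value for the slope
sigma_sq = summary(lmcars)$sigma^2         # estimated error variance
```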

---

## Creating a data.frame explicitly

In data analysis, we often import data into a data.frame via `read_csv()`. But one can also construct a data.frame directly using `data.frame()` (or `tibble()` in the tidyverse).

```r
n = 8
(j_dat = data.frame(w = rnorm(n),
                x = 1:n,
                y = LETTERS[1:n],
                z = runif(n) > 0.3))
```

```
##             w x y     z
## 1 -0.62124058 1 A  TRUE
## 2 -2.21469989 2 B  TRUE
## 3  1.12493092 3 C  TRUE
## 4 -0.04493361 4 D  TRUE
## 5 -0.01619026 5 E  TRUE
## 6  0.94383621 6 F  TRUE
## 7  0.82122120 7 G FALSE
## 8  0.59390132 8 H  TRUE
```

---

## data.frames really are lists!

```r
is.list(j_dat) # data.frames are lists
```

```
## [1] TRUE
```

```r
j_dat[[4]] # this works but I prefer ...
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
j_dat$z # using dollar sign and name, when possible
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
namez=c("z")
j_dat[[namez]] # using a character vector of names
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
#namez=c("z","w")
#j_dat[[namez]] # does not work: Error
```

---

## data.frames really are lists!

```r
str(j_dat[c("x", "z")]) # get multiple variables
```

```
## 'data.frame':	8 obs. of  2 variables:
##  $ x: int  1 2 3 4 5 6 7 8
##  $ z: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
```

```r
head(select(j_dat, x, z), 4) # tidyverse version is better in interactive work
```

```
##   x    z
## 1 1 TRUE
## 2 2 TRUE
## 3 3 TRUE
## 4 4 TRUE
```

```r
identical(select(j_dat, x, z), j_dat[c("x", "z")])
```

```
## [1] TRUE
```

Coerce a list directly to a data.frame with `as_tibble()`.
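
As a base R counterpart (an alternative to `as_tibble()`, not something this slide itself uses), `as.data.frame()` performs the same kind of coercion when the list elements have equal length. A sketch with a made-up list, `col_list`:

```r
col_list = list(id = 1:3, label = c("a", "b", "c"), keep = c(TRUE, FALSE, TRUE))

col_df = as.data.frame(col_list)  # each list element becomes a column
str(col_df)
is.list(col_df)                   # still a list underneath
```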

---
class: middle

.hand[Matrices are vectors with `dim`.]

---

## Matrices vs data frames

A matrix is a generalization of an atomic vector and the requirement that all the elements be of the same flavor still holds.

*  Data frames: default receptacle for rectangular data
*  But when we need to do linear algebra, we may want to use a matrix instead.  
*  Higher-order arrays are also available in R.  A matrix is an important special case having dimension 2.

---
class: code70

## Matrices

Let's make a simple matrix and give it decent row and column names, which we know is a good practice. You'll see familiar or self-explanatory functions below for getting to know a matrix.

```r
## don't worry if the construction of this matrix confuses you; 
## just focus on the product
j_mat = outer(as.character(1:4), as.character(1:5),
              function(x, y)  paste0('x', x, y) )
j_mat
```

```
##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "x11" "x12" "x13" "x14" "x15"
## [2,] "x21" "x22" "x23" "x24" "x25"
## [3,] "x31" "x32" "x33" "x34" "x35"
## [4,] "x41" "x42" "x43" "x44" "x45"
```

```r
str(j_mat)
```

```
##  chr [1:4, 1:5] "x11" "x21" "x31" "x41" "x12" "x22" "x32" ...
```

---

## Useful matrix functions

```r
dim(j_mat)
```

```
## [1] 4 5
```

```r
length(j_mat)
```

```
## [1] 20
```

```r
nrow(j_mat)
```

```
## [1] 4
```

```r
ncol(j_mat)
```

```
## [1] 5
```

---

## Dimensions can have names

```r
rownames(j_mat)
```

```
## NULL
```

```r
rownames(j_mat) = str_c("row", seq_len(nrow(j_mat)))
colnames(j_mat) = str_c("col", seq_len(ncol(j_mat)))
dimnames(j_mat) # also useful for assignment
```

```
## [[1]]
## [1] "row1" "row2" "row3" "row4"
## 
## [[2]]
## [1] "col1" "col2" "col3" "col4" "col5"
```

```r
j_mat
```

```
##      col1  col2  col3  col4  col5 
## row1 "x11" "x12" "x13" "x14" "x15"
## row2 "x21" "x22" "x23" "x24" "x25"
## row3 "x31" "x32" "x33" "x34" "x35"
## row4 "x41" "x42" "x43" "x44" "x45"
```

---

## Indexing a matrix

```r
j_mat[2, 3]
```

```
## [1] "x23"
```

```r
j_mat[2, ] # getting row 2
```

```
##  col1  col2  col3  col4  col5 
## "x21" "x22" "x23" "x24" "x25"
```

```r
is.vector(j_mat[2, ]) # we get row 2 as an atomic vector
```

```
## [1] TRUE
```

```r
j_mat[ , 3, drop = FALSE] # getting column 3
```

```
##      col3 
## row1 "x13"
## row2 "x23"
## row3 "x33"
## row4 "x43"
```

```r
dim(j_mat[ , 3, drop = FALSE]) # we get column 3 as a 4 x 1 matrix
```

```
## [1] 4 1
```

---

## Use all of your favorite vector methods, too.

```r
j_mat[c("row1", "row4"), c("col2", "col3")]
```

```
##      col2  col3 
## row1 "x12" "x13"
## row4 "x42" "x43"
```

```r
j_mat[-c(2, 3), c(TRUE, TRUE, FALSE, FALSE)] # wacky but possible: the logical index is recycled, so col5 is kept
```

```
##      col1  col2  col5 
## row1 "x11" "x12" "x15"
## row4 "x41" "x42" "x45"
```

---

## Indexing a matrix

In summary:
*  Use `[`, `]` and a logical, integer numeric (positive or negative), or character vector. 
*  The comma `,` distinguishes rows from columns. 
*  The `$i,j$`-th element is the element at the intersection of row `$i$` and column `$j$` and is obtained with `j_mat[i, j]`. 
*  Request an entire row/column by leaving the associated index empty. 
* `drop = FALSE` preserves singleton dimensions. You almost always want this when programming with variable indices.

---

## R uses column major order

Under the hood, of course, matrices are just vectors with some extra facilities for indexing. R uses column-major order: the columns are stacked up one after the other. (Contrast with C and NumPy arrays in Python, which use row-major order by default.)
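
Column-major order means that element `[i, j]` sits at position `(j - 1) * nrow + i` of the underlying vector. A small sketch with a made-up matrix `m`:

```r
m = matrix(1:6, nrow = 2)
m
as.vector(m)  # the underlying vector, columns stacked in order

# Linear position of element (i, j) in column-major order
i = 2
j = 3
linear_index = (j - 1) * nrow(m) + i
m[i, j] == m[linear_index]
```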

---

## Matrices are vectors !

Matrices can be indexed *exactly* like a vector, i.e. with no comma `$i,j$` business, like so:

```r
j_mat[7]
```

```
## [1] "x32"
```

```r
j_mat
```

```
##      col1  col2  col3  col4  col5 
## row1 "x11" "x12" "x13" "x14" "x15"
## row2 "x21" "x22" "x23" "x24" "x25"
## row3 "x31" "x32" "x33" "x34" "x35"
## row4 "x41" "x42" "x43" "x44" "x45"
```

How to understand this: start counting in the upper left corner, move down the column, continue from the top of column 2 and you'll land on the element "x32" when you get to 7.

---

## Matrices are vectors!

Note also that one can put an indexed matrix on the receiving end of an assignment operation and, as long as your replacement values have valid shape or extent, you can change the matrix.

```r
j_mat["row1", 2:3] = c("HEY!", "THIS IS NUTS!")
j_mat
```

```
##      col1  col2   col3            col4  col5 
## row1 "x11" "HEY!" "THIS IS NUTS!" "x14" "x15"
## row2 "x21" "x22"  "x23"           "x24" "x25"
## row3 "x31" "x32"  "x33"           "x34" "x35"
## row4 "x41" "x42"  "x43"           "x44" "x45"
```

---
class: code90

## Recycling also works!

```r
norm_mat = matrix(rnorm(6), nrow = 3)
cbind(norm_mat, rep(1,3), rowMeans(norm_mat))
```

```
##             [,1]       [,2] [,3]       [,4]
## [1,]  0.61982575 -1.4707524    1 -0.4254633
## [2,] -0.05612874 -0.4781501    1 -0.2671394
## [3,] -0.15579551  0.4179416    1  0.1310730
```

The vector of row means is recycled down the first column, then down the second column, so each row is centered by its own mean:

```r
(center_mat = norm_mat - rowMeans(norm_mat))
```

```
##            [,1]       [,2]
## [1,]  1.0452891 -1.0452891
## [2,]  0.2110107 -0.2110107
## [3,] -0.2868685  0.2868685
```
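
Subtracting `rowMeans()` works because the vector of row means has one entry per row and is recycled down each column. The same trick does not center columns, since a vector of column means would be recycled down the rows in the wrong pattern. A sketch of two base R options for that case, using a made-up matrix `m`:

```r
m = matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)

col_centered  = sweep(m, 2, colMeans(m))                # subtract each column's mean
col_centered2 = scale(m, center = TRUE, scale = FALSE)  # same result, plus attributes

colMeans(col_centered)  # both columns now have mean 0
```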

---

## Creating arrays, e.g. matrices

All matrix elements must be the same flavor. If that's not true, you risk an error or, worse, silent conversion to character.

To make a matrix:
* Fill with a vector
* Glue vectors together as rows or columns
* Convert from a data.frame

---

## Fill with a vector

```r
matrix(1:15, nrow = 5)
```

```
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15
```

```r
matrix(1:15, nrow = 5, byrow = TRUE)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
## [5,]   13   14   15
```

* To cast long-format data into a matrix, see `reshape2::acast()`!

---

## Recycle a vector

```r
matrix(c("yo!", "foo?"), nrow = 3, ncol = 4)
```

```
##      [,1]   [,2]   [,3]   [,4]  
## [1,] "yo!"  "foo?" "yo!"  "foo?"
## [2,] "foo?" "yo!"  "foo?" "yo!" 
## [3,] "yo!"  "foo?" "yo!"  "foo?"
```

---

## Provide names

```r
matrix(1:15, nrow = 5,
       dimnames = list(paste0("row", 1:5),
                       paste0("col", 1:3)))
```

```
##      col1 col2 col3
## row1    1    6   11
## row2    2    7   12
## row3    3    8   13
## row4    4    9   14
## row5    5   10   15
```

---

## Bind columns

Here we create a matrix by binding vectors together. Watch the vector names propagate as row or column names.

```r
vec1 = 5:1
vec2 = 2^(1:5)
cbind(vec1, vec2)
```

```
##      vec1 vec2
## [1,]    5    2
## [2,]    4    4
## [3,]    3    8
## [4,]    2   16
## [5,]    1   32
```

---

## Bind rows

```r
rbind(vec1, vec2)
```

```
##      [,1] [,2] [,3] [,4] [,5]
## vec1    5    4    3    2    1
## vec2    2    4    8   16   32
```

You may have also seen me use `bind_rows()` and `bind_cols()` -- 
these are analogous tidyverse functions that you will want to use when working with **data frames**--they don't work with matrices.
They have nicer defaults for data frames than `cbind()` and `rbind()`.

---

## From a data frame.

```r
(vecDat = tibble(vec1 = 5:1,
                vec2 = 2^(1:5)))
```

```
## # A tibble: 5 × 2
##    vec1  vec2
##   <int> <dbl>
## 1     5     2
## 2     4     4
## 3     3     8
## 4     2    16
## 5     1    32
```

```r
vecMat = as.matrix(vecDat)
str(vecMat)
```

```
##  num [1:5, 1:2] 5 4 3 2 1 2 4 8 16 32
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "vec1" "vec2"
```

---

## From a data frame with silent coercion 🤦

```r
multiDat = tibble(vec1 = 5:1,
                  vec2 = paste0("hi", 1:5))
(multiMat = as.matrix(multiDat))
```

```
##      vec1 vec2 
## [1,] "5"  "hi1"
## [2,] "4"  "hi2"
## [3,] "3"  "hi3"
## [4,] "2"  "hi4"
## [5,] "1"  "hi5"
```

```r
# Hey! Where did that heading come from?
emo::ji("person_facepalming")
```

```
## 🤦
```
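
One way to avoid the silent conversion is to keep only the numeric columns before calling `as.matrix()`. This is a sketch under the assumption that dropping the non-numeric columns is acceptable; it uses a base R `data.frame` rather than a tibble, and the names `multi_df`, `is_num` and `num_mat` are invented here.

```r
multi_df = data.frame(vec1 = 5:1, vec2 = paste0("hi", 1:5))

is_num  = vapply(multi_df, is.numeric, logical(1))  # which columns are numeric?
num_mat = as.matrix(multi_df[is_num])               # convert only those columns
str(num_mat)
```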

---
class: code70

## Matrix multiplication

Matrices have their own special multiplication operator, written `%*%`:

```r
(six_sevens = matrix(rep(7,6), ncol=3))
```

```
##      [,1] [,2] [,3]
## [1,]    7    7    7
## [2,]    7    7    7
```

```r
(z_mat = matrix(c(40,1,60,3), nrow=2))
```

```
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
```

```r
z_mat %*% six_sevens # [2x2] * [2x3]
```

```
##      [,1] [,2] [,3]
## [1,]  700  700  700
## [2,]   28   28   28
```
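
It is easy to type `*` when `%*%` was intended; for matrices of the same shape both run without complaint but give different answers. A small sketch with two made-up 2 x 2 matrices, `A` and `B`:

```r
A = matrix(1:4, nrow = 2)
B = matrix(c(0, 1, 1, 0), nrow = 2)

elementwise = A * B    # multiplies matching entries
mat_product = A %*% B  # rows of A times columns of B
elementwise
mat_product
```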

---

## Rowwise/columnwise manipulations

*  `rowSums()` `rowMeans()`
*  `colSums()` `colMeans()`
*  many more in `matrixStats`
*  roll your own with `apply(<MATRIX>, <1|2>, <FUN>)`
  * Use `1` for rows, `2` for columns

---

## rowSums vs apply

```r
rowSums(z_mat)
```

```
## [1] 100   4
```

```r
apply(z_mat, 1, sum)
```

```
## [1] 100   4
```
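
`apply()` earns its keep when there is no ready-made row or column function. A sketch that computes the range (max minus min) of each row and column, rebuilding the original values of `z_mat` as `m`; the names `row_range` and `col_range` are invented here.

```r
m = matrix(c(40, 1, 60, 3), nrow = 2)

row_range = apply(m, 1, function(r) max(r) - min(r))  # one value per row
col_range = apply(m, 2, function(k) max(k) - min(k))  # one value per column
row_range
col_range
```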

---

## Matrix diagonal

The `diag()` function can be used to extract the diagonal entries of a matrix:

```r
diag(z_mat)
```

```
## [1] 40  3
```

It can also replace the diagonal:

```r
diag(z_mat) = c(35,4)
z_mat
```

```
##      [,1] [,2]
## [1,]   35   60
## [2,]    1    4
```

---

## Creating a diagonal matrix

Finally, `diag()` can be used to create a diagonal matrix:

```r
diag(c(3,4))
```

```
##      [,1] [,2]
## [1,]    3    0
## [2,]    0    4
```

```r
diag(2)
```

```
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
```

---

## Other matrix operators

**Transpose**:

```r
t(z_mat)
```

```
##      [,1] [,2]
## [1,]   35    1
## [2,]   60    4
```

**Determinant**:

```r
det(z_mat)
```

```
## [1] 80
```

---

## Other matrix operators

**Inverse**:

```r
solve(z_mat)
```

```
##         [,1]    [,2]
## [1,]  0.0500 -0.7500
## [2,] -0.0125  0.4375
```

```r
z_mat %*% solve(z_mat)
```

```
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
```
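
When the goal is to solve a linear system rather than to inspect the inverse, `solve()` also accepts a right-hand side. A sketch using the current values of `z_mat` as `A` and a made-up vector `b`; solving directly is generally preferred over forming the inverse first.

```r
A = matrix(c(35, 1, 60, 4), nrow = 2)
b = c(95, 5)

x_direct  = solve(A, b)     # solves A x = b without forming the inverse
x_inverse = solve(A) %*% b  # same answer via the explicit inverse
x_direct
```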

---

## Putting it all together...implications for data.frames

Hopefully the slog through vectors, matrices, and lists will be redeemed by greater prowess at data analysis. Consider:

* a data.frame is a *list*
* the list elements are the variables and they are *atomic vectors*
* data.frames are rectangular, like their matrix friends, so your intuition -- and even some syntax -- can be borrowed from the matrix world

.alert[A data.frame is a list that quacks like a matrix.]
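
A minimal sketch of that dual nature, with a small made-up data frame `df`: list-style functions see a collection of variables, while matrix-style functions see rows and columns.

```r
df = data.frame(w = c(0.5, -1.2, 2.0), z = c(TRUE, FALSE, TRUE))

length(df)  # list view: number of variables
names(df)   # list view: variable names
dim(df)     # matrix view: rows x columns

list_style   = df[["w"]][2]  # pick the variable, then the element
matrix_style = df[2, "w"]    # row 2, column "w"
identical(list_style, matrix_style)
```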

---

## Reviewing list-style indexing of a data.frame

```r
j_dat
```

```
##             w x y     z
## 1 -0.62124058 1 A  TRUE
## 2 -2.21469989 2 B  TRUE
## 3  1.12493092 3 C  TRUE
## 4 -0.04493361 4 D  TRUE
## 5 -0.01619026 5 E  TRUE
...
```

```r
j_dat$z
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
i_want_this = "z" 
(j_dat[[i_want_this]]) # atomic
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

---

## Reviewing vector-style indexing of a data.frame:

```r
j_dat["y"]
```

```
##   y
## 1 A
## 2 B
## 3 C
## 4 D
## 5 E
...
```

```r
i_want_this = c("w", "z")
j_dat[i_want_this] # index with a vector of variable names
```

```
##             w     z
## 1 -0.62124058  TRUE
## 2 -2.21469989  TRUE
## 3  1.12493092  TRUE
## 4 -0.04493361  TRUE
## 5 -0.01619026  TRUE
...
```

---

## Demonstrating matrix-style indexing of a data.frame:

```r
j_dat[ , "z", drop = FALSE]
```

```
##       z
## 1  TRUE
## 2  TRUE
## 3  TRUE
## 4  TRUE
## 5  TRUE
...
```

```r
j_dat[ , "z", drop = TRUE]
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

---
## Demonstrating matrix-style indexing of a data.frame:

```r
j_dat[c(2, 4, 7), c(1, 4)] # awful and arbitrary but syntax works
```

```
##             w     z
## 2 -2.21469989  TRUE
## 4 -0.04493361  TRUE
## 7  0.82122120 FALSE
```

```r
j_dat[j_dat$z, ]
```

```
##             w x y    z
## 1 -0.62124058 1 A TRUE
## 2 -2.21469989 2 B TRUE
## 3  1.12493092 3 C TRUE
## 4 -0.04493361 4 D TRUE
## 5 -0.01619026 5 E TRUE
...
```

---

## Recap

- Elemental data types
    + `logical`, `integer`, `numeric`, `complex`, `character`
- Compound data types
    + `class`, `attributes`
- Data structures
    + `vector`, `list`, `data.frame`, `matrix`, `array`
- Be careful about data types / classes
    + Sometimes `R` makes silly assumptions about your data class 
    + If a plot/output is not behaving the way you expect, first
    investigate the data class with `str`
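
As a sketch of that last point, here is a made-up vector of numbers that were read in as text. `str()` reveals the problem, and `as.numeric()` is one possible fix when every entry really is a number.

```r
# Numbers that were read in as text (made-up values)
raw = c("10", "9", "100")
str(raw)    # chr: a hint that something is off
sort(raw)   # sorted as text, not as numbers

fixed = as.numeric(raw)
str(fixed)  # num
sort(fixed) # sorted numerically
```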

---

## Acknowledgments

Based on materials from Ryan Tibshirani's [Statistical Computing](https://www.stat.cmu.edu/~ryantibs/statcomp/lectures/intro.html) at CMU and
[Stat 545](https://stat545.com/r-objects.html) at UBC.

More reading: [r4ds chapter 20](https://r4ds.had.co.nz/vectors.html).