class: ur-title, center, middle, title-slide

.title[
# BST430 Lecture 02-B
]
.subtitle[
## Vectors, matrices and linear algebra
]
.author[
### Seong-Hwan Jun, based on the course by Andrew McDavid
]
.institute[
### U of Rochester
]
.date[
### 2021-09-27 (updated: 2025-08-28)
]

---
class: middle

## Vectors are everywhere

Your garden variety R object is a vector.

Square brackets are used for isolating elements of a vector. This is often called __indexing__. Indexing begins at 1 in R, unlike many other languages that index from 0.

```r
x = 3 * 4
x
```

```
## [1] 12
```

```r
length(x)
```

```
## [1] 1
```

```r
x[2] = 100
x
```

```
## [1]  12 100
```

---

## R is very (probably too) forgiving with indexing.

```r
x[5] = 3
x
```

```
## [1]  12 100  NA  NA   3
```

```r
x[11]
```

```
## [1] NA
```

```r
x[0]
```

```
## numeric(0)
```

Assigning past the end pads the vector with `NA`, reading past the end returns `NA`, and index `0` returns an empty vector, all without a warning.

---

## Most functions are vectorized

When reading docs, look for arguments that can be vectors. For example, the mean and standard deviation of random normal variates can be provided as vectors.

```r
set.seed(2021)
rnorm(5, mean = 10^(1:5))
```

```
## [1]      9.88    100.55   1000.35  10000.36 100000.90
```

```r
rnorm(5, sd = 10^(1:5))
```

```
## [1]    -19.2     26.2    915.6    137.7 172996.3
```

`1:5` generates the integer sequence 1, 2, 3, 4, 5, and so on. To generate more complicated sequences, see `seq(from, to, by, length.out)`.
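---

## Generating sequences

Here is a short sketch of `seq()` and two base R relatives, `seq_len()` and `seq_along()`. The specific numbers are arbitrary. The last two lines show why the relatives are worth knowing: `1:length(x)` counts *down* to 0 when `x` is empty, while `seq_along(x)` correctly returns nothing.

```r
seq(from = 0, to = 1, by = 0.25)       # fixed step size: 0.00 0.25 0.50 0.75 1.00
seq(from = 0, to = 1, length.out = 3)  # fixed number of points: 0.0 0.5 1.0
seq_len(4)                             # same values as 1:4
empty = numeric(0)
1:length(empty)                        # 1 0: probably not what you wanted
seq_along(empty)                       # integer(0): a loop over this does nothing
```

This is why `seq_along()` or `seq_len()` is the safer choice when looping over a vector that might be empty.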
---

## Vector arithmetic

Arithmetic operators apply to vectors in a "componentwise" fashion:

```r
x = c(7, 8, 10, 20)
y = c(-7, -8, -10, -20)
x + y
```

```
## [1] 0 0 0 0
```

```r
x * y
```

```
## [1]  -49  -64 -100 -400
```

---

Comparisons with vectors are also componentwise:

```r
x > 9
```

```
## [1] FALSE FALSE  TRUE  TRUE
```

Logical operators also work elementwise:

```r
(x > 9) & (x < 20)
```

```
## [1] FALSE FALSE  TRUE FALSE
```

---

To compare whole vectors, it's best to use `identical()` or `all.equal()`:

.pull-left[
`identical()` and `==`

```r
x == -y
```

```
## [1] TRUE TRUE TRUE TRUE
```

```r
identical(x, -y)
```

```
## [1] TRUE
```

```r
u = c(0.5 - 0.3, 0.3 - 0.1)
v = c(0.3 - 0.1, 0.5 - 0.3)
identical(u, v)
```

```
## [1] FALSE
```

```r
identical(u, v[2:1])
```

```
## [1] TRUE
```
]

.pull-right[
`all.equal()` and `dplyr::near()` allow for machine representation error in floating point values.

```r
all.equal(u, v)
```

```
## [1] TRUE
```

```r
all.equal(u, v, tolerance = 0)
```

```
## [1] "Mean relative difference: 1.387779e-16"
```

```r
near(u, v)
```

```
## [1] TRUE TRUE
```
]

---

## Vectorization: awesome, yet dangerous

Vectorization is awesome, but it becomes dangerous when you vectorize by mistake, because R usually gives no warning.

While we're on the topic of awesome, yet dangerous: .alert[R also recycles] vectors that are not the necessary length. You get a warning when the longer length is not an integer multiple of the shorter one, but recycling is silent when the lengths divide evenly, i.e., when it seems like you know what you're doing.

---

## Recycled with a warning

```r
(y = 1:3)
```

```
## [1] 1 2 3
```

```r
(z = 3:7)
```

```
## [1] 3 4 5 6 7
```

```r
y + z
```

```
## Warning in y + z: longer object length is not a multiple of
## shorter object length
```

```
## [1] 4 6 8 7 9
```

Hint: set `options(warn = 2)` to convert warnings to errors to catch this problem definitively.
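---

## Making the recycling warning fatal

Here is a minimal sketch of the `options(warn = 2)` hint, reusing `y = 1:3` and `z = 3:7` from the previous slide. With `warn = 2`, the recycling warning is promoted to an error, which `tryCatch()` intercepts here. `options()` returns the old setting, so we can restore it and keep the change from leaking into the rest of the session.

```r
y = 1:3
z = 3:7
old = options(warn = 2)   # promote warnings to errors; remember the old setting
res = tryCatch(y + z, error = function(e) conditionMessage(e))
options(old)              # restore the previous warning level
res                       # the (converted) recycling message, not a sum
```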
---

## Recycled without warning

```r
(y = 1:10)
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

```r
(z = 3:7)
```

```
## [1] 3 4 5 6 7
```

```r
y + z
```

```
##  [1]  4  6  8 10 12  9 11 13 15 17
```

`1` is a length-1 vector, so this is also a form of recycling!

```r
z + 1
```

```
## [1] 4 5 6 7 8
```

---

## Making vectors

The combine function `c()` is your go-to function for making vectors. (OMG, I was taught this was called concatenate.)

```r
str(c("hello", "world"))
```

```
##  chr [1:2] "hello" "world"
```

```r
str(c(1:3, 100, 150))
```

```
##  num [1:5] 1 2 3 100 150
```

---

Let's create some simple vectors for more demos below.

```r
n = 8
set.seed(1)
(w = round(rnorm(n), 2)) # numeric floating point
```

```
## [1] -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
(x = 1:n) # numeric integer
```

```
## [1] 1 2 3 4 5 6 7 8
```

```r
(y = LETTERS[1:n]) # character
```

```
## [1] "A" "B" "C" "D" "E" "F" "G" "H"
```

```r
(z = runif(n) > 0.3) # logical
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE
```

---

## Indexing a vector

Square brackets are used to index a vector. There is great flexibility in what one can put inside the square brackets.

Common ways to index a vector:

* __logical vector__: keep elements associated with `TRUE`s, ditch the `FALSE`s
* __vector of positive integers__: specifying the keepers
* __vector of negative integers__: specifying the losers
* __character vector__: naming the keepers

---

## Names

```r
w
```

```
## [1] -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
names(w) = letters[seq_along(w)]
w
```

```
##     a     b     c     d     e     f     g     h
## -0.63  0.18 -0.84  1.60  0.33 -0.82  0.49  0.74
```

```r
w[c('a', 'b', 'd')]
```

```
##     a     b     d
## -0.63  0.18  1.60
```

---

## Boolean vectors

```r
w < 0
```

```
##     a     b     c     d     e     f     g     h
##  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
```

```r
which(w < 0)
```

```
## a c f
## 1 3 6
```

```r
w[w < 0]
```

```
##     a     c     f
## -0.63 -0.84 -0.82
```

`which()` returns the positions (indices) of the `TRUE` elements of a logical vector, silently dropping .alert[NA]s.
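---

## `which()` vs logical indexing with `NA`

This small illustration uses a made-up vector `vals` with a missing value to show why the `NA` behavior of `which()` matters. A logical index that contains `NA` puts an `NA` in the result. `which()` keeps only the positions that are definitely `TRUE`.

```r
vals = c(-1, NA, 2, -3)  # hypothetical vector with a missing value
vals < 0                 # TRUE NA FALSE TRUE
vals[vals < 0]           # -1 NA -3: the NA comparison yields an NA element
which(vals < 0)          # 1 4: positions of the TRUEs only
vals[which(vals < 0)]    # -1 -3
```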
---

## Integer and character vectors

```r
seq(from = 1, to = length(w), by = 2)
```

```
## [1] 1 3 5 7
```

```r
w[seq(from = 1, to = length(w), by = 2)]
```

```
##     a     c     e     g
## -0.63 -0.84  0.33  0.49
```

```r
w[-c(2, 5)]
```

```
##     a     c     d     f     g     h
## -0.63 -0.84  1.60 -0.82  0.49  0.74
```

```r
w[c('c', 'a', 'f')]
```

```
##     c     a     f
## -0.84 -0.63 -0.82
```

---
class: middle

.hand[Lists...again]

---

## Lists...again

We have seen lists before: the über-vector in R. A list has a length, like a vector, but there is no requirement that its elements be of the same type.

In data analysis, you won't make lists very often, at least not consciously. But

* data.frames are lists! They are a special case where each element is a vector, all having the same length.
* Many non-tidyverse functions return lists. You will want to extract goodies from them, such as the p-value for a hypothesis test or the estimated error variance in a regression model.

---

## Lists...again

Use `list()` instead of `c()` to combine things, and you'll notice that the different flavors of the constituent parts are retained this time.

```r
## earlier: a = c("cabbage", pi, TRUE, 4.3)
(a = list("cabbage", pi, TRUE, c(4.3, 3, 2.1, 10)))
```

```
## [[1]]
## [1] "cabbage"
## 
## [[2]]
## [1] 3.141593
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1]  4.3  3.0  2.1 10.0
```

---

## Names in lists

List components can also have names. You can create or change names after a list already exists, or supply them in the initial assignment.
.pull-left[

```r
names(a)
```

```
## NULL
```

```r
names(a) = c("veg", "dessert", "my_aim", "number")
a
```

```
## $veg
## [1] "cabbage"
## 
## $dessert
## [1] 3.141593
## 
## $my_aim
## [1] TRUE
## 
## $number
## [1]  4.3  3.0  2.1 10.0
```
]

.pull-right[

```r
a = list(veg = "cabbage", dessert = pi,
         my_aim = TRUE, numbers = c(4.3, 10))
a
```

```
## $veg
## [1] "cabbage"
## 
## $dessert
## [1] 3.141593
## 
## $my_aim
## [1] TRUE
## 
## $numbers
## [1]  4.3 10.0
```
]

---

## Indexing lists

Indexing a list is similar to indexing a vector, but it is necessarily more complex. If you request more than one element, you should and will get a list back. But if you request a single element:

* Do you want a list of length 1 containing only that element? Use single square brackets, `[` and `]`. This is rarely desired...
* Or do you want the element itself? Use a dollar sign `$`, or double square brackets, `[[` and `]]`.

The ["pepper shaker photos" in R for Data Science](https://r4ds.had.co.nz/vectors.html#lists-of-condiments) are a splendid visual explanation of the different ways to get stuff out of a list.

---

## More list indexing

```r
(a = list(veg = c("cabbage", "eggplant"),
          t_num = c(pi, exp(1), sqrt(2)),
          my_aim = TRUE,
          joe_num = 2:6))
```

```
## $veg
## [1] "cabbage"  "eggplant"
## 
## $t_num
## [1] 3.141593 2.718282 1.414214
## 
## $my_aim
## [1] TRUE
## 
## $joe_num
## [1] 2 3 4 5 6
```

A slightly more complicated list for demo purposes.
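---

## What did I get back?

This quick sketch checks which kind of object each extraction returns. It calls `class()` and `length()` on the list `a` from the previous slide, recreated at the top so the chunk runs on its own. Single brackets hand back a list. `[[` and `$` hand back the element itself.

```r
# recreate the demo list so this chunk is self-contained
a = list(veg = c("cabbage", "eggplant"), t_num = c(pi, exp(1), sqrt(2)),
         my_aim = TRUE, joe_num = 2:6)
class(a["t_num"])                 # "list": a list of length 1
class(a[["t_num"]])               # "numeric": the element itself
identical(a[["t_num"]], a$t_num)  # TRUE: [[ and $ agree
length(a["joe_num"])              # 1
length(a[["joe_num"]])            # 5
```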
---

## Single, unlisted elements

```r
a[[2]] # index with a positive integer
```

```
## [1] 3.141593 2.718282 1.414214
```

```r
a$my_aim # use dollar sign and element name
```

```
## [1] TRUE
```

```r
a[["t_num"]] # index with length 1 character vector
```

```
## [1] 3.141593 2.718282 1.414214
```

---

## Single, unlisted elements

```r
i_want_this = "joe_num" # indexing with length 1 character object
a[[i_want_this]] # we get joe_num itself, a length 5 integer vector
```

```
## [1] 2 3 4 5 6
```

* When the name you want is stored in an R object, use double brackets. `a$i_want_this` would look for an element literally named `i_want_this`.

---

## Double bracket only for single elements

```r
a[[c("joe_num", "veg")]]
```

```
## Error in a[[c("joe_num", "veg")]]: subscript out of bounds
```

We get an error if we try to extract more than one element with double brackets. A length-2 vector inside `[[` is treated as recursive indexing, i.e. `a[["joe_num"]][["veg"]]`, and `joe_num` has no element named `veg`.

---

## More than one element

```r
names(a)
```

```
## [1] "veg"     "t_num"   "my_aim"  "joe_num"
```

```r
str(a[c("t_num", "veg")]) # returns list of length 2
```

```
## List of 2
##  $ t_num: num [1:3] 3.14 2.72 1.41
##  $ veg  : chr [1:2] "cabbage" "eggplant"
```

```r
str(a["veg"]) # returns list of length 1
```

```
## List of 1
##  $ veg: chr [1:2] "cabbage" "eggplant"
```

```r
length(a["veg"][[1]]) # length of the veg vector itself; length(a["veg"]) is 1
```

```
## [1] 2
```

The return value will always be a list, even if you only request 1 element.

---

## A useful list

```r
lmcars = lm(speed ~ dist, data = cars)
lmcars[[1]] # the first element holds the coefficients
```

```
## (Intercept)        dist
##   8.2839056   0.1655676
```

```r
summary(lmcars)$sigma
```

```
## [1] 3.155753
```

```r
names(summary(lmcars))
```

```
##  [1] "call"          "terms"         "residuals"    
##  [4] "coefficients"  "aliased"       "sigma"        
##  [7] "df"            "r.squared"     "adj.r.squared"
## [10] "fstatistic"    "cov.unscaled"
```

---

## Creating a data.frame explicitly

In data analysis, we often import data into a data.frame via `read_csv()`. But one can also construct a data.frame directly using `data.frame()` (or its tidyverse cousin, `tibble()`).
```r
n = 8
(j_dat = data.frame(w = rnorm(n),
                    x = 1:n,
                    y = LETTERS[1:n],
                    z = runif(n) > 0.3))
```

```
##             w x y     z
## 1 -0.62124058 1 A  TRUE
## 2 -2.21469989 2 B  TRUE
## 3  1.12493092 3 C  TRUE
## 4 -0.04493361 4 D  TRUE
## 5 -0.01619026 5 E  TRUE
## 6  0.94383621 6 F  TRUE
## 7  0.82122120 7 G FALSE
## 8  0.59390132 8 H  TRUE
```

---

## data.frames really are lists!

```r
is.list(j_dat) # data.frames are lists
```

```
## [1] TRUE
```

```r
j_dat[[4]] # this works but I prefer ...
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
j_dat$z # using dollar sign and name, when possible
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
namez = c("z")
j_dat[[namez]] # using a length 1 character vector holding the name
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
# namez = c("z", "w")
# j_dat[[namez]] # does not work: Error
```

---

## data.frames really are lists!

```r
str(j_dat[c("x", "z")]) # get multiple variables
```

```
## 'data.frame': 8 obs. of  2 variables:
##  $ x: int  1 2 3 4 5 6 7 8
##  $ z: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
```

```r
head(select(j_dat, x, z), 4) # tidyverse version is better in interactive work
```

```
##   x    z
## 1 1 TRUE
## 2 2 TRUE
## 3 3 TRUE
## 4 4 TRUE
```

```r
identical(select(j_dat, x, z), j_dat[c("x", "z")])
```

```
## [1] TRUE
```

Coerce a list directly to a data frame with `as_tibble()` (or base R's `as.data.frame()`).

---
class: middle

.hand[Matrices are vectors with `dim`.]

---

## Matrices vs data frames

A matrix is a generalization of an atomic vector, and the requirement that all the elements be of the same flavor still holds.

* Data frames: default receptacle for rectangular data
* But when we need to do linear algebra, we may want to use a matrix instead.
* Higher-order arrays are also available in R. A matrix is an important special case having dimension 2.

---
class: code70

## Matrices

Let's make a simple matrix and give it decent row and column names, which we know is a good practice.
You'll see familiar or self-explanatory functions below for getting to know a matrix.

```r
## don't worry if the construction of this matrix confuses you;
## just focus on the product
j_mat = outer(as.character(1:4), as.character(1:5),
              function(x, y) paste0('x', x, y))
j_mat
```

```
##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "x11" "x12" "x13" "x14" "x15"
## [2,] "x21" "x22" "x23" "x24" "x25"
## [3,] "x31" "x32" "x33" "x34" "x35"
## [4,] "x41" "x42" "x43" "x44" "x45"
```

```r
str(j_mat)
```

```
##  chr [1:4, 1:5] "x11" "x21" "x31" "x41" "x12" "x22" "x32" ...
```

---

## Useful matrix functions

```r
dim(j_mat)
```

```
## [1] 4 5
```

```r
length(j_mat)
```

```
## [1] 20
```

```r
nrow(j_mat)
```

```
## [1] 4
```

```r
ncol(j_mat)
```

```
## [1] 5
```

---

## Dimensions can have names

```r
rownames(j_mat)
```

```
## NULL
```

```r
rownames(j_mat) = str_c("row", seq_len(nrow(j_mat)))
colnames(j_mat) = str_c("col", seq_len(ncol(j_mat)))
dimnames(j_mat) # also useful for assignment
```

```
## [[1]]
## [1] "row1" "row2" "row3" "row4"
## 
## [[2]]
## [1] "col1" "col2" "col3" "col4" "col5"
```

```r
j_mat
```

```
##      col1  col2  col3  col4  col5 
## row1 "x11" "x12" "x13" "x14" "x15"
## row2 "x21" "x22" "x23" "x24" "x25"
## row3 "x31" "x32" "x33" "x34" "x35"
## row4 "x41" "x42" "x43" "x44" "x45"
```

---

## Indexing a matrix

```r
j_mat[2, 3]
```

```
## [1] "x23"
```

```r
j_mat[2, ] # getting row 2
```

```
##  col1  col2  col3  col4  col5 
## "x21" "x22" "x23" "x24" "x25"
```

```r
is.vector(j_mat[2, ]) # we get row 2 as an atomic vector
```

```
## [1] TRUE
```

```r
j_mat[ , 3, drop = FALSE] # getting column 3
```

```
##      col3 
## row1 "x13"
## row2 "x23"
## row3 "x33"
## row4 "x43"
```

```r
dim(j_mat[ , 3, drop = FALSE]) # we get column 3 as a 4 x 1 matrix
```

```
## [1] 4 1
```

---

## Use all of your favorite vector methods, too.
```r
j_mat[c("row1", "row4"), c("col2", "col3")]
```

```
##      col2  col3 
## row1 "x12" "x13"
## row4 "x42" "x43"
```

```r
j_mat[-c(2, 3), c(TRUE, TRUE, FALSE, FALSE)] # wacky but possible
```

```
##      col1  col2  col5 
## row1 "x11" "x12" "x15"
## row4 "x41" "x42" "x45"
```

The length-4 logical index is recycled across the 5 columns, which is why `col5` sneaks back in.

---

## Indexing a matrix

In summary:

* Use `[`, `]` and a logical, integer numeric (positive or negative), or character vector.
* The comma `,` distinguishes rows from columns.
* The `\(i,j\)`-th element is the element at the intersection of row `\(i\)` and column `\(j\)` and is obtained with `j_mat[i, j]`.
* Request an entire row/column by leaving the associated index empty.
* `drop = FALSE` preserves singleton dimensions. You almost always want this when programming with variable indices.

---

## R uses column major order

Under the hood, of course, matrices are just vectors with some extra facilities for indexing.

R uses column-major order: the columns are stacked up one after the other. (Contrast with C and Python's NumPy, which use row-major order by default.)

<img src="l02b-linear-algebra-indexing/img/major-order.png" width="60%" style="display: block; margin: auto;" />

---

## Matrices are vectors!

Matrices can be indexed *exactly* like a vector, i.e. with no comma `\(i,j\)` business, like so:

```r
j_mat[7]
```

```
## [1] "x32"
```

```r
j_mat
```

```
##      col1  col2  col3  col4  col5 
## row1 "x11" "x12" "x13" "x14" "x15"
## row2 "x21" "x22" "x23" "x24" "x25"
## row3 "x31" "x32" "x33" "x34" "x35"
## row4 "x41" "x42" "x43" "x44" "x45"
```

How to understand this: start counting in the upper left corner, move down the column, continue from the top of column 2 and you'll land on the element "x32" when you get to 7.

---

## Matrices are vectors!

Note also that one can put an indexed matrix on the receiving end of an assignment operation and, as long as your replacement values have valid shape or extent, you can change the matrix.
```r
j_mat["row1", 2:3] = c("HEY!", "THIS IS NUTS!")
j_mat
```

```
##      col1  col2   col3            col4  col5 
## row1 "x11" "HEY!" "THIS IS NUTS!" "x14" "x15"
## row2 "x21" "x22"  "x23"           "x24" "x25"
## row3 "x31" "x32"  "x33"           "x34" "x35"
## row4 "x41" "x42"  "x43"           "x44" "x45"
```

---
class: code90

## Recycling also works!

```r
norm_mat = matrix(rnorm(6), nrow = 3)
cbind(norm_mat, rep(1, 3), rowMeans(norm_mat))
```

```
##             [,1]       [,2] [,3]       [,4]
## [1,]  0.61982575 -1.4707524    1 -0.4254633
## [2,] -0.05612874 -0.4781501    1 -0.2671394
## [3,] -0.15579551  0.4179416    1  0.1310730
```

`rowMeans(norm_mat)` has one entry per row. Because the matrix is stored column by column, this length-3 vector is recycled down the first column and then down the second. As a result, each row's mean is subtracted from every entry in that row:

```r
(center_mat = norm_mat - rowMeans(norm_mat))
```

```
##            [,1]       [,2]
## [1,]  1.0452891 -1.0452891
## [2,]  0.2110107 -0.2110107
## [3,] -0.2868685  0.2868685
```

---

## Creating arrays, e.g. matrices

All matrix elements must be the same flavor. If that's not true, you risk an error or, worse, silent conversion to character.

To make a matrix:

* Fill with a vector
* Glue vectors together as rows or columns
* Or convert from a data.frame

---

## Fill with a vector

```r
matrix(1:15, nrow = 5)
```

```
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15
```

```r
matrix(1:15, nrow = 5, byrow = TRUE)
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
## [5,]   13   14   15
```

* To reshape long-format data into a matrix, see `reshape2::acast()`!

---

## Recycle a vector

```r
matrix(c("yo!", "foo?"), nrow = 3, ncol = 4)
```

```
##      [,1]   [,2]   [,3]   [,4]  
## [1,] "yo!"  "foo?" "yo!"  "foo?"
## [2,] "foo?" "yo!"  "foo?" "yo!" 
## [3,] "yo!"  "foo?" "yo!"  "foo?"
```

---

## Provide names

```r
matrix(1:15, nrow = 5,
       dimnames = list(paste0("row", 1:5),
                       paste0("col", 1:3)))
```

```
##      col1 col2 col3
## row1    1    6   11
## row2    2    7   12
## row3    3    8   13
## row4    4    9   14
## row5    5   10   15
```

---

## Bind columns

Here we create a matrix by binding vectors together. Watch the vector names propagate as row or column names.
```r
vec1 = 5:1
vec2 = 2^(1:5)
cbind(vec1, vec2)
```

```
##      vec1 vec2
## [1,]    5    2
## [2,]    4    4
## [3,]    3    8
## [4,]    2   16
## [5,]    1   32
```

---

## Bind rows

```r
rbind(vec1, vec2)
```

```
##      [,1] [,2] [,3] [,4] [,5]
## vec1    5    4    3    2    1
## vec2    2    4    8   16   32
```

You may have also seen me use `bind_rows()` and `bind_cols()`. These are the analogous tidyverse functions for **data frames**, where they have nicer defaults than `cbind()` and `rbind()`. They don't work with matrices.

---

## From a data frame

```r
(vecDat = tibble(vec1 = 5:1,
                 vec2 = 2^(1:5)))
```

```
## # A tibble: 5 × 2
##    vec1  vec2
##   <int> <dbl>
## 1     5     2
## 2     4     4
## 3     3     8
## 4     2    16
## 5     1    32
```

```r
vecMat = as.matrix(vecDat)
str(vecMat)
```

```
##  num [1:5, 1:2] 5 4 3 2 1 2 4 8 16 32
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "vec1" "vec2"
```

---

## From a data frame with silent coercion 🤦

```r
multiDat = tibble(vec1 = 5:1,
                  vec2 = paste0("hi", 1:5))
(multiMat = as.matrix(multiDat))
```

```
##      vec1 vec2 
## [1,] "5"  "hi1"
## [2,] "4"  "hi2"
## [3,] "3"  "hi3"
## [4,] "2"  "hi4"
## [5,] "1"  "hi5"
```

```r
# Hey! Where did that heading come from?
emo::ji("person_facepalming")
```

```
## 🤦
```

---
class: code70

## Matrix multiplication

Matrices have their own special multiplication operator, written `%*%`:

```r
(six_sevens = matrix(rep(7, 6), ncol = 3))
```

```
##      [,1] [,2] [,3]
## [1,]    7    7    7
## [2,]    7    7    7
```

```r
(z_mat = matrix(c(40, 1, 60, 3), nrow = 2))
```

```
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
```

```r
z_mat %*% six_sevens # [2x2] * [2x3]
```

```
##      [,1] [,2] [,3]
## [1,]  700  700  700
## [2,]   28   28   28
```

Plain `*` would multiply componentwise instead, and fails here because the dimensions don't match.

---

## Rowwise/columnwise manipulations

* `rowSums()`, `rowMeans()`
* `colSums()`, `colMeans()`
* many more in the `matrixStats` package
* roll your own with `apply(<MATRIX>, <1|2>, <FUN>)`
    + Use `1` for rows, `2` for columns

---

## rowSums vs apply

```r
rowSums(z_mat)
```

```
## [1] 100   4
```

```r
apply(z_mat, 1, sum)
```

```
## [1] 100   4
```

---

## Matrix diagonal

The `diag()` function can be used to extract the diagonal entries of a matrix:

```r
diag(z_mat)
```

```
## [1] 40  3
```

It can also replace the diagonal:

```r
diag(z_mat) = c(35, 4)
z_mat
```

```
##      [,1] [,2]
## [1,]   35   60
## [2,]    1    4
```

---

## Creating a diagonal matrix

Finally, `diag()` can be used to create a diagonal matrix:

```r
diag(c(3, 4))
```

```
##      [,1] [,2]
## [1,]    3    0
## [2,]    0    4
```

```r
diag(2)
```

```
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
```

---

## Other matrix operators

**Transpose**:

```r
t(z_mat)
```

```
##      [,1] [,2]
## [1,]   35    1
## [2,]   60    4
```

**Determinant**:

```r
det(z_mat)
```

```
## [1] 80
```

---

## Other matrix operators

**Inverse**:

```r
solve(z_mat)
```

```
##         [,1]    [,2]
## [1,]  0.0500 -0.7500
## [2,] -0.0125  0.4375
```

```r
z_mat %*% solve(z_mat)
```

```
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
```

---

## Putting it all together...implications for data.frames

Hopefully the slog through vectors, matrices, and lists will be redeemed by greater prowess at data analysis.
Consider:

* a data.frame is a *list*
* the list elements are the variables and they are *atomic vectors*
* data.frames are rectangular, like their matrix friends, so your intuition (and even some syntax) can be borrowed from the matrix world

.alert[A data.frame is a list that quacks like a matrix.]

---

## Reviewing list-style indexing of a data.frame

```r
j_dat
```

```
##             w x y    z
## 1 -0.62124058 1 A TRUE
## 2 -2.21469989 2 B TRUE
## 3  1.12493092 3 C TRUE
## 4 -0.04493361 4 D TRUE
## 5 -0.01619026 5 E TRUE
...
```

```r
j_dat$z
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

```r
i_want_this = "z"
(j_dat[[i_want_this]]) # atomic
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

---

## Reviewing vector-style indexing of a data.frame:

```r
j_dat["y"]
```

```
##   y
## 1 A
## 2 B
## 3 C
## 4 D
## 5 E
...
```

```r
i_want_this = c("w", "z")
j_dat[i_want_this] # index with a vector of variable names
```

```
##             w    z
## 1 -0.62124058 TRUE
## 2 -2.21469989 TRUE
## 3  1.12493092 TRUE
## 4 -0.04493361 TRUE
## 5 -0.01619026 TRUE
...
```

---

## Demonstrating matrix-style indexing of a data.frame:

```r
j_dat[ , "z", drop = FALSE]
```

```
##      z
## 1 TRUE
## 2 TRUE
## 3 TRUE
## 4 TRUE
## 5 TRUE
...
```

```r
j_dat[ , "z", drop = TRUE]
```

```
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```

---

## Demonstrating matrix-style indexing of a data.frame:

```r
j_dat[c(2, 4, 7), c(1, 4)] # awful and arbitrary but syntax works
```

```
##             w     z
## 2 -2.21469989  TRUE
## 4 -0.04493361  TRUE
## 7  0.82122120 FALSE
```

```r
j_dat[j_dat$z, ]
```

```
##             w x y    z
## 1 -0.62124058 1 A TRUE
## 2 -2.21469989 2 B TRUE
## 3  1.12493092 3 C TRUE
## 4 -0.04493361 4 D TRUE
## 5 -0.01619026 5 E TRUE
...
```

<!-- --- -->
<!-- ## Post-test -->
<!-- Ok, now let's make another attempt on [the quiz](https://docs.google.com/forms/d/e/1FAIpQLSexoCRQ0WqMH_yh38_cvpj28mM7Au8OJX8psuMJuTx9QNUgdw/viewform?usp=sf_link).
-->

---

## Recap

- Elemental data types
    + `logical`, `integer`, `numeric`, `complex`, `character`
- Compound data types
    + `class`, `attributes`
- Data structures
    + `vector`, `list`, `data.frame`, `matrix`, `array`
- Be careful about data types / classes
    + Sometimes `R` makes silly assumptions about your data class
    + If a plot/output is not behaving the way you expect, first investigate the data class with `str()`

---

## Acknowledgments

Based on [materials from](https://www.stat.cmu.edu/~ryantibs/statcomp/lectures/intro.html) Ryan Tibshirani's "Statistical Computing" course at CMU and [Stat 545](https://stat545.com/r-objects.html) at UBC.

More reading: [r4ds chapter 20](https://r4ds.had.co.nz/vectors.html).

<!-- --- -->
<!-- # Appendix -->
<!-- --- -->
<!-- class: code70 -->
<!-- ```{r, ref.label='tabulate-delay', echo = TRUE, eval = FALSE} -->
<!-- ``` -->