Intro to Tidyverse
U of Rochester
2025-09-01
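Statistical and data analysis commonly involves the following steps:
- Read rectangular data (readr).
- Tidy the data (tidyr).
- Manipulate the data (dplyr).
- Work with functions and vectors, strings, categorical variables, and date-times (purrr/stringr/forcats/lubridate).
- Visualize the data (ggplot2).
The main data structure for tidyverse is a “tibble”, an extension of data.frame that complains a lot when you are not explicit.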
tibble
The idea is to make it harder to make a mistake in your code by requiring you to be explicit. This helps to identify logical errors early.
Any operation that can be performed on data.frame can be performed on tibble because it extends data.frame.
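A tibble also skips some of the things a data.frame does. For example, it never converts strings to factors, which data.frame did by default before R 4.0.0 unless you set stringsAsFactors=FALSE.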
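
To make the "extends data.frame but is stricter" point concrete, here is a minimal sketch. The data and the column names (name, value) are made up for illustration.

```r
library(tibble)

# Hypothetical example data
df <- data.frame(name = c("a", "b"), value = c(1, 2))
tb <- as_tibble(df)

# A tibble is still a data.frame, so data.frame operations apply to it.
is.data.frame(tb)
nrow(tb)

# Partial column-name matching: data.frame silently matches "val" to "value";
# a tibble warns about an unknown column and returns NULL.
df$val
tb$val

# Single-bracket column subsetting: data.frame drops to a vector,
# a tibble always stays a tibble.
is.data.frame(df[, "value"])
is.data.frame(tb[, "value"])
```

The tibble forces you to spell out the full column name and to ask explicitly (for example with `tb[["value"]]`) when you want a bare vector.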
readr
The readr package provides functions to read rectangular data.
- It returns a tibble instead of a data.frame.
- Use readxl for importing Excel files.
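
A small sketch of reading rectangular data with readr. The CSV text below is invented for illustration, and the example assumes readr 2.0 or later, where `I()` marks a string as literal data rather than a file path.

```r
library(readr)

# Hypothetical inline CSV; in practice you would pass a file path or URL.
csv_text <- I("id,score\n1,9.5\n2,7.25\n")
scores <- read_csv(csv_text, show_col_types = FALSE)

# The result is a tibble, which is also a data.frame.
class(scores)
```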
tidyr
The tidyr package provides functions to “tidy” your data. This is typically one of the very first steps in data analysis.
Definition of tidy data:
- Each variable is a column.
- Each observation is a row.
- Each value is a cell.
Two forms: long vs wide.
tidyr provides functions to reshape data into the desired form, which is essential for visualization.
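
The long vs wide distinction can be sketched with `pivot_longer()` and `pivot_wider()`. The table and its column names (id, x, y) are hypothetical.

```r
library(tidyr)
library(tibble)

# Wide form: one column per measured variable (made-up data)
wide <- tibble(
  id = c(1, 2),
  x  = c(10, 20),
  y  = c(30, 40)
)

# Long form: one row per (id, variable) pair
long <- pivot_longer(wide, cols = c(x, y),
                     names_to = "variable", values_to = "value")

# And back to wide form
back <- pivot_wider(long, names_from = variable, values_from = value)
```

The long form is usually the one plotting functions expect, because the variable name becomes a column that can be mapped to colour or facets.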
janitor
Although not officially part of tidyverse, the janitor package provides useful functions for examining and cleaning data, for example making clean column names.
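
A minimal sketch of cleaning column names with `clean_names()`. The messy column names are invented for illustration.

```r
library(janitor)

# Hypothetical data with spaces and capitals in the column names
raw <- data.frame(
  `First Name`  = c("Ada", "Alan"),
  `Total Score` = c(91, 88),
  check.names = FALSE
)

cleaned <- clean_names(raw)
names(cleaned)  # "first_name" "total_score"
```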
ggplot2
The ggplot2 package provides a powerful and flexible way to create data visualizations. ggplot2 allows creation of publication-quality graphics with a consistent and coherent system. There are many extensions that build on ggplot2. See the gallery.
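
As an illustration of the layered system, here is a short sketch using the built-in mtcars dataset. The choice of variables and labels is arbitrary.

```r
library(ggplot2)

# Map data columns to aesthetics, then add a geometry layer and labels.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# print(p) draws the plot; ggsave() can write it to a file.
```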
dplyr
The dplyr package provides a grammar of data manipulation. Its verbs correspond to common data manipulation tasks.
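
A sketch of how the verbs chain together, using the built-in mtcars dataset. It assumes R 4.1 or later for the native pipe `|>`. The derived column kpl and the threshold of 100 hp are arbitrary choices for illustration.

```r
library(dplyr)

summary_by_cyl <- mtcars |>
  filter(hp > 100) |>                          # keep rows matching a condition
  mutate(kpl = mpg * 0.425) |>                 # add a column (approx. km per litre)
  group_by(cyl) |>                             # define groups
  summarise(n = n(), mean_kpl = mean(kpl)) |>  # one row per group
  arrange(desc(mean_kpl))                      # sort the result
```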
stringr
The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible.
Example use cases:
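
A few typical string tasks (trimming, changing case, detecting a pattern, extracting a match) can be sketched as follows. The input vector is made up for illustration.

```r
library(stringr)

# Hypothetical messy input
emails <- c("Ada.Lovelace@example.org ", "alan@example.com", "not an email")

# Trim whitespace and normalise case
cleaned <- str_to_lower(str_trim(emails))

# Detect a pattern: TRUE TRUE FALSE
has_at <- str_detect(cleaned, "@")

# Extract the part after "@"; NA where there is no match
domains <- str_extract(cleaned, "(?<=@)[a-z.]+$")
```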
forcats
The forcats package provides tools for working with categorical variables.
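
Reordering factor levels is one common task. The sketch below uses a made-up factor of sizes.

```r
library(forcats)

# Hypothetical categorical data; default levels are alphabetical
sizes <- factor(c("small", "large", "medium", "small", "small", "large"))
levels(sizes)    # "large" "medium" "small"

# Order levels by how often they occur
by_freq <- fct_infreq(sizes)
levels(by_freq)  # "small" "large" "medium"

# Put levels in an explicit, meaningful order
in_order <- fct_relevel(sizes, "small", "medium", "large")
levels(in_order) # "small" "medium" "large"
```

Level order matters in practice because it controls the order of axis categories and legend entries in plots.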
lubridate
The lubridate package makes it easier to work with date-times and time spans. It provides convenient functions for converting across time zones and for extracting information such as the duration or interval between two date/time points.
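
A short sketch of time-zone conversion and interval length. The timestamps and the America/New_York time zone are assumptions chosen for illustration.

```r
library(lubridate)

# Hypothetical start and end times
start <- ymd_hms("2025-09-01 09:00:00", tz = "America/New_York")
end   <- ymd_hms("2025-09-03 17:30:00", tz = "America/New_York")

# Same instant expressed in another time zone
start_utc <- with_tz(start, "UTC")
hour(start_utc)  # 13

# Interval between the two points and its length in hours
span <- interval(start, end)
hours_elapsed <- time_length(span, unit = "hour")  # 56.5
```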
purrr
The purrr package makes it easier to work with functions and vectors. It provides features such as map-reduce that can effectively replace for-loops.
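
A minimal sketch of the map-reduce style, applied to a made-up list of integer vectors.

```r
library(purrr)

# Hypothetical input
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Map: apply a function to each element and collect a numeric vector
means <- map_dbl(values, mean)  # a = 2, b = 5, c = 8

# Reduce: combine all elements pairwise with `+`
total <- reduce(values, `+`)    # 12 15 18
```

Both calls replace an explicit for-loop with a pre-allocated result; the `_dbl` suffix also makes the expected return type explicit.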