BST430 Lecture 02-C

Memory and Environments in R

Seong-Hwan Jun

U of Rochester

2025-08-29

Namespace

  • Sometimes there are name clashes. For example, filter() exists in both dplyr and stats.
  • In such cases, we can specify the package name explicitly with the :: operator, e.g., stats::filter or dplyr::filter.
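A minimal sketch of the :: operator in action. stats::filter() applies a linear filter to a series, while dplyr::filter() keeps the rows of a data frame that satisfy a condition. The dplyr part assumes dplyr is installed and is skipped otherwise; the names x, ma and df are only for illustration.

```r
# stats::filter(): a centered 3-point moving average (stats ships with R)
x <- c(1, 2, 3, 4, 5)
ma <- stats::filter(x, rep(1 / 3, 3))
print(as.numeric(ma))  # NA 2 3 4 NA: the two ends lack a full window

# dplyr::filter(): row subsetting, run only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  df <- data.frame(id = 1:5, value = x)
  print(dplyr::filter(df, value > 2))  # rows with value 3, 4, 5
}
```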

Namespace

stats::filter
function (x, filter, method = c("convolution", "recursive"), 
    sides = 2L, circular = FALSE, init = NULL) 
{
    method <- match.arg(method)
    x <- as.ts(x)
    storage.mode(x) <- "double"
    xtsp <- tsp(x)
    n <- as.integer(NROW(x))
    if (is.na(n)) 
        stop(gettextf("invalid value of %s", "NROW(x)"), domain = NA)
    nser <- NCOL(x)
    filter <- as.double(filter)
    nfilt <- as.integer(length(filter))
    if (is.na(nfilt)) 
        stop(gettextf("invalid value of %s", "length(filter)"), 
            domain = NA)
    if (anyNA(filter)) 
        stop("missing values in 'filter'")
    if (method == "convolution") {
        if (nfilt > n) 
            stop("'filter' is longer than time series")
        sides <- as.integer(sides)
        if (is.na(sides) || (sides != 1L && sides != 2L)) 
            stop("argument 'sides' must be 1 or 2")
        circular <- as.logical(circular)
        if (is.na(circular)) 
            stop("'circular' must be logical and not NA")
        if (is.matrix(x)) {
            y <- matrix(NA, n, nser)
            for (i in seq_len(nser)) y[, i] <- .Call(C_cfilter, 
                x[, i], filter, sides, circular)
        }
        else y <- .Call(C_cfilter, x, filter, sides, circular)
    }
    else {
        if (missing(init)) {
            init <- matrix(0, nfilt, nser)
        }
        else {
            ni <- NROW(init)
            if (ni != nfilt) 
                stop("length of 'init' must equal length of 'filter'")
            if (NCOL(init) != 1L && NCOL(init) != nser) {
                stop(sprintf(ngettext(nser, "'init' must have %d column", 
                  "'init' must have 1 or %d columns", domain = "R-stats"), 
                  nser), domain = NA)
            }
            if (!is.matrix(init)) 
                dim(init) <- c(nfilt, nser)
        }
        ind <- seq_len(nfilt)
        if (is.matrix(x)) {
            y <- matrix(NA, n, nser)
            for (i in seq_len(nser)) y[, i] <- .Call(C_rfilter, 
                x[, i], filter, c(rev(init[, i]), double(n)))[-ind]
        }
        else y <- .Call(C_rfilter, x, filter, c(rev(init[, 1L]), 
            double(n)))[-ind]
    }
    tsp(y) <- xtsp
    class(y) <- if (nser > 1L) 
        c("mts", "ts")
    else "ts"
    y
}
<bytecode: 0x7f81037ae2f8>
<environment: namespace:stats>

Namespace

dplyr::filter
function (.data, ..., .by = NULL, .preserve = FALSE) 
{
    check_by_typo(...)
    by <- enquo(.by)
    if (!quo_is_null(by) && !is_false(.preserve)) {
        abort("Can't supply both `.by` and `.preserve`.")
    }
    UseMethod("filter")
}
<bytecode: 0x7f81022eb5f0>
<environment: namespace:dplyr>
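The last line of each printout above names the namespace the function lives in. A small base-R sketch of asking for that programmatically: environment() returns a function's enclosing environment, and find() (from utils) lists the attached packages on the search path that define a given name. The variable env is illustrative.

```r
# The enclosing environment of a package function is its namespace
env <- environment(stats::filter)
print(env)                   # <environment: namespace:stats>
print(environmentName(env))  # "stats"

# Every attached package that defines `filter`, in search-path order;
# from the console, the first entry is the one a bare filter() call uses
print(find("filter"))
```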

Global environment

  • When you start R, the global environment is created automatically.
  • You can see the objects in your global environment using ls().
  • You can remove objects from your global environment using rm().
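  • The global environment is the top-level environment for interactive work, but it still has a parent: the most recently attached package on the search path (see search()). Only the empty environment, emptyenv(), has no parent.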
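A short sketch of ls() and rm(), followed by a walk up the chain of parent environments with parent.env(). It shows that the global environment is not parentless: the chain continues through the attached packages down to base, and only then reaches emptyenv(). The names x_demo and env are arbitrary.

```r
x_demo <- 1:3                    # binds x_demo in the global environment
print("x_demo" %in% ls())        # TRUE
rm(x_demo)                       # removes that binding
print(exists("x_demo", envir = globalenv(), inherits = FALSE))  # FALSE

# Walk the chain of parents upward from the global environment;
# the loop stops at emptyenv(), the only environment with no parent
env <- globalenv()
while (!identical(env, emptyenv())) {
  cat(environmentName(env), "\n")  # R_GlobalEnv, attached packages, ..., base
  env <- parent.env(env)
}
```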

Local environment

  • Most commonly created when a function is called: each call gets its own, new local environment.
  • Variables created within a local environment are only accessible within that environment.
  • When the function exits, its local environment is normally discarded (unless something, such as a returned closure, still refers to it).
  • Code running in a local environment can read variables from its parent environment (for a function defined at top level, the global environment), but not vice versa.
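A minimal sketch of these rules; outer_value, add_outer and call_env are illustrative names. The local variable is invisible from the global environment once the call returns, the function still reads outer_value from its parent (here, the global) environment, and two calls produce two distinct environments.

```r
outer_value <- 10                # lives in the global environment
add_outer <- function(n) {
  local_value <- n * 2           # exists only in this call's environment
  local_value + outer_value      # outer_value is found in the parent (global) env
}
result <- add_outer(1)
print(result)                    # 12
print(exists("local_value", envir = globalenv(), inherits = FALSE))  # FALSE

# Each call creates a new environment: two calls give two different ones
call_env <- function() environment()
print(identical(call_env(), call_env()))  # FALSE
```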

Names and values

x <- c(1, 2, 3)

This line creates the vector c(1, 2, 3) in memory and binds the name x to it.

The name x is a reference to the actual object, in this case the vector c(1, 2, 3).

Names and values

y <- x

This will create another name, y, that also binds to the existing vector c(1, 2, 3).

Names and values

Print their memory addresses:

print(lobstr::obj_addr(x))
[1] "0x7f81065cbb78"
print(lobstr::obj_addr(y))
[1] "0x7f81065cbb78"

They point to the same object.

Names and values

By the way, functions also have names and values.

print(lobstr::obj_addr(mean))
[1] "0x7f8101e592f8"

Functions are also objects in R.

  • mean is a name that points to a function object.
  • The name mean is just a reference to that object. The actual function object or definition is stored in memory.
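Because a function is an ordinary object, it can be bound to a second name or stored in a list just like a vector. A small sketch; my_mean, summaries and vals are illustrative names.

```r
my_mean <- mean                     # a second name for the same function object
print(identical(my_mean, mean))     # TRUE
print(my_mean(c(1, 2, 3)))          # 2

# Functions stored in a list and applied in turn
summaries <- list(avg = mean, mid = median)
vals <- sapply(summaries, function(fn) fn(c(1, 2, 10)))
print(vals)                         # avg about 4.333, mid 2
```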

Copy-on-modify

R employs a copy-on-modify strategy to manage memory efficiently.

When you modify an object, R creates a copy of the object only if it is necessary (i.e., if there are multiple references to the same object).

Copy-on-modify

We will use the tracemem() function, which reports whenever an object is copied in memory.

xxx <- c(1, 2, 3)
tracemem(xxx)
[1] "<0x7f810aea27c8>"
yyy <- xxx # Copy does not occur here.
print(lobstr::obj_addr(xxx))
[1] "0x7f810aea27c8"
print(lobstr::obj_addr(yyy))
[1] "0x7f810aea27c8"

Copy-on-modify

yyy[1] <- 100 # copy occurs here!
tracemem[0x7f810aea27c8 -> 0x7f81043298f8]:
print(xxx)
[1] 1 2 3
print(yyy)
[1] 100   2   3
print(lobstr::obj_addr(xxx))
[1] "0x7f810aea27c8"
print(lobstr::obj_addr(yyy))
[1] "0x7f81043298f8"

What happens if you modify xxx?

xxx[1] <- -100
tracemem[0x7f810aea27c8 -> 0x7f810b009c48]:
print(xxx)
[1] -100    2    3
print(lobstr::obj_addr(xxx))
[1] "0x7f810b009c48"

  • What happened to the original object c(1, 2, 3)?

  • It still exists in memory, but no name points to it any more, so R's garbage collector will free it automatically the next time it runs (typically when R needs more memory).
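Garbage collection normally runs on its own, but gc() asks R to run it immediately and returns a summary of memory use. A sketch assuming a vector of one million doubles (roughly 8 MB) is large enough to be worth freeing; big and mem are arbitrary names.

```r
big <- numeric(1e6)                          # about 8 MB of doubles
print(format(object.size(big), units = "auto"))
rm(big)                                      # drop the only name bound to the vector
mem <- gc()                                  # collect now; returns a memory summary matrix
print(mem)
```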

Copy on function calls?

f <- function(a) {
  print(lobstr::obj_addr(a)) # print() is needed to show a value from inside a function
  return(a)
}
xyz <- c(1, 10, 100)
lobstr::obj_addr(xyz)
[1] "0x7f8104322118"
ret_obj <- f(xyz)
[1] "0x7f8104322118"
lobstr::obj_addr(ret_obj)
[1] "0x7f8104322118"

No copy takes place: inside f, a is bound to the same vector as xyz, and since a is never modified, R never needs to copy it.

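Where does the copy take place?

f <- function(a) {
  print(lobstr::obj_addr(a)) # same address as xyz: nothing copied yet
  a[1] <- -1                 # the copy happens here
  return(a)
}
xyz <- c(1, 10, 100)
lobstr::obj_addr(xyz)
[1] "0x7f8104324d48"
ret_obj <- f(xyz)
[1] "0x7f8104324d48"
lobstr::obj_addr(ret_obj)
[1] "0x7f8104323858"

The copy happens at a[1] <- -1: the vector is still bound to xyz in the global environment, so R copies it before modifying, and xyz itself is left unchanged.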
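The practical upshot: thanks to copy-on-modify, a function that modifies its argument works on its own copy, so the caller's variable never changes. A base-R sketch (no lobstr needed) that checks this through values rather than addresses; f_modify is an illustrative name.

```r
f_modify <- function(a) {
  a[1] <- -1    # a still shares the vector with xyz, so R copies before writing
  a
}
xyz <- c(1, 10, 100)
ret_obj <- f_modify(xyz)
print(xyz)      # 1 10 100: the caller's vector is unchanged
print(ret_obj)  # -1 10 100
```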

Modify-in-place

If an object has only a single reference, R can modify it in place (no copy is made), but this is very hard to verify in practice.

  • When we call tracemem(x), R seems to create an additional reference to the object x, so there are now two references.
  • Even after references are removed, R's bookkeeping of references is conservative and complex, so it is hard to be sure how many references an object has.

An exception: environments. They are a special type of object that is always modified in place.
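A minimal sketch of that exception; e1, e2 and set_value are illustrative names. Assigning an environment to a new name, or passing it to a function, never copies it, so a change made through one name is visible through every other.

```r
e1 <- new.env()
e1$value <- 1
e2 <- e1               # e2 refers to the same environment; nothing is copied
e2$value <- 99         # modifies that environment in place
print(e1$value)        # 99

set_value <- function(env, v) {
  env$value <- v       # changes the caller's environment, not a copy
  invisible(NULL)
}
set_value(e1, -5)
print(e2$value)        # -5
```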

Performance implications

Why is frequent copying harmful for performance?

  • Copying large objects takes time and memory.
  • Frequent copying can lead to increased memory usage and slower performance.
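A classic case is growing a vector inside a loop. A sketch comparing two illustrative helpers, grow() and prealloc(): c(out, i) builds a new, longer vector and copies every existing element on each iteration, whereas a preallocated vector can usually be updated in place. Timings depend on the machine, so no numbers are claimed here.

```r
n <- 1e4

grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i)  # new vector + full copy every iteration
  out
}

prealloc <- function(n) {
  out <- numeric(n)                       # allocate once up front
  for (i in seq_len(n)) out[i] <- i       # usually modified in place
  out
}

t_grow <- system.time(v1 <- grow(n))[["elapsed"]]
t_pre  <- system.time(v2 <- prealloc(n))[["elapsed"]]
print(c(grow = t_grow, prealloc = t_pre))
```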

Shallow vs Deep copy

  • Shallow copy: creates a new container (e.g., a new list) whose elements still reference the same objects in memory as the original.
  • Deep copy: creates a new container and also copies every object it references, so nothing is shared with the original.
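A sketch contrasting the two, assuming lobstr is installed for the address comparison (that part is skipped otherwise). The shallow copy comes from ordinary copy-on-modify; the deep copy uses a serialize()/unserialize() round trip, which is just one convenient way to force every object to be rebuilt, not something R does on its own. The names l1, shallow and deep are illustrative.

```r
l1 <- list(c(1, 2), c(3, 4))

shallow <- l1
shallow[[2]] <- c(30, 40)                 # shallow copy: new list, first element shared

deep <- unserialize(serialize(l1, NULL))  # rebuilds every object: a deep copy

if (requireNamespace("lobstr", quietly = TRUE)) {
  print(lobstr::obj_addrs(l1) == lobstr::obj_addrs(shallow))  # TRUE FALSE
  print(lobstr::obj_addrs(l1) == lobstr::obj_addrs(deep))     # FALSE FALSE
}
print(identical(l1, deep))                # TRUE: same contents, separate memory
```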

Lists: list

l1 <- list(1, 2, 3)
  • Each element of the list is a reference to an object.
  • e.g., l1[[1]] is a reference to a double vector holding 1 stored elsewhere in memory, l1[[2]] is a reference to one holding 2, and so on.
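Because a list stores references, the same vector can appear in several elements without being duplicated. A sketch assuming lobstr is installed for the address check (skipped otherwise); v, l3 and addrs are illustrative names.

```r
v <- c(10, 20, 30)
l3 <- list(v, v, 5)   # elements 1 and 2 refer to the vector bound to v

if (requireNamespace("lobstr", quietly = TRUE)) {
  addrs <- lobstr::obj_addrs(l3)
  print(addrs[1] == addrs[2])             # TRUE: one vector, two references
  print(addrs[1] == lobstr::obj_addr(v))  # TRUE
}
```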

Lists: list

l2 <- l1 # No copy yet: l1 and l2 bind to the same list.
lobstr::ref(l1)
█ [1:0x7f81042f0638] <list> 
├─[2:0x7f810b729e88] <dbl> 
├─[3:0x7f810b729da8] <dbl> 
└─[4:0x7f810b729cc8] <dbl> 
l2[[3]] <- 4 # Shallow copy here: l2 gets a new list; elements 1 and 2 are shared.
lobstr::ref(l2)
█ [1:0x7f810a0a38f8] <list> 
├─[2:0x7f810b729e88] <dbl> 
├─[3:0x7f810b729da8] <dbl> 
└─[4:0x7f8108c47128] <dbl> 
  • l2 is now a new list (its top-level address differs from l1's), but l2[[1]] and l2[[2]] still point to the same objects as l1[[1]] and l1[[2]].
  • l2[[3]] now points to a new object 4 in memory.

Data frames: data.frame

  • A data frame is a list of vectors: each column is a reference to a vector stored in memory.
  • What happens if you modify a column? A row? Find out in the lab.
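A quick base-R check that a data frame really is a list with one ordinary vector per column; df is an illustrative name.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
print(typeof(df))       # "list": the underlying type
print(length(df))       # 2: one element per column
print(typeof(df$a))     # "integer": each column is an ordinary vector
print(identical(df[["b"]], c(2.5, 3.5, 4.5)))  # TRUE
```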

Character vectors

x <- c("hi", "hello", "a", "b", "hi")

The first and the last element should point to distinct objects both containing the string “hi”… right?

Character vectors

x <- c("hi", "hello", "a", "b", "hi")
lobstr::ref(x, character = TRUE)
█ [1:0x7f810a0fcaa8] <chr> 
├─[2:0x7f810a0a1160] <string: "hi"> 
├─[3:0x7f810a0a10b8] <string: "hello"> 
├─[4:0x7f8101872ce0] <string: "a"> 
├─[5:0x7f81018d3ba8] <string: "b"> 
└─[2:0x7f810a0a1160] 

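R keeps a global pool of strings, so each distinct string is stored only once and character vectors hold references into that pool. This is why the first and last elements of x share the same address ([2:0x7f810a0a1160]).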
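One visible consequence of the string pool: a character vector that repeats one string is much smaller than one with the same number of distinct strings. A sketch using object.size(), whose documentation states that sharing among the elements of a single character vector is taken into account; same and distinct are illustrative names.

```r
same     <- rep("a fairly long repeated string", 1000)      # 1 pooled string, 1000 references
distinct <- paste("a fairly long repeated string", 1:1000)  # 1000 different strings
print(object.size(same))
print(object.size(distinct))
```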