Names and performance in R

Names decorating data structures are convenient. They let us write more intelligible code, e.g. people["Joe", "height"]. But they can have hidden performance costs.
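A minimal sketch of what such names look like in practice. The `people` matrix and its values are made up purely for illustration:

```r
# Hypothetical data: rows are people, columns are measurements
people <- matrix(c(180, 75, 165, 60), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Joe", "Ann"), c("height", "weight")))

# Indexing by name reads more clearly than indexing by position
people["Joe", "height"]   # same element as people[1, 1]
# [1] 180
```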

library(microbenchmark)

n <- 1e6L
l <- list(a = 1:n, b = 1:n)

# Super slow!
microbenchmark(out1 <- do.call(c, l), times = 10L)

It takes around 600 ms on my computer to combine two integer vectors of length one million. In contrast, written the following way, the same concatenation takes about 4 ms:

# Super fast!
microbenchmark(out2 <- c(l[[1]], l[[2]]), times = 10L)

head(out1)
# a1 a2 a3 a4 a5 a6
#  1  2  3  4  5  6

head(out2)
# [1] 1 2 3 4 5 6

Both produce a vector of length 2 million by concatenating the two vectors in the list. The only difference is that do.call() passes the list's names on as argument names, so c() builds a name for every one of the 2 million elements (a1, a2, ..., b1, b2, ...). That is convenient in most cases, but here it slows things down by a factor of about 150. Ouch.
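To see where the extra work comes from, here is a small sketch on a toy list (`l_small` is an illustrative stand-in for `l`). Calling c() with named arguments makes it generate one name per element, and do.call() on a named list produces exactly that call:

```r
# With named arguments, c() builds a name for each element
x <- c(a = 1:3, b = 1:2)
names(x)
# [1] "a1" "a2" "a3" "b1" "b2"

# do.call() turns the list's names into argument names,
# so this is the same call as c(a = 1:3, b = 1:2)
l_small <- list(a = 1:3, b = 1:2)
identical(do.call(c, l_small), x)
# [1] TRUE
```

With `l` from above, that means building a character vector of 2 million names before the result is returned.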

# Fast again: strip the names before calling do.call()
microbenchmark(out3 <- do.call(c, unname(l)), times = 10L)

This version is again fast, around 4 ms. I’m surprised names can have such a large performance impact.

UPDATE: Henrik Bengtsson suggested the more idiomatic unlist(l, use.names = FALSE). This is fast and easier to read. Thanks Henrik!
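A sketch of the unlist() approach. It uses base R's system.time() instead of microbenchmark so it runs without extra packages; timings will vary by machine and are not meant to reproduce the numbers above:

```r
n <- 1e6L
l <- list(a = 1:n, b = 1:n)

# Flatten the list without building any names
out4 <- unlist(l, use.names = FALSE)

# Same values as the fast c() version, with no names attached
identical(out4, c(l[[1]], l[[2]]))
# [1] TRUE
is.null(names(out4))
# [1] TRUE

# Rough comparison: the default use.names = TRUE builds "a1", "a2", ...
system.time(unlist(l, use.names = FALSE))
system.time(unlist(l))
```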