Are apply-style functions in R, such as `sapply()`, any faster than `for` loops? We can gain some insight using the `microbenchmark` package.

If this question worries you, then just relax. If you have code that you would like to make faster, then the only thing to do is profile it. Hacking code up for performance before you need it is premature optimization, the root of all evil ;)
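As a rough illustration of what profiling can look like, here is a minimal sketch using base R's sampling profiler, `Rprof()`. The function `slow_work()` is a hypothetical stand-in for whatever code you want to speed up; it is not part of the experiment below.

```r
# Hypothetical slow function, standing in for the code you want to speed up
slow_work = function(n)
{
    total = 0
    for(i in seq_len(n)){
        total = total + sqrt(i)
    }
    total
}

prof_file = tempfile(fileext = ".out")

Rprof(prof_file)    # start the sampling profiler
result = slow_work(2e7)
Rprof(NULL)         # stop profiling

# Time spent per function, ordered by self time
summaryRprof(prof_file)$by.self
```

The table shows where the time actually goes, which is usually a better guide than guessing.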

## Rules of thumb

Write vectorized R code when possible, i.e. `y <- f(x)`. This single line of code fully expresses the desired computation.

If the function doesn’t operate on vectors or the data is a list, then use a higher-order function such as `sapply`, `lapply`, etc. These typically capture the intent of the code more succinctly than a `for` loop. I also like them because they’re easier to parallelize.
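To illustrate the parallelization point, here is a minimal sketch that swaps `lapply()` for `mclapply()` from the `parallel` package that ships with R. The function `square_it()` and the choice of two cores are assumptions made only for this example.

```r
library(parallel)

# Hypothetical per-element function, used only for illustration
square_it = function(v)
{
    v^2
}

inputs = list(1, 2, 3, 4)

# Serial version
serial = lapply(inputs, square_it)

# Parallel version with the same call shape. mclapply() relies on forking,
# which is not available on Windows, so fall back to a single core there.
cores = if(.Platform$OS.type == "windows") 1L else 2L
par_result = mclapply(inputs, square_it, mc.cores = cores)

# Check that both versions give the same answer
identical(serial, par_result)
```

Doing the same with a `for` loop would mean restructuring the loop body, while here only the function name and one argument change.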

Below is a little experiment:

``````
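library(microbenchmark)

n = 10000L
x = seq(from = 0, to = pi, length.out = n)

# Toy function: works on a whole vector or on a single element
f = function(x)
{
    out = log(0.43 * x + 1)
    out = exp(out)
    out
}

# Median time in nanoseconds for the vectorized call
baseline = median(microbenchmark(
    y <- f(x)
)$time)

# Same computation, one element at a time through sapply()
apply_time = median(microbenchmark(
    y2 <- sapply(x, f)
)$time)

# Same computation with an explicit for loop over a preallocated list
for_time = median(microbenchmark({
    y3 <- vector(mode = "list", length = n)
    for(i in seq_len(n)){
        y3[[i]] = f(x[i])
    }
    y3 <- as.numeric(y3)
})$time)

# How many times slower each approach is
apply_time / baseline

for_time / baseline

for_time / apply_time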
``````

Note the complexity of the implementations as measured by lines of code. The vectorized and `sapply()` versions take only 1 line, while the `for()` loop uses 4 or 5 lines.

## Results

`sapply()` was faster than the `for()` loop, but how much faster depends on the value of `n`. For `n = 100`, `sapply()` is 15 times slower than the vectorized version, and the `for()` loop is 23 times slower than `sapply()`!

For `n = 10000`, `sapply()` is 19 times slower than the vectorized version, and the `for()` loop is 1.16 times slower than `sapply()`.

As the number of elements grows much past 1000, we see less of a speed benefit from choosing `sapply()` over `for()`. Choose `sapply()` for other reasons, such as clearer intent and easier parallelization.
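To see how the ratios move with `n` on your own machine, the experiment can be wrapped in a function and run for several sizes. This is only a sketch: the helper `compare()`, its `times` argument, and the chosen sizes are assumptions for illustration, and the `f()` here is a stand-in with the same shape as the toy function above. The numbers you get will differ from the ones quoted in this post.

```r
library(microbenchmark)

# Stand-in toy function: works on a vector or on a single element
f = function(x)
{
    exp(log(0.43 * x + 1))
}

# Hypothetical helper: returns the timing ratios for one value of n
compare = function(n, times = 20L)
{
    x = seq(from = 0, to = pi, length.out = n)
    bench = microbenchmark(
        vectorized = f(x),
        sapply = sapply(x, f),
        for_loop = {
            y = vector(mode = "list", length = n)
            for(i in seq_len(n)){
                y[[i]] = f(x[i])
            }
            y = as.numeric(y)
        },
        times = times
    )
    # Median time in nanoseconds for each expression
    med = tapply(bench$time, bench$expr, median)
    c(n = n,
      sapply_vs_vectorized = med[["sapply"]] / med[["vectorized"]],
      for_vs_sapply = med[["for_loop"]] / med[["sapply"]])
}

# One row per value of n
results = t(sapply(c(100L, 1000L, 10000L), compare))
results
```

Each row of `results` holds the two ratios discussed above for one value of `n`.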

Main Point: Both `sapply()` and `for()` loops are much slower than vectorized code.