Punchline: Watch out for lazy evaluation and
.Primitive() functions when
profiling R code.
While working on parallel R code, I wrote a chunked version of the
The idea is to split the original problem into several chunks that can be
computed in parallel. This version is serial, but the idea is the same. The
actual work is done when the
cov() function included with the
package is called on each chunk.
For an matrix with and the
chunked version was about 3 times slower than just calling
itself. This surprised me, since I expected the overhead of breaking the
problem into chunks to be dominated by the actual computation inside calls
cov(), thus making the chunked version only marginally slower then
cov() for sufficiently large .
So I profiled it, and saw that nearly all of the time was spent
Rprof("cov_chunked.out") replicate(10, cov_chunked(x)) Rprof(NULL) summaryRprof("cov_chunked.out") $by.self self.time self.pct total.time total.pct "is.data.frame" 7.72 56.35 7.72 56.35 "cov" 5.98 43.65 13.70 100.00
is.data.frame be so slow? I looked at it quite
carefully, stepped through the debugger, traced it, counted all the calls,
but didn’t understand.
is.data.frame is a simple function that checks the
class, and takes less than 1 microsecond. Yet the profiler shows that each
is.data.frame() was taking hundreds of milliseconds, depending on
the problem size of . This dependence on the problem size turned
out to be a major clue.
The following line was inside an lapply function:
cov(x[, index[]], x[, index[]])
Looking at the
cov() function itself, the first uses of arguments
y are when it calls
is.data.frame() on them. In this case
x[, index[]], which was an unevaluated promise until that
point. The subset operator
[ was taking hundreds of milliseconds to
allocate memory and copy large chunks of
x. R’s copy on write model
doesn’t apply here, because we didn’t just pass in
The profiler should have showed me the time was actually spent within
But here’s what we see at the R level:
> `[` .Primitive("[")
This means that
[ does absolutely everything in C, bypassing R code
completely. And it never showed up on the profiler.
Big thanks to Duncan Temple Lang for looking through this with me and figuring it out~