converting cov2cor from R to julia
Recently I rewrote the cov2cor()
function in R because I needed to use it
and I didn’t know that it already existed in the stats
package. I used a
for
loop because this makes it obvious that the formula is correct. Let
\(X\) be the covariance matrix. This formula transforms it to a correlation
matrix \(Y\):
My version with the for loop in R took about 5 seconds to run on a \(1400 \times 1400\) matrix. Below is the builtin R implementation for the vectorized version in R. I’ve omitted error checking and modified the variable names for consistency.
cov2cor = function (X)
{
n <- (d <- dim(X))[1L]
d <- sqrt(1/diag(X))
Y <- X
Y[] <- d * X * rep(d, each = n)
Y[cbind(1L:n, 1L:n)] <- 1
Y
}
The vectorized version takes about 50 ms, so it’s 100 times faster than my
naive for
loop. The line d * X * rep(d, each = n)
does the actual work
using R’s recycling
rules.
Sometimes we can use R’s column major storage to do even fancier
vectorization tricks, but it didn’t matter here because this problem is
symmetric. This code is reasonably fast, but it’s not immediately obvious
that the result is the same as the formula given above.
I was curious about how much faster this could be, so I translated my naive
R for
loop to Julia. At this point it became much less naive! Professor
Ethan Anderes, our local Julia expert, kindly provided a few
variations that made it much faster through cache coherency and efficient
array initialization through similar()
. Here’s one Julia version:
function cov2cor(X)
d = 1./sqrt.(diag(X))
Y = similar(X)
n = size(X,1)
@inbounds for col in 1:n
@simd for row in 1:col
Y[row,col] = X[row,col] * d[row] * d[col]
end
end
return Symmetric(Y)
end
This runs in about 2ms, making it 25x faster than R’s vectorized version
and 2500x faster than an R for
loop. I find Julia’s two for
loops:
for col in 1:n
for row in 1:col
Y[row,col] = X[row,col] * d[row] * d[col]
end
end
to be more clear than the equivalent line in R:
Y[] <- d * X * rep(d, each = n)
It’s more clear because I can recognize the correspondence to the formula
more quickly in the Julia version. I value clarity more than speed in most
cases. Here Julia provides both. One disadvantage is that I needed to know
Julia is column major in order to write the for
loops in a cache friendly
way.
Dreams
What I really want now is a way to compile Julia code into a shared object to load into an R session, just like we can do with C/C++ code. This is nice because Julia code is relatively easier to develop compared to C. Then we could call this Julia function directly from R. This differs from John Chambers’ approach in XRJulia package which uses proxy objects.
Thanks to Professor Anderes and the R code review group at the DSI for showing interest in this.