vectors to matrices and back
I’ve been playing around with data sets that won’t fit in memory. One experiment involves generating random data. Since it’s so large it pays to have everything as efficient as possible.
Here’s a trick to avoid copying
data that relies on the behavior of primitive replacement functions
modifying in place. In particular we’re looking at the behavior of dim
:
> `dim<-`
function (x, value) .Primitive("dim<-")
The tricky way to create a matrix from a vector is to just set the
dimensions through dim<-
, that is dim()
on the left hand side of an
assignment operator. This is the same thing as setting the
shape
on a Numpy array in Python. A matrix (or higher dimensional array) is the
same as a vector in memory. They’re both contiguous blocks of homogeneously
types data. The dimensions are metadata attached to this object.
> a = 1:10
> library(pryr)
> address(a)
[1] "0x2d37318"
> a
[1] 1 2 3 4 5 6 7 8 9 10
> class(a)
[1] "integer"
> dim(a) = c(1, 10)
> a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
> class(a)
[1] "matrix"
> dim(a)
[1] 1 10
> address(a)
[1] "0x2d37318"
Since the memory address is the same we see this avoided a copy. The more
obvious way to instantiate a matrix is to call the matrix()
function. I
prefer this because it’s more expressive of what the programmer wishes to
happen: “make b
a matrix with 1 row”.
> b = 1:10
> address(b)
[1] "0x256d390"
> b = matrix(b, nrow = 1)
> b
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
> address(b)
[1] "0x2dde6e8"
These addresses are different, so R copied the data. We know this program didn’t need to copy data, but this copying is actually nice since this follows R’s functional programming model. Read more in Hadley Wickham’s Advanced R.