Random Booleans in NumPy
I was curious how NumPy stores booleans, so I decided to explore it a bit. Right at the top of the NumPy docs it says that the boolean type is stored as a byte. That’s 8 bits instead of 1, but a byte is the smallest unit of memory that can be addressed directly, so it probably makes indexing and computation more efficient.
Here’s the experiment:
In [1]: import numpy as np
In [2]: tf = np.array([True, False])
In [3]: tf
Out[3]: array([ True, False], dtype=bool)
In [4]: a = np.random.choice(tf, int(1e7))
tf is a NumPy array containing True and False. np.random.choice samples from it 10 million times in this case. The system monitor verified that this line of code resulted in a data structure occupying 10 MB in memory, which matches one byte per element.
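The system monitor reading can be cross-checked from inside Python. This is a minimal sketch, not part of the original session: it uses np.random.default_rng with an arbitrary seed instead of the legacy np.random.choice call, and the np.packbits step is only an illustration of what one bit per value would cost.

```python
import numpy as np

# Assumption: a seeded Generator stands in for the legacy np.random.choice call.
rng = np.random.default_rng(0)
tf = np.array([True, False])
a = rng.choice(tf, size=int(1e7))

print(a.dtype)     # element type of the sampled array
print(a.itemsize)  # bytes per element
print(a.nbytes)    # total bytes in the data buffer

# Illustration only: pack 8 booleans into each byte.
packed = np.packbits(a)
print(packed.nbytes)
```

a.nbytes counts only the data buffer, so the figure the system monitor reports will be slightly higher because of interpreter overhead.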
Interesting Zeros
Consider this little line of code:
In [10]: b = np.zeros(int(1e7))
In [11]: b.dtype
Out[11]: dtype('float64')
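One might expect it to create 10 million floating-point numbers, resulting in an additional memory use of 8 bytes * 10 million = 80 MB. This is what happens for np.ones. But np.zeros uses almost no memory at first. Something clever is happening, though it is not a sparse data structure: b is an ordinary dense array. Most likely NumPy asks the operating system for memory that is already zeroed (calloc rather than malloc followed by a fill), and the OS only backs those pages with physical memory once they are written to. np.ones has to write a 1 into every element, so all of its pages are touched right away.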
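One way to probe the np.zeros observation is to compare what NumPy itself reports for the two arrays. This is a minimal sketch. The comment about resident memory growing after a write is an assumption about how the allocator and the operating system behave, not something the script measures.

```python
import numpy as np

n = int(1e7)
b = np.zeros(n)  # float64 by default
c = np.ones(n)

# NumPy reports the same logical size for both arrays.
print(b.nbytes, c.nbytes)

# b is a plain dense ndarray, not a sparse container.
print(type(b) is np.ndarray, b.flags["C_CONTIGUOUS"])

# Writing to every element touches every page. On systems that allocate
# zeroed pages lazily (an assumption), the system monitor should now show
# the extra ~80 MB for b as well.
b[:] = 1.0
print(b.sum() == n)
```

If the memory for b only appears in the system monitor after the write, the saving comes from the operating system deferring the allocation rather than from the array's layout.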
It’s the subtleties that make these things interesting.