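I was curious how NumPy stores booleans, so I decided to explore a bit. Right at the top of the NumPy docs, it says that the boolean type is stored as a byte. That's 8 bits instead of 1, but it probably makes computation more efficient: a byte is the smallest unit of memory a CPU can address directly, so reading or writing a single boolean needs no bit masking.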
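To make the one-byte claim concrete, here is a small sketch that checks the item size of NumPy's boolean dtype. For contrast it also uses np.packbits, which stores eight booleans per byte. Packing is an alternative you have to opt into; it is not how np.bool_ arrays are stored, and reading a packed value back requires unpacking.

```python
import numpy as np

flags = np.array([True, False, True, True, False, False, True, False])

# Each NumPy boolean occupies a full byte
print(np.dtype(np.bool_).itemsize)  # 1
print(flags.nbytes)                 # 8 booleans -> 8 bytes

# np.packbits squeezes 8 booleans into the bits of one uint8
packed = np.packbits(flags)
print(packed, packed.nbytes)        # [178] 1

# Unpacking recovers the original values
unpacked = np.unpackbits(packed).astype(bool)
print((unpacked == flags).all())    # True
```

The packed form is 8 times smaller, but every element access now costs a shift and a mask, which is the trade-off the one-byte layout avoids.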
Here’s the experiment:
In : tf = np.array([True, False])
In : tf
Out: array([ True, False], dtype=bool)
In : a = np.random.choice(tf, int(1e7))
tf is a NumPy array containing True and False, and np.random.choice samples from it (with replacement) 10 million times in this case. The system monitor showed that this line of code created a data structure occupying 10 MB of memory: 10 million booleans at one byte each.
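The system monitor is not the only way to check this. A minimal sketch using the array's own nbytes attribute, which reports the size of the data buffer (it excludes the small, fixed-size array object header):

```python
import numpy as np

tf = np.array([True, False])
# Draw 10 million samples, with replacement, from [True, False]
a = np.random.choice(tf, int(1e7))

print(a.dtype)          # bool
print(a.itemsize)       # 1 byte per element
print(a.nbytes / 1e6)   # 10.0 (MB in the data buffer)
```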
Consider this little line of code:
In : b = np.zeros(int(1e7))
In : b.dtype
Out: dtype('float64')
One might expect this to create 10 million float64 numbers, using an additional 8 bytes × 10 million = 80 MB of memory. This is what happens for …
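The 80 MB expectation is just size × itemsize. A short sketch that prints the same product for a few dtypes; note that nbytes reports the logical buffer size NumPy asked for, not how much physical RAM the process is currently using:

```python
import numpy as np

n = int(1e7)
for dtype in (np.bool_, np.float32, np.float64):
    arr = np.zeros(n, dtype=dtype)
    # For a contiguous 1-D array, nbytes == size * itemsize
    print(f"{arr.dtype.name:8s} itemsize={arr.itemsize}  nbytes={arr.nbytes / 1e6:.0f} MB")
```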
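With np.zeros, however, the system monitor shows almost no additional memory. Something clever is happening, but it isn't a sparse data structure: b is an ordinary dense array, and b.nbytes still reports 80,000,000 bytes. NumPy requests the zero-filled buffer with calloc, and on Linux with glibc, for example, an allocation this large is served by fresh pages from mmap, which the kernel guarantees to be zero. Those pages are mapped lazily: physical RAM is assigned to a page only when it is first written, so an untouched array of zeros costs almost nothing until you actually use it.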
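The lazy allocation can be observed from inside Python by watching the process's resident set size (RSS) before and after the first write. This sketch assumes Linux, where /proc/self/status exposes a VmRSS line; rss_mb is a hypothetical helper written for this example, and the exact numbers will vary by system.

```python
import numpy as np


def rss_mb():
    """Hypothetical helper: current resident set size in MB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:    123456 kB" (units of 1024 bytes)
                return int(line.split()[1]) * 1024 / 1e6
    raise RuntimeError("VmRSS not found in /proc/self/status")


before = rss_mb()
b = np.zeros(int(1e7))   # dense float64 array: 80 MB of address space reserved
after_zeros = rss_mb()   # no page has been written yet, so little RAM is in use
b[:] = 1.0               # first write to every page: the kernel must now back them with RAM
after_fill = rss_mb()

print(f"b.nbytes:           {b.nbytes / 1e6:.0f} MB")
print(f"RSS growth, zeros:  {after_zeros - before:.1f} MB")
print(f"RSS growth, filled: {after_fill - before:.1f} MB")
```

On a typical Linux machine, the first RSS growth should be close to zero and the second close to 80 MB, even though b.nbytes is 80 MB in both cases.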
It’s the subtleties that make these things interesting.