# Fast Import of Pillow Images to NumPy / OpenCV Arrays

There’s an incredible technique that makes it possible to convert Pillow images to NumPy arrays with just two memory copies! Wait, what do you mean “with only two memory copies”? Isn’t it possible to convert data between libraries while copying memory only once or without copying it at all?

Well, it seems weird, but more traditional image converting methods work 1.5–2.5 times slower (if you need a mutable object). Today, I’m going to dive deep into both libraries and tell you why that happens. Also, I’ll show you a way to get the same result but faster. There won’t be any repositories or packages, just the facts, and working code at the end.

# First Things First: Key Concepts

NumPy is a Python library used for working with multidimensional arrays. It’s a base library for a bunch of scientific, computer vision, and machine learning libraries like SciPy, Pandas, Astropy and many others.

OpenCV is the most popular computer vision library and has a wide range of features. It doesn’t have its own internal storage format for images, instead, it uses NumPy arrays. The common scenario for using this library is when you need to convert an image from Pillow to NumPy so that you can work with it using OpenCV.

Today I’m going to run benchmarks on a Raspberry Pi 4 1800 MHz under a 64-bit OS. After all, where else would you need computer vision, if not on Raspberry? 🙂

# How NumPy Conversion Works

• numpy.array(im) — makes a copy from an image to a NumPy array.
• numpy.asarray(im) — the same as numpy.array(im, copy=False). Supposedly, it doesn’t make a copy but uses the memory of the original object instead. But it’s a bit more complicated than that.

One would think that in the second case, the NumPy array becomes kind of a representation of the original image, and if you change the NumPy array, the image will also change. In fact, that’s not the case:

`In : from PIL import Image In : import numpy In : im = Image.open('./canyon.jpg').resize((4096, 4096)) In : n = numpy.asarray(im)  In : n[:, :, 0] = 255 ValueError: assignment destination is read-only  In : n.flags Out:       C_CONTIGUOUS : True      F_CONTIGUOUS : False      OWNDATA : False      WRITEABLE : False      ALIGNED : True      WRITEBACKIFCOPY : False      UPDATEIFCOPY : False`

It’s way different from what you get if you use the numpy.array() function:

`In : n = numpy.array(im)  In : n[:, :, 0] = 255  In : n.flags Out:       C_CONTIGUOUS : True   F_CONTIGUOUS : False   OWNDATA : True   WRITEABLE : True   ALIGNED : True   WRITEBACKIFCOPY : False   UPDATEIFCOPY : False`

However, the asarray() function works much faster:

`In : %timeit -n 10 n = numpy.array(im) 257 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  In : %timeit -n 10 n = numpy.asarray(im) 179 ms ± 786 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)`

There are fewer memory copies, but there’s a price: you can’t change the array. Still, the conversion time remains horribly long compared to the one-copy method. Let’s figure out why that happens.

# NumPy Arrays Interface

`@property     def __array_interface__(self):         shape, typestr = _conv_type_shape(self)         return {             "shape": shape,             "typestr": typestr,             "version": 3,             "data": self.tobytes(),         }`

_conv_type_shape() describes the type and size of the array that should be obtained. But the most interesting things happen in the tobytes() method. If you check how long this method is executed, it becomes clear that, in general, NumPy does not add anything from itself:

`In : %timeit -n 10 n = im.tobytes() 179 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)`

The time is exactly the same as the time of the asarray() function. Well, it seems like we’ve just found the culprit. The only thing left is to change the function call or speed it up, right? But it’s not that simple.

# Memory Organization in Pillow and NumPy

In Pillow, it’s a whole other thing. The image is stored in a number of chunks, and each chunk contains several image lines. Each pixel takes up 1 or 4 bytes (not from 1 to 4, exactly 1 or 4). This means that some bytes are not used for some color modes. For example, for RGB, the last byte in each pixel isn’t used. For black and white images with an alpha channel (LA mode), the middle two bytes are not used so that the alpha channel is in the last byte of the pixel.

I’m telling you this because I don’t want you to have the illusion that it’s possible to completely solve this issue without rewriting one of the libraries.

Now you understand why we need the tobytes() method here: it transforms the internal representation of a Pillow image into a continuous flow of bytes without omissions, and that’s exactly what NumPy can use. After getting the bytes object, NumPy can either make a copy or use it in read-only mode. I’m not sure if that’s done to make it impossible to get around object immutability in Python, or if there are some real limitations on the C API level. Either way, if, for example, the input object is bytearray instead of bytes, the array will not be read-only.

But let’s take a look at a simplified version of tobytes():

`def tobytes(self):         self.load()         # unpack data         e = Image._getencoder(self.mode, "raw", self.mode)         e.setimage(self.im)         data, bufsize, s = [], 65536, 0         while not s:             l, s, d = e.encode(bufsize)             data.append(d)         if s < 0:             raise RuntimeError(f"encoder error {s} in tobytes")         return b"".join(data)`

Here we can see that the “raw” encoder has been created, and it produces image chunks of at least 65 kilobytes of memory. This is the first memory copy: by the end of the functions, we have the entire image in small chunks in the data array. The last line is the second memory copy: all the chunks are collected into one large byte string.

# So, Who’s Guilty, and How to Deal With It?

I want to highlight that we can’t give up the encoder: who knows what implementation details it’s hiding from us? Transferring it all to the Python level or rewriting part of it in C is a last resort.

It seems like it would be more reasonable to allocate a buffer of the required size in tobytes() in advance, and then write chunks to it. But it’s obvious that the encoder interface doesn’t work that way: it already returns chunks packed into bytes objects. However, if you don’t store all those chunks, but instead instantly copy it to a buffer, the data won’t be washed out from the L2 cache and will quickly get to the right place. It will look something like this:

`def to_mem(im):     im.load()     e = Image._getencoder(im.mode, "raw", im.mode)     e.setimage(im.im)      mem = ... # we don't know yet      bufsize, offset, s = 65536, 0, 0     while not s:         l, s, d = e.encode(bufsize)         mem[offset:offset + len(d)] = d         offset += len(d)     if s < 0:         raise RuntimeError(f"encoder error {s} in tobytes")     return mem`

What will we have instead of mem? Ideally, it should be a NumPy array. It’s not a problem to create it; we’ve already seen what parameters it will have in __array_interface__:

`In : shape, typestr = Image._conv_type_shape(im) In : data = numpy.empty(shape, dtype=numpy.dtype(typestr))`

But it won’t work if instead of mem you try to take its flat version:

`In : mem = data.reshape((data.size,)) In : mem[0:4] = b'abcd' ValueError: invalid literal for int() with base 10: b'abcd'`

In this case, it seems strange that it’s impossible to put bytes into a byte array. But keep in mind that, firstly, not only bytes can be on the left, and secondly, the library is called NumPy, meaning it works with numbers. Fortunately, NumPy gives you access to the immediate memory of an array directly from Python. Here’s its data property:

`In : data.dataOut: <memory at 0x7f78854d68>In : data.data = 255 NotImplementedError: sub-views are not implemented In : data.data.shape Out: (4096, 4096, 3)In : data.data[0, 0, 0] = 255`

There is a memoryview object. But this memoryview is strange: it’s also multidimensional, like the NumPy array itself, and it has the same object type as the array itself. Fortunately, it’s easy to fix with the cast method:

`In : mem = data.data.cast('B', (data.data.nbytes,))  In : mem.nbytes == mem.shape Out: True  In : mem, mem Out: (255, 0)  In : mem[0:4] = b'1234'  In : mem, mem   Out: (49, 50)`

Now let’s put it all together:

`def to_numpy(im):     im.load()     # unpack data     e = Image._getencoder(im.mode, 'raw', im.mode)     e.setimage(im.im)          # NumPy buffer for the result     shape, typestr = Image._conv_type_shape(im)     data = numpy.empty(shape, dtype=numpy.dtype(typestr))     mem = data.data.cast('B', (data.data.nbytes,))          bufsize, s, offset = 65536, 0, 0     while not s:         l, s, d = e.encode(bufsize)         mem[offset:offset + len(d)] = d         offset += len(d)     if s < 0:         raise RuntimeError("encoder error %d in tobytes" % s)     return data`

And check:

`In : n = to_numpy(im) In : numpy.all(n == numpy.array(im))Out: True In : n.flagsOut:   C_CONTIGUOUS : True   F_CONTIGUOUS : False   OWNDATA : True   WRITEABLE : True   ALIGNED : True   WRITEBACKIFCOPY : False   UPDATEIFCOPY : FalseIn : %timeit -n 10 n = to_numpy(im) 101 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)`

Great! It’s 2.5 times faster with the same functionality and a smaller number of allocations.

# Benchmarks

Here’s the code:

`In : for i in range(6, 0, -1):     ...:     i = 128 * 2 ** i     ...:     print(f'\n\nSize: {i}x{i}   \t{i*i // 1024} KPx')     ...:     im = Image.new('RGB', (i, i))     ...:     print('\tnumpy.array()')     ...:     %timeit n = numpy.array(im)     ...:     print('\tnumpy.asarray()')     ...:     %timeit n = numpy.asarray(im)     ...:     print('\tto_numpy()')     ...:     %timeit n = to_numpy(im)     ...:     im = None     ...:`

Results:

As a result, we’ve managed to eliminate unnecessary memory allocation, speed up the process from 1.5 to 2.5 times, and figure out how NumPy works with memory along the way.

File system as a service for web and mobile apps