Python Forum
Convert numpy array to image without loading it into RAM.
#1
My program can generate huge numpy arrays. Personally, I think a sensible maximum size is an array of shape (819200, 460800, 4): a 4K (4096x2304) image scaled up 200 times. It can create bigger ones, but doing so would be a bit silly.
According to numpy, if you try and create an array that big:
datas = np.zeros((819200, 460800, 4), np.uint8)
Error:
Unable to allocate 1.37 TiB for an array with shape (819200, 460800, 4) and data type uint8
Well... 1.37 TiB... that's a lot. Of course, that's because it's trying to hold the whole array in RAM. After doing some research, I found h5py, which means I can store (and modify) my massive numpy array on disk rather than in RAM (it does take a performance hit, however).
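The figure in the error message follows directly from the shape and dtype. A quick sketch of the arithmetic (the variable names are only for illustration):

```python
import math

shape = (819200, 460800, 4)   # rows, columns, RGBA channels
bytes_per_value = 1           # uint8 stores one byte per channel value

total_bytes = math.prod(shape) * bytes_per_value
tib = total_bytes / 2**40     # binary units, as NumPy reports them
tb = total_bytes / 10**12     # decimal terabytes, for comparison

print(f"{total_bytes:,} bytes = {tib:.2f} TiB = {tb:.2f} TB")
```

Note that NumPy reports binary units (TiB), so the same array is roughly 1.51 TB in decimal terms.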
A numpy array that size, full of zeros only takes 4KB of disk space with H5PY (using GZIP compression)
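import h5py
with h5py.File("mytestfile.hdf5", "w") as f:
	# shape= reserves the full array; data=(819200, 460800, 4) would only store those three numbers
	dset = f.create_dataset("mydataset", shape=(819200, 460800, 4), dtype='uint8', compression='gzip') # file stays tiny until data is written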

Let's say that array is now full of RGBA values.
To save it as an image I need to do: cv2.imwrite("someimage.png", np.array(dset)). The issue? I have to load the dataset into a numpy array to save it as an image, which means loading the whole array into RAM, so I get a memory error.
Unfortunately, I'm not able to do this: cv2.imwrite("someimage.png", dset) because cv2 isn't able to read from an h5py dataset.

Has anyone got an idea of how I can save my numpy array to an image without loading it into RAM?
Reply
#2
Use someone else's RAM - I did a quick Google search for converting hdf5 file to an image and found a number of sites that handle that, including https://mygeodata.cloud/converter/hdf5-to-tiff

So, rather than doing the conversion yourself, take your resulting file and convert it using other software.
Reply
#3
Thanks for the reply. It turns out it was an easy fix. Rather than doing np.array(dset) you have to do f["mydataset"][()]. cv2 seems to accept that happily.
Reply
#4
NumPy lets you create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
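As a rough illustration of how a memory-mapped array can be filled a block of rows at a time, here is a minimal sketch. It uses a deliberately tiny shape as a stand-in for (819200, 460800, 4), writes to a temporary directory, and uses uint8 rather than the 'i' (32-bit integer) dtype above, on the assumption that RGBA channels fit in one byte each. The block size is an arbitrary choice.

```python
import os
import tempfile

import numpy as np

shape = (64, 48, 4)  # small stand-in for the real image shape
path = os.path.join(tempfile.mkdtemp(), "output.dat")

# mode="w+" creates (or overwrites) the backing file on disk
data = np.memmap(path, dtype=np.uint8, mode="w+", shape=shape)

rows_per_block = 16  # arbitrary; pick whatever fits comfortably in RAM
for start in range(0, shape[0], rows_per_block):
    stop = min(start + rows_per_block, shape[0])
    data[start:stop] = 255  # only this block of rows is touched

data.flush()  # push pending changes to the file
del data      # drop the mapping

# Re-open read-only to confirm the values landed on disk
readback = np.memmap(path, dtype=np.uint8, mode="r", shape=shape)
file_size = os.path.getsize(path)
print(file_size, int(readback.min()), int(readback.max()))
```

The backing file is raw, uncompressed bytes, so its size is simply the product of the shape times the item size.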
Reply
#5
(Jun-04-2020, 03:57 AM)scidam Wrote: NumPy lets you create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
Thanks for the reply. It doesn't look like you can specify a compression algorithm for this, so it takes up a lot more space than h5py.

Sadly, the issue is no longer creating a numpy array that big while keeping it editable, but saving it as an image. cv2 seems to load the whole array back into RAM, completely voiding all the work I just did to store it on disk.

I am still looking for a C-based Python library, like cv2, that, rather than creating a PNG file in one go, allows me to append to a PNG. This would allow me to break the array down into chunks, write the chunks to the PNG file, and then free up memory for the next round. I have created a simple script like that (using a numpy-based PNG writer called 'numpngw'), but it's very slow. (I don't think it's ever going to be fast on disk though):

(it's very messy code and there are no comments since it was just a test. It uses a modified version of numpngw, with '__all__' removed from the file so I can access all of its functions and create my own 'append' function)
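For readers wondering what "appending" to a PNG could look like: the format allows the compressed pixel data to be split across several IDAT chunks, so rows can be compressed and written a block at a time. The following is only a sketch using the standard library (zlib and struct), not the numpngw-based script mentioned above. The function names write_png_streaming and _chunk are made up for this example, and it assumes 8-bit RGBA rows supplied as bytes objects.

```python
import os
import struct
import tempfile
import zlib


def _chunk(tag, payload):
    """Build one PNG chunk: length, tag, payload, CRC-32 of tag + payload."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def write_png_streaming(path, width, height, row_blocks):
    """Write an 8-bit RGBA PNG from an iterable of row blocks.

    Each block is a bytes object holding a whole number of rows
    (width * 4 bytes per row), so only one block is in memory at a time.
    """
    stride = width * 4
    compressor = zlib.compressobj(6)  # one zlib stream shared by all IDAT chunks
    rows_written = 0
    with open(path, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n")  # PNG signature
        # IHDR: width, height, bit depth 8, colour type 6 (RGBA), no interlace
        fh.write(_chunk(b"IHDR", struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)))
        for block in row_blocks:
            if len(block) % stride:
                raise ValueError("block must contain whole rows")
            raw = bytearray()
            for start in range(0, len(block), stride):
                raw.append(0)  # filter type 0 (none) precedes every row
                raw += block[start:start + stride]
            rows_written += len(block) // stride
            data = compressor.compress(bytes(raw))
            if data:  # the compressor may buffer small inputs
                fh.write(_chunk(b"IDAT", data))
        if rows_written != height:
            raise ValueError("number of rows written does not match height")
        fh.write(_chunk(b"IDAT", compressor.flush()))
        fh.write(_chunk(b"IEND", b""))


# Tiny demo: a 2x4 opaque red image written as two blocks of two rows each
demo_path = os.path.join(tempfile.mkdtemp(), "demo.png")
red_row = bytes([255, 0, 0, 255]) * 2
write_png_streaming(demo_path, 2, 4, [red_row * 2, red_row * 2])
```

The per-row loop in pure Python will be slow for very large images, and whether a viewer can open a PNG of that size is a separate question.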
Reply
#6
Don't even know if anyone is going to ever need this answer but it's here just in case.
I tried out PIL again. I decided to use TIFF files (they are fast, and can be animated so I don't need another library for gifs). Unfortunately I had the same problem I've had with all the libraries: they don't support appending data to an image (i.e. you can't open the image file in mode 'a'). Although Image.save has the parameter 'append_images' for TIFF files, that's used if you want to add more than one image to the TIFF file.
I stuck with TIFFs and looked for a good python TIFF library. tifffile seemed to do everything I needed. It even has a memmap implementation which I used.
My code is now:
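import gc
import h5py
import numpy as np
import tifffile

f = h5py.File("test.hdf5", "w")
dset = f.create_dataset("test", (10000,10000,3), dtype=np.uint8, compression='gzip')

# tifffile.memmap creates temp.tif on disk and returns a numpy memmap onto its pixel data
memmap_image = tifffile.memmap('temp.tif', shape=dset.shape, dtype='uint8')
memmap_image[:] = dset[:]  # note: dset[:] reads the whole dataset into RAM first
memmap_image.flush()
del memmap_image
gc.collect()
f.close()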
I'm sure there's an error that's going to crop up somewhere in my code, but at least it's working for now.
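One thing to watch in the code above: memmap_image[:] = dset[:] reads the entire dataset into memory before copying, which is fine for a 10000x10000x3 array but not for the sizes discussed at the start of the thread. A possible variation is to copy a block of rows at a time. The helper below is a sketch; copy_in_row_blocks and its rows_per_block default are made up for this example. It only relies on slicing, so it should work with an h5py dataset as the source and a tifffile.memmap array as the destination; plain numpy arrays stand in for both here so the example runs on its own.

```python
import numpy as np


def copy_in_row_blocks(src, dst, rows_per_block=256):
    """Copy src into dst one block of rows at a time.

    Only rows_per_block rows are held in memory at once. src and dst
    need matching shapes and support for slicing along the first axis.
    """
    if src.shape != dst.shape:
        raise ValueError("source and destination shapes differ")
    n_rows = src.shape[0]
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        dst[start:stop] = src[start:stop]
    return dst


# Stand-ins for the h5py dataset and the TIFF memmap
src = np.arange(10 * 6 * 3, dtype=np.uint8).reshape(10, 6, 3)
dst = np.zeros_like(src)
copy_in_row_blocks(src, dst, rows_per_block=4)
print(np.array_equal(src, dst))
```

With a compressed h5py dataset, choosing rows_per_block as a multiple of the dataset's chunk height (dset.chunks[0]) should avoid decompressing the same chunk more than once.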
Reply
#7
This is EXACTLY the problem I'm facing right now and this thread was an absolute lifesaver. Thank you very much.

(Jun-06-2020, 01:34 PM)DreamingInsanity Wrote: Don't even know if anyone is going to ever need this answer but it's here just in case.
Reply
#8
Some time ago, I used h5py to record pictures into an h5 file (very useful and powerful). In the following example:
  1. a basic image is created using matplotlib
  2. the image is saved into the h5 file as a picture (you can view it in HDFView, for instance)
  3. the image is saved into the h5 file as an array
  4. the previous picture is recovered and saved locally as a gif image

(you can create an animated gif with ImageMagick, for instance, but that's another topic)

Two screenshots of the picture in HDFView are attached.

Maybe it adds some features to your project.

Paul


import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvas
import os, h5py
 
Path = str(os.getcwd())

### 0) to create a fig
plt.style.use('_mpl-gallery')

# make data
x = np.linspace(0, 10, 100)
y = 4 + 2 * np.sin(2 * x)

# plot
fig, ax = plt.subplots(figsize=(16,16))
ax.plot(x, y, linewidth=2.0)

ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
       ylim=(0, 8), yticks=np.arange(1, 8))

plt.show()
 
 
## 1) Conversion of the Matplotlib fig into a NumPy array ---> mandatory to change the picture into a matrix
canvas = FigureCanvas(fig)
canvas.draw()
FigArray = np.array(canvas.renderer.buffer_rgba())
 
## 2) write directly a picture into the h5 file --> Note dataset is MANDATORY
h5 = h5py.File(Path + '/test_picture.h5', 'w')
ImageDataset = h5.create_dataset(name = "Example_picture", data = FigArray, dtype = 'uint8', compression = 'gzip')
# np.bytes_ replaces np.string_, which was removed in NumPy 2.0
ImageDataset.attrs["CLASS"] = np.bytes_("IMAGE")
ImageDataset.attrs["IMAGE_VERSION"] = np.bytes_("1.2")
ImageDataset.attrs["IMAGE_SUBCLASS"] = np.bytes_("IMAGE_TRUECOLOR")
ImageDataset.attrs["INTERLACE_MODE"] = np.bytes_("INTERLACE_PIXEL")  # array is laid out as [height][width][channel]
ImageDataset.attrs["IMAGE_MINMAXRANGE"] = np.array([0, 255], dtype=np.uint8)  # (min, max) pair
h5.close()


## 3) Write the fig into a basic array (the h5 file)
h5 = h5py.File(Path + '/test_array.h5', 'w')
NpArrays = h5.create_group('Fig')
DatasetFigs = NpArrays.create_dataset(name = 'Example of array', data = FigArray, dtype='f', compression = 'gzip')
h5.flush()
h5.close()


## 4) array recovering => save into a gif file
from PIL import Image as im
with h5py.File(Path + '/test_picture.h5','r') as pict:
    data = pict.get('/Example_picture')
    Array = np.array(data)
    
    # the array is converted into a picture
    Picture = im.fromarray(Array)
    Picture.save(Path + '/export.gif')
