Posts: 161
Threads: 36
Joined: Jun 2018
My program can generate huge numpy arrays. Personally, I think a sensible maximum size is an array of shape (819200, 460800, 4) - a 4K (4096x2304) image scaled up 200 times. It can create bigger arrays, but doing so is a bit pointless.
According to numpy, if you try to create an array that big:
datas = np.zeros((819200, 460800, 4), np.uint8)
Error: Unable to allocate 1.37 TiB for an array with shape (819200, 460800, 4) and data type uint8
Well... 1.37 TiB... that's a lot. Of course, that's because it's trying to allocate the whole thing in RAM. After doing some research, I found h5py, which means I can store (and modify) my massive numpy array on disk rather than in RAM (it does take a performance hit, however).
A numpy array that size, full of zeros, only takes about 4 KB of disk space with h5py (using gzip compression):
import h5py

with h5py.File("mytestfile.hdf5", "w") as f:
    dset = f.create_dataset("mydataset", (819200, 460800, 4), dtype='i', compression='gzip')  # creates a ~4KB file
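Since the dataset lives on disk, it can then be filled block by block with numpy-style slice assignment, so only one tile is ever held in RAM. A minimal sketch of that idea (the file and dataset names match the snippet above; the block size is just a placeholder):
import h5py
import numpy as np

with h5py.File("mytestfile.hdf5", "a") as f:
    dset = f["mydataset"]
    block = 4096  # placeholder tile size, not tuned
    tile = np.full((block, block, 4), 255, dtype=dset.dtype)
    dset[0:block, 0:block, :] = tile  # writes just this tile to disk; the rest of the array is untouched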
Let's say that array is now full of RGBA values.
To save it as an image I need to do: cv2.imwrite("someimage.png", np.array(dset)). The issue? I have to load the dataset into a numpy array to save it as an image, which means loading the whole array into RAM, which gives me a memory error.
Unfortunately, I'm not able to do cv2.imwrite("someimage.png", dset), because cv2 isn't able to read from an h5py dataset directly.
Has anyone got an idea of how I can save my numpy array to an image without loading it into RAM?
Posts: 1,358
Threads: 2
Joined: May 2019
Use someone else's RAM - I did a quick Google search for converting an HDF5 file to an image and found a number of sites that handle that, including https://mygeodata.cloud/converter/hdf5-to-tiff
So, rather than doing the conversion yourself, take your resulting file and convert it using alternative software.
Posts: 161
Threads: 36
Joined: Jun 2018
Thanks for the reply. It turns out it was an easy fix. Rather than doing np.array(dset), you have to do f["mydataset"][()]. cv2 seems to accept that happily.
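For anyone who finds this later, a minimal sketch of that fix, assuming the file and dataset names from the first post (note that indexing with [()] still materialises the dataset as an in-memory NumPy array, so the data has to fit in RAM at this point):
import cv2
import h5py

with h5py.File("mytestfile.hdf5", "r") as f:
    arr = f["mydataset"][()]  # reads the whole dataset into a NumPy array
    cv2.imwrite("someimage.png", arr)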
Posts: 817
Threads: 1
Joined: Mar 2018
NumPy allows you to create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
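One note on the dtype: 'i' is a 4-byte integer, so for 8-bit RGBA data the file would be roughly four times larger than needed. A small variation of the same sketch, assuming uint8 pixel data:
import numpy as np

# uint8 keeps the file at ~1.4 TiB instead of ~5.5 TiB with 'i' (int32)
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype=np.uint8)
data[0, 0] = [255, 0, 0, 255]  # individual pixels can be written without touching the rest
data.flush()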
Posts: 161
Threads: 36
Joined: Jun 2018
(Jun-04-2020, 03:57 AM)scidam Wrote: NumPy allows you to create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
Thanks for the reply. It doesn't look like you can specify a compression algorithm for this, so it takes up quite a bit more space compared to h5py.
Sadly, the issue is no longer with creating a numpy array that big while keeping it editable, but rather with saving the image. cv2 seems to load the image back into RAM, completely voiding all the work I just did to store it on disk.
I am still looking for a C-based Python library, like cv2, that, rather than creating a png file in one go, allows me to append to a png. This would let me break the array down into chunks, write the chunks to the png file, and then free up memory for the next round. I have created a simple script like that (using a numpy-based png writer called 'numpngw'), but it's very slow. (I don't think it's ever going to be fast on disk, though):
import gc
import h5py
import numpy as np
import numpngw  # modified copy with '__all__' removed so the private helpers are accessible

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_divisor(x):
    fctrs = sorted(factors(x[0]))[::-1]
    i = 0
    while True:
        try:
            # finding the largest array that can be stored in memory means I can find the highest
            # factor that lets me split up the original array while keeping the chunks as large as possible
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("mytestfile.hdf5", "w")
    dset = f.create_dataset("mydataset", (100000, 100000, 4), dtype=np.uint8, compression='gzip')
    #arr = np.random.uniform(low=0, high=255, size=(10000,10000,4))
    shp = dset.shape
    step = get_divisor(shp)
    png = open("new.png", "wb")
    # I'm manually writing to the png file rather than writing all the data at once, so I can append data over and over again.
    numpngw._write_header_and_meta(png, 8, shp, color_type=6, bitdepth=8, palette=None,
                                   interlace=0, text_list=None, timestamp=None, sbit=None, gamma=None, iccp=None,
                                   chromaticity=None, trans=None, background=None, phys=None)
    #_max = shp[0] if not step == shp[0] else shp[0]*2  # range is 'up to but not including', so if for some reason 'step' equals the size of the array, I 'double the size' to allow step to reach the real size of the array.
    for i in range(step, shp[0]+step, step):
        # writing the data in the largest chunks I can - only this chunk's rows, not everything written so far
        numpngw._write_data(png, dset[i-step:i], bitdepth=8, max_chunk_len=step,
                            filter_type=None, interlace=0)
        png.flush()  # flush the buffer
        gc.collect()  # try and free up memory for the next round
    numpngw._write_iend(png)
    png.close()
    f.close()

(It's very messy code and there are hardly any comments since it was just a test - it uses a modified version of numpngw, with '__all__' removed from the file to give me access to all of its functions, so I can create my own 'append' function.)
Posts: 161
Threads: 36
Joined: Jun 2018
Don't even know if anyone is going to ever need this answer but it's here just in case.
I tried out PIL again. I decided to use TIFF files (they are fast, and they can be animated, so I don't need another library for gifs). Unfortunately, I had the same problem I've had with all the libraries: they don't support appending data to an image (i.e. you can't open the image file in mode 'a'). Although Image.save has an 'append' parameter for TIFF files, that's used if you want to add more than one image to the TIFF file.
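To illustrate what that parameter family is for (as far as I can tell): it writes extra images as additional pages of a multi-page TIFF, rather than appending rows to an image that's already on disk. A minimal sketch with small dummy frames:
import numpy as np
from PIL import Image

frames = [Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8)) for _ in range(3)]
# save_all/append_images store the extra frames as further pages in one TIFF file
frames[0].save("multipage.tif", save_all=True, append_images=frames[1:])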
I stuck with TIFFs and looked for a good python TIFF library. tifffile seemed to do everything I needed. It even has a memmap implementation which I used.
My code is now:
f = h5py.File("test.hdf5", "w")
dset = f.create_dataset("test", (10000,10000,3), dtype=np.uint8, compression='gzip')
memmap_image = tifffile.memmap('temp.tif', shape=dset.shape, dtype='uint8')
memmap_image[:] = dset[:]
memmap_image.flush()
del memmap_image
gc.collect() I'm sure there's error that's going to crop up somehow somewhere in my code but at least it's working for now.
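One possible refinement (an untested sketch, using the same file and dataset names as above): memmap_image[:] = dset[:] still decompresses the whole HDF5 dataset into RAM in one go, so copying it in row blocks should keep memory use bounded:
import gc
import h5py
import tifffile

f = h5py.File("test.hdf5", "r")  # assumes the dataset was already created and filled
dset = f["test"]
memmap_image = tifffile.memmap('temp.tif', shape=dset.shape, dtype='uint8')

rows = 1000  # placeholder block height; tune to available RAM
for start in range(0, dset.shape[0], rows):
    stop = min(start + rows, dset.shape[0])
    memmap_image[start:stop] = dset[start:stop]  # only this block is ever decompressed into RAM
    memmap_image.flush()

del memmap_image
gc.collect()
f.close()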
Posts: 1
Threads: 0
Joined: Feb 2024
This is EXACTLY the problem I'm facing right now and this thread was an absolute lifesaver. Thank you very much.
Posts: 300
Threads: 72
Joined: Apr 2019
Some time ago, I used h5py to record pictures into an h5 file (very useful and powerful); in the following example:
- a basic image is created using matplotlib
- the image is saved into the h5 file as a picture (you can see it under hdfview for instance)
- the image is saved into the h5 file as an array
- the previous picture is recovered and saved locally into a gif image
(you can create a gif animated with ImageMagick for instance, but it's another topic)
Two screenshots of the picture in hdfview are attached.
Maybe it adds some features to your project.
Paul
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvas
import os, h5py
Path = str(os.getcwd())
### 0) to create a fig
plt.style.use('_mpl-gallery')
# make data
x = np.linspace(0, 10, 100)
y = 4 + 2 * np.sin(2 * x)
# plot
fig, ax = plt.subplots(figsize=(16,16))
ax.plot(x, y, linewidth=2.0)
ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
ylim=(0, 8), yticks=np.arange(1, 8))
plt.show()
## 1) Conversion of the Matplotlib fig into a NumPy array ---> mandatory to change the picture into a matrix
canvas = FigureCanvas(fig)
canvas.draw()
FigArray = np.array(canvas.renderer.buffer_rgba())
## 2) write directly a picture into the h5 file --> Note dataset is MANDATORY
h5 = h5py.File(Path + '/test_picture.h5', 'w')
ImageDataset = h5.create_dataset(name = "Example_picture", data = FigArray, dtype = 'uint8', compression = 'gzip')
ImageDataset.attrs["CLASS"] = np.string_("IMAGE")
ImageDataset.attrs["IMAGE_VERSION"] = np.string_("1.2")
ImageDataset.attrs["IMAGE_SUBCLASS"] = np.string_("IMAGE_TRUECOLOR")
ImageDataset.attrs["INTERLACE_MODE"] = np.string_("INTERLACE_MODE")
ImageDataset.attrs["IMAGE_MINMAXRANGE"] = np.uint8(0.255)
h5.close()
## 3) Write the fig into a basic array (the h5 file)
h5 = h5py.File(Path + '/test_array.h5', 'w')
NpArrays = h5.create_group('Fig')
DatasetFigs = NpArrays.create_dataset(name = 'Example of array', data = FigArray, dtype='f', compression = 'gzip')
h5.flush()
h5.close()
## 4) array recovering => save into a gif file
from PIL import Image as im
with h5py.File(Path + '/test_picture.h5', 'r') as pict:
    data = pict.get('/Example_picture')
    Array = np.array(data)
    # the array is converted into a picture
    Picture = im.fromarray(Array)
    Picture.save(Path + '/export.gif')