Posts: 161
Threads: 36
Joined: Jun 2018
My program can generate huge numpy arrays. Personally, I think a sensible maximum size is an array of shape (819200, 460800, 4) - a 4K (4096x2304) image scaled up 200 times. It can create bigger arrays, but doing so is a bit pointless.
According to numpy, if you try to create an array that big:
datas = np.zeros((819200, 460800, 4), np.uint8)
Error: Unable to allocate 1.37 TiB for an array with shape (819200, 460800, 4) and data type uint8
Well... 1.37 TiB... that's a lot. Of course, that's because it's trying to allocate the whole thing in RAM. After doing some research, I found h5py, which means I can store (and modify) my massive numpy array on disk rather than in RAM (it does take a performance hit, however).
A numpy array that size, full of zeros, only takes about 4 KB of disk space with h5py (using gzip compression):
import h5py

with h5py.File("mytestfile.hdf5", "w") as f:
    dset = f.create_dataset("mydataset", (819200, 460800, 4), dtype='i', compression='gzip')  # creates a ~4KB file
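Since the dataset lives on disk, it can then be filled block by block with numpy-style slice assignment, so only one tile is ever held in RAM. A minimal sketch of that idea (the file and dataset names match the snippet above; the block size is just a placeholder):
import h5py
import numpy as np

with h5py.File("mytestfile.hdf5", "a") as f:
    dset = f["mydataset"]
    block = 4096  # placeholder tile size, not tuned
    tile = np.full((block, block, 4), 255, dtype=dset.dtype)
    dset[0:block, 0:block, :] = tile  # writes just this tile to disk; the rest of the array is untouched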
Let's say that array is now full of RGBA values.
To save it as an image I need to do: cv2.imwrite("someimage.png", np.array(dset)). The issue? I have to load the dataset into a numpy array to save it as an image, which means loading the whole array into RAM, which gives me a memory error.
Unfortunately, I'm not able to do cv2.imwrite("someimage.png", dset), because cv2 isn't able to read from an h5py dataset directly.
Has anyone got an idea of how I can save my numpy array to an image without loading it into RAM?
Posts: 1,358
Threads: 2
Joined: May 2019
Use someone else's RAM - I did a quick Google search for converting an HDF5 file to an image and found a number of sites that handle that, including https://mygeodata.cloud/converter/hdf5-to-tiff
So, rather than doing the conversion yourself, take your resulting file and convert it using alternative software.
Posts: 161
Threads: 36
Joined: Jun 2018
Thanks for the reply. It turns out it was an easy fix. Rather than doing np.array(dset), you have to do f["mydataset"][()]. cv2 seems to accept that happily.
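For anyone who finds this later, a minimal sketch of that fix, assuming the file and dataset names from the first post (note that indexing with [()] still materialises the dataset as an in-memory NumPy array, so the data has to fit in RAM at this point):
import cv2
import h5py

with h5py.File("mytestfile.hdf5", "r") as f:
    arr = f["mydataset"][()]  # reads the whole dataset into a NumPy array
    cv2.imwrite("someimage.png", arr)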
Posts: 817
Threads: 1
Joined: Mar 2018
NumPy allows you to create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
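One note on the dtype: 'i' is a 4-byte integer, so for 8-bit RGBA data the file would be roughly four times larger than needed. A small variation of the same sketch, assuming uint8 pixel data:
import numpy as np

# uint8 keeps the file at ~1.4 TiB instead of ~5.5 TiB with 'i' (int32)
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype=np.uint8)
data[0, 0] = [255, 0, 0, 255]  # individual pixels can be written without touching the rest
data.flush()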
Posts: 161
Threads: 36
Joined: Jun 2018
(Jun-04-2020, 03:57 AM)scidam Wrote: NumPy allows you to create arrays that are stored on disk. Take a look at the memmap function.
data = np.memmap('output.dat', shape=(819200, 460800, 4), mode='w+', dtype='i')
Thanks for the reply. It doesn't look like you can specify a compression algorithm for this, so it takes up quite a bit more space compared to h5py.
Sadly, the issue is no longer with creating a numpy array that big while keeping it editable, but rather with saving the image. cv2 seems to load the image back into RAM, completely voiding all the work I just did to store it on disk.
I am still looking for a C-based Python library, like cv2, that, rather than creating a png file in one go, allows me to append to a png. This would let me break the array down into chunks, write the chunks to the png file, and then free up memory for the next round. I have created a simple script like that (using a numpy-based png writer called 'numpngw'), but it's very slow. (I don't think it's ever going to be fast on disk, though):
import gc
import h5py
import numpy as np
import numpngw  # modified copy with '__all__' removed so the private helpers are accessible

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_divisor(x):
    fctrs = sorted(factors(x[0]))[::-1]
    i = 0
    while True:
        try:
            # finding the largest array that can be stored in memory means I can find the highest
            # factor that lets me split up the original array while keeping the chunks as large as possible
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("mytestfile.hdf5", "w")
    dset = f.create_dataset("mydataset", (100000, 100000, 4), dtype=np.uint8, compression='gzip')
    #arr = np.random.uniform(low=0, high=255, size=(10000,10000,4))
    shp = dset.shape
    step = get_divisor(shp)
    png = open("new.png", "wb")
    # I'm manually writing to the png file rather than writing all the data at once, so I can append data over and over again.
    numpngw._write_header_and_meta(png, 8, shp, color_type=6, bitdepth=8, palette=None,
                                   interlace=0, text_list=None, timestamp=None, sbit=None, gamma=None, iccp=None,
                                   chromaticity=None, trans=None, background=None, phys=None)
    #_max = shp[0] if not step == shp[0] else shp[0]*2  # range is 'up to but not including', so if for some reason 'step' equals the size of the array, I 'double the size' to allow step to reach the real size of the array.
    for i in range(step, shp[0]+step, step):
        # writing the data in the largest chunks I can - only this chunk's rows, not everything written so far
        numpngw._write_data(png, dset[i-step:i], bitdepth=8, max_chunk_len=step,
                            filter_type=None, interlace=0)
        png.flush()  # flush the buffer
        gc.collect()  # try and free up memory for the next round
    numpngw._write_iend(png)
    png.close()
    f.close()

(It's very messy code and there are hardly any comments since it was just a test - it uses a modified version of numpngw, with '__all__' removed from the file to give me access to all of its functions, so I can create my own 'append' function.)
Posts: 161
Threads: 36
Joined: Jun 2018
Don't even know if anyone is going to ever need this answer but it's here just in case.
I tried out PIL again. I decided to use TIFF files (they are fast, and they can be animated, so I don't need another library for gifs). Unfortunately, I had the same problem I've had with all the libraries: they don't support appending data to an image (i.e. you can't open the image file in mode 'a'). Although Image.save has an 'append' parameter for TIFF files, that's used if you want to add more than one image to the TIFF file.
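To illustrate what that parameter family is for (as far as I can tell): it writes extra images as additional pages of a multi-page TIFF, rather than appending rows to an image that's already on disk. A minimal sketch with small dummy frames:
import numpy as np
from PIL import Image

frames = [Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8)) for _ in range(3)]
# save_all/append_images store the extra frames as further pages in one TIFF file
frames[0].save("multipage.tif", save_all=True, append_images=frames[1:])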
I stuck with TIFFs and looked for a good python TIFF library. tifffile seemed to do everything I needed. It even has a memmap implementation which I used.
My code is now:
f = h5py.File("test.hdf5", "w")
dset = f.create_dataset("test", (10000,10000,3), dtype=np.uint8, compression='gzip')
memmap_image = tifffile.memmap('temp.tif', shape=dset.shape, dtype='uint8')
memmap_image[:] = dset[:]
memmap_image.flush()
del memmap_image
gc.collect() I'm sure there's error that's going to crop up somehow somewhere in my code but at least it's working for now.
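One possible refinement (an untested sketch, using the same file and dataset names as above): memmap_image[:] = dset[:] still decompresses the whole HDF5 dataset into RAM in one go, so copying it in row blocks should keep memory use bounded:
import gc
import h5py
import tifffile

f = h5py.File("test.hdf5", "r")  # assumes the dataset was already created and filled
dset = f["test"]
memmap_image = tifffile.memmap('temp.tif', shape=dset.shape, dtype='uint8')

rows = 1000  # placeholder block height; tune to available RAM
for start in range(0, dset.shape[0], rows):
    stop = min(start + rows, dset.shape[0])
    memmap_image[start:stop] = dset[start:stop]  # only this block is ever decompressed into RAM
    memmap_image.flush()

del memmap_image
gc.collect()
f.close()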
Posts: 1
Threads: 0
Joined: Feb 2024
This is EXACTLY the problem I'm facing right now and this thread was an absolute lifesaver. Thank you very much.
Posts: 300
Threads: 72
Joined: Apr 2019
Some time ago, I used h5py to record pictures into an h5 file (very useful and powerful); in the following example:
- a basic image is created using matplotlib
- the image is saved into the h5 file as a picture (you can see it under hdfview for instance)
- the image is saved into the h5 file as an array
- the previous picture is recovered and saved locally into a gif image
(you can create a gif animated with ImageMagick for instance, but it's another topic)
Two screenshots of the picture in hdfview are attached.
Maybe it adds some features to your project.
Paul
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvas
import os, h5py
Path = str(os.getcwd())
### 0) to create a fig
plt.style.use('_mpl-gallery')
# make data
x = np.linspace(0, 10, 100)
y = 4 + 2 * np.sin(2 * x)
# plot
fig, ax = plt.subplots(figsize=(16,16))
ax.plot(x, y, linewidth=2.0)
ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
ylim=(0, 8), yticks=np.arange(1, 8))
plt.show()
## 1) Conversion of the Matplotlib fig into a NumPy array ---> mandatory to change the picture into a matrix
canvas = FigureCanvas(fig)
canvas.draw()
FigArray = np.array(canvas.renderer.buffer_rgba())
## 2) write directly a picture into the h5 file --> Note dataset is MANDATORY
h5 = h5py.File(Path + '/test_picture.h5', 'w')
ImageDataset = h5.create_dataset(name = "Example_picture", data = FigArray, dtype = 'uint8', compression = 'gzip')
ImageDataset.attrs["CLASS"] = np.string_("IMAGE")
ImageDataset.attrs["IMAGE_VERSION"] = np.string_("1.2")
ImageDataset.attrs["IMAGE_SUBCLASS"] = np.string_("IMAGE_TRUECOLOR")
ImageDataset.attrs["INTERLACE_MODE"] = np.string_("INTERLACE_MODE")
ImageDataset.attrs["IMAGE_MINMAXRANGE"] = np.uint8(0.255)
h5.close()
## 3) Write the fig into a basic array (the h5 file)
h5 = h5py.File(Path + '/test_array.h5', 'w')
NpArrays = h5.create_group('Fig')
DatasetFigs = NpArrays.create_dataset(name = 'Example of array', data = FigArray, dtype='f', compression = 'gzip')
h5.flush()
h5.close()
## 4) array recovering => save into a gif file
from PIL import Image as im
with h5py.File(Path + '/test_picture.h5', 'r') as pict:
    data = pict.get('/Example_picture')
    Array = np.array(data)
    # the array is converted into a picture
    Picture = im.fromarray(Array)
    Picture.save(Path + '/export.gif')