Python Forum
How to further boost the data read write speed using pandas - Printable Version




How to further boost the data read write speed using pandas - tjk9501 - Nov-14-2022

Hello!

I am using the CuPy module to speed up linear algebra calculations on an NVIDIA GPU. After the calculation I need to retrieve the result (a very large 2D matrix) from the GPU and save it to the local hard drive. Because of the matrix's size I need something like cupy.savez_compressed instead of save/savez to keep a high compression ratio and save disk space. I find that using
`
import cupy as cp
import pandas as pd

# Copy the result from GPU memory to host memory, then pickle it with light gzip compression
Rt_cpu = pd.DataFrame(data=cp.asnumpy(Rt_gpu))
Rt_cpu.to_pickle(filename, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1})
`
can achieve a compromise between a high compression ratio and read-write speed (about 60 seconds for a full read-write of Rt_cpu), but this is still too slow for our needs. I have also tried other packages such as pandarallel and modin.pandas with the Ray or Dask backend, but their read-write speed was much slower than plain pandas.to_pickle. In summary, I am looking for a solution that is roughly 10 times faster than to_pickle for extremely large 2D arrays (with dimensions around 512*512/32 by 240*200*300/40) while keeping a compression ratio similar to cupy.savez_compressed. Can anyone suggest a solution? Thanks!
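For reference, a minimal sketch of the cupy.savez_compressed baseline mentioned above might look like this (the file name "Rt.npz" and the array key "Rt" are placeholders, not from the original post):
`
import cupy as cp

# Write a zlib-compressed .npz archive directly from the GPU array
cp.savez_compressed("Rt.npz", Rt=Rt_gpu)

# Read it back later; cp.load returns an NpzFile-like object indexed by array key
Rt_back = cp.load("Rt.npz")["Rt"]
`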


RE: How to further boost the data read write speed using pandas - jefsummers - Nov-14-2022

I have not used it myself, but I would check out Dask. Also, make sure you are using Python 3.11, as it brings multiple speed improvements over prior versions.
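For example, a minimal sketch of writing a large 2D array with dask.array to a chunked, compressed Zarr store (Blosc compression is Zarr's default) might look like the following; it assumes the zarr package is installed, and the chunk sizes, array shape, and paths are illustrative rather than tuned:
`
import dask.array as da
import numpy as np

# Illustrative only: a large 2D array on the host, chunked so Dask can write blocks in parallel
Rt_cpu = np.random.rand(8192, 8192)
darr = da.from_array(Rt_cpu, chunks=(1024, 1024))

# Zarr stores each chunk separately and compresses it by default
darr.to_zarr("Rt.zarr", overwrite=True)

# Read it back lazily as a Dask array
Rt_back = da.from_zarr("Rt.zarr")
`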