How to further boost the data read write speed using pandas
How to further boost the data read write speed using pandas - tjk9501 - Nov-14-2022

Hello! I am using the CuPy module to speed up linear algebra calculations on an NVIDIA GPU. After the calculation I need to retrieve the result (a very large 2D matrix) from the GPU and save it to the local hard drive. Because of the matrix's size I need something like cupy.savez_compressed, rather than save/savez, to keep a high compression ratio and save disk space. I found that using

```python
import cupy as cp
import pandas

Rt_cpu = pandas.DataFrame(data=cp.asnumpy(Rt_gpu))
Rt_cpu.to_pickle(filename, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1})
```

achieves a compromise between a high compression ratio and read-write speed (about 60 seconds for a read-write of Rt_cpu), but this is still too slow for our needs. I have also tried other packages such as pandarallel and modin.pandas with the Ray or Dask backend, but the resulting read-write speed was much slower than pandas.to_pickle.

In summary, I am looking for a solution roughly 10 times faster than to_pickle for extremely large 2D arrays (with dimensions being 512*512/32 by 240*200*300/40), combined with a high compression ratio similar to cupy.savez_compressed. Can anyone provide a solution? Thanks!

RE: How to further boost the data read write speed using pandas - jefsummers - Nov-14-2022

I have not used it myself, but I would check out Dask. Also, make sure you are using Python 3.11, as it has multiple speed improvements over prior versions.
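A minimal sketch of the Dask route mentioned above, assuming the `dask` and `zarr` packages are installed; the array `Rt_np`, its shape, and the chunk size are hypothetical stand-ins, not values from the original post. The idea is that Dask writes the array chunk-by-chunk in parallel, and Zarr compresses each chunk (Blosc by default).

```python
# Hypothetical sketch: store a large in-memory array as a compressed Zarr
# dataset via Dask (requires the `dask` and `zarr` packages).
import numpy as np
import dask.array as da

# Stand-in for cp.asnumpy(Rt_gpu); shape and dtype are illustrative only.
Rt_np = np.random.rand(8192, 8192).astype(np.float32)

# Chunk the array so compression and I/O can run in parallel across blocks.
Rt_dask = da.from_array(Rt_np, chunks=(1024, 1024))

# Write a compressed Zarr store (Zarr applies the Blosc codec by default).
da.to_zarr(Rt_dask, "Rt.zarr", overwrite=True)

# Reading back is lazy; .compute() materialises the NumPy array.
Rt_back = da.from_zarr("Rt.zarr").compute()
```

Whether this actually reaches the 10x target would need benchmarking on the real data; the trade-off is that Blosc's default codec favours speed over ratio compared with gzip level 1.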