Hello!
I am using the CuPy module to speed up linear algebra calculations on an NVIDIA GPU, and after the calculation I need to retrieve the result (a very large 2D matrix) from the GPU and save it to the local hard drive. Because of the matrix's size I need something like cupy.savez_compressed instead of save/savez, in order to keep a high compression ratio and save disk space. I have found that using
```python
import cupy as cp
import pandas as pd

# Copy the result matrix from GPU to host memory, then pickle it with gzip compression
Rt_cpu = pd.DataFrame(data=cp.asnumpy(Rt_gpu))
Rt_cpu.to_pickle(filename, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1})
```
achieves a compromise between a high compression ratio and read/write speed (about 60 seconds for a full read plus write of Rt_cpu), but this is still too slow for our needs. I have also tried other packages such as pandarallel and modin.pandas with the Ray or Dask backend, but their read/write speed ends up much slower than plain pandas.to_pickle. In summary, I am looking for a solution that is roughly 10 times faster (read and write) than to_pickle for extremely large 2D arrays (dimensions around 512*512/32 by 240*200*300/40, i.e. about 8192 x 360000), while keeping a compression ratio similar to cupy.savez_compressed. Can anyone suggest a solution? Thanks!
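For reference, the cupy.savez_compressed baseline I am comparing against looks roughly like this (a minimal sketch; Rt_gpu is the result matrix on the GPU and the file name is just an example):

```python
import cupy as cp
import numpy as np

# Compressed save of the GPU array (CuPy transfers it to the host internally)
cp.savez_compressed('Rt_result.npz', Rt=Rt_gpu)

# Later: load it back on the host as a NumPy array
with np.load('Rt_result.npz') as data:
    Rt_loaded = data['Rt']
```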