Hello!
I am using the CuPy module to speed up linear algebra calculations on an NVIDIA GPU, and after the calculation I need to retrieve the result (a very large 2D matrix) from the GPU and save it to the local hard drive. Because of the matrix's size I need something like cupy.savez_compressed instead of save/savez, in order to keep a high compression ratio and save disk space. I have found that using
```python
import cupy as cp
import pandas as pd

# Copy the result matrix from GPU to host memory, then pickle it with gzip compression
Rt_cpu = pd.DataFrame(data=cp.asnumpy(Rt_gpu))
Rt_cpu.to_pickle(filename, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1})
```
achieves a compromise between a high compression ratio and read/write speed (about 60 seconds for a full read plus write of Rt_cpu), but this is still too slow for our needs. I have also tried other packages such as pandarallel and modin.pandas with the Ray or Dask backend, but their read/write speed ends up much slower than plain pandas.to_pickle. In summary, I am looking for a solution that is roughly 10 times faster (read and write) than to_pickle for extremely large 2D arrays (dimensions around 512*512/32 by 240*200*300/40, i.e. about 8192 x 360000), while keeping a compression ratio similar to cupy.savez_compressed. Can anyone suggest a solution? Thanks!
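For reference, the cupy.savez_compressed baseline I am comparing against looks roughly like this (a minimal sketch; Rt_gpu is the result matrix on the GPU and the file name is just an example):

```python
import cupy as cp
import numpy as np

# Compressed save of the GPU array (CuPy transfers it to the host internally)
cp.savez_compressed('Rt_result.npz', Rt=Rt_gpu)

# Later: load it back on the host as a NumPy array
with np.load('Rt_result.npz') as data:
    Rt_loaded = data['Rt']
```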