Python Forum
How to further boost the data read write speed using pandas
#1
Hello!

I am using the CuPy module to speed up linear algebra calculations on an NVIDIA GPU. After the calculation I need to retrieve the results (a very large 2D matrix) from the GPU and save them to the local hard drive. Because the matrix is so large, I need something like cupy.savez_compressed rather than save/savez to keep a high compression ratio and save disk space. I find that using
```
import cupy as cp
import pandas as pd

# Rt_gpu is the result matrix on the GPU; filename is the output path
Rt_cpu = pd.DataFrame(data=cp.asnumpy(Rt_gpu))
Rt_cpu.to_pickle(filename, compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1})
```
gives a reasonable compromise between compression ratio and read-write speed (about 60 seconds for the Rt_cpu read-write), but this is still too slow for our needs. I also tried other packages such as pandarallel, and modin.pandas with a Ray or Dask backend, but their read-write speed was much slower than the pandas to_pickle command. In summary, I am looking for a solution that is roughly 10 times faster than to_pickle for extremely large 2D arrays (dimensions 512*512/32 by 240*200*300/40, that is, 8192 by 360000) while keeping a high compression ratio (similar to cupy.savez_compressed). Can anyone suggest a solution? Thanks!
#2
I have not used it myself, but I would check out Dask. Also, make sure you are using Python 3.11, which has multiple speed improvements over earlier versions.
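
If extra packages turn out not to help, one more thing that might be worth trying is to split the array's bytes into chunks and compress the chunks in parallel threads with the standard-library zlib module, which releases the GIL while it compresses. The sketch below is only an illustration and has not been benchmarked on data of this size: `compress_chunks`, `decompress_chunks`, the chunk size and the worker count are names and values made up for the example, and it works on any C-contiguous bytes-like object rather than on a DataFrame.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data, chunk_bytes=1 << 24, level=1, workers=4):
    """Compress a bytes-like object as independent zlib chunks, in parallel.

    Hypothetical helper: chunk_bytes, level and workers are example defaults.
    """
    # Flat byte view over the buffer, without copying (needs C-contiguous data).
    view = memoryview(data).cast("B")
    pieces = [view[i:i + chunk_bytes] for i in range(0, len(view), chunk_bytes)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the chunks in their original order.
        return list(pool.map(lambda piece: zlib.compress(piece, level), pieces))


def decompress_chunks(chunks, workers=4):
    """Decompress chunks produced by compress_chunks and join them back together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000  # stand-in for the matrix bytes
    chunks = compress_chunks(payload, chunk_bytes=10_000, workers=2)
    assert decompress_chunks(chunks, workers=2) == payload
```

With the NumPy array returned by cp.asnumpy(Rt_gpu) you could pass the array itself (if it is C-contiguous) and rebuild it afterwards with np.frombuffer(...).reshape(...), using a dtype and shape that you store separately. Whether this gets anywhere near 10 times faster depends on the number of cores, the disk, and how compressible the data is, so it would need measuring on the real matrix.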