Python Forum
Merging sorted dataframes using Pandas
#1
I have a large (N×4, >10 GB) array that I need to sort on column 2.

I am reading the data in chunks and sorting each chunk with Pandas, but I can't work out how to combine the sorted chunks into one final N×4 array that is sorted on column 2. I also want the whole process to be as fast as possible. Here is what I have tried so far:

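import pandas as pd

# ifile is defined elsewhere; ifile[0] is the path of the input CSV.
# Read the file lazily, 50,000 rows at a time, so it never sits in memory whole.
chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                     names=['col-1', 'col-2', 'col-3', 'col-4'])

for df in chunks:
    # Each chunk is sorted on its own; nothing merges the chunks yet.
    df = df.sort_values(by='col-2', kind='mergesort')  # sorted chunks
    print(df)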
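
One possible way to combine the sorted chunks is an external merge sort: write each sorted chunk to its own temporary file, then do a k-way merge of those files with `heapq.merge` from the standard library. The sketch below is only that, a sketch. The function name `external_sort`, the `src`/`dst` arguments, and the temporary file names are made up for illustration, and it assumes that column 2 is numeric and has no missing values.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

COLS = ['col-1', 'col-2', 'col-3', 'col-4']


def external_sort(src, dst, key='col-2', chunksize=50000):
    """Sort a headerless 4-column CSV on `key` without loading it whole.

    Hypothetical helper: assumes `key` is numeric with no missing values.
    """
    key_idx = COLS.index(key)
    tmpdir = tempfile.mkdtemp()
    run_paths = []

    # Phase 1: sort each chunk in memory and spill it to its own run file.
    for i, df in enumerate(pd.read_csv(src, chunksize=chunksize, names=COLS)):
        path = os.path.join(tmpdir, f'run_{i}.csv')
        df = df.sort_values(by=key, kind='mergesort')
        df.to_csv(path, index=False, header=False)
        run_paths.append(path)

    # Phase 2: k-way merge of the sorted runs, streamed row by row.
    files = [open(p, newline='') for p in run_paths]
    try:
        readers = [csv.reader(f) for f in files]
        with open(dst, 'w', newline='') as out:
            merged = heapq.merge(
                *readers,
                key=lambda row: float(row[key_idx]),  # numeric key assumed
            )
            csv.writer(out).writerows(merged)
    finally:
        # Close and delete the temporary run files.
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
        os.rmdir(tmpdir)
```

Called as `external_sort(ifile[0], 'sorted.csv')`, this would write the fully sorted rows to `sorted.csv` (an example output name). Every run file is open at the same time during the merge, so with >10 GB of input a much larger `chunksize` than 50000 is probably needed to keep the number of runs under the operating system's open-file limit. Larger chunks also mean fewer runs to merge, which should help speed.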


Messages In This Thread
Merging sorted dataframes using Pandas - by Robotguy - Aug-12-2020, 06:01 PM
