Python Forum
Merging sorted dataframes using Pandas
#1
I have a large Nx4 array (over 10 GB) that I need to sort by column 2.

I am reading my data in chunks and sorting each chunk with Pandas, but I cannot work out how to combine the sorted chunks into one large Nx4 array that is sorted on column 2. I also want the process to be as fast as possible. Here is what I have tried so far:

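import pandas as pd

# ifile[0] is the path to the headerless input CSV (defined elsewhere)
chunks = pd.read_csv(ifile[0], chunksize=50000, header=None,
                     names=['col-1', 'col-2', 'col-3', 'col-4'])

for df in chunks:
    # Each chunk is sorted on its own, so the chunks are not yet in global order
    df = df.sort_values(by='col-2', kind='mergesort')
    print(df)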
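The chunk-then-combine approach above is essentially an external merge sort, and the missing step is a k-way merge. A minimal sketch, assuming a headerless CSV whose sort column is numeric: each sorted chunk is written to a temporary "run" file, then the standard library's heapq.merge streams the runs together while holding only one pending row per run in memory. The function name external_sort and the run_*.csv file names are hypothetical.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

COLUMNS = ['col-1', 'col-2', 'col-3', 'col-4']


def external_sort(in_path, out_path, sort_col='col-2', chunksize=50000):
    """Sort a headerless 4-column CSV on sort_col without loading it all into memory."""
    key_idx = COLUMNS.index(sort_col)
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []

        # Pass 1: sort each chunk in memory and write it out as a sorted "run" file.
        chunks = pd.read_csv(in_path, chunksize=chunksize, header=None, names=COLUMNS)
        for i, df in enumerate(chunks):
            df = df.sort_values(by=sort_col, kind='mergesort')
            path = os.path.join(tmp_dir, f'run_{i}.csv')
            df.to_csv(path, index=False, header=False)
            run_paths.append(path)

        # Pass 2: k-way merge of the runs; heapq.merge holds one pending row per run.
        files = [open(p, newline='') for p in run_paths]
        try:
            readers = [csv.reader(f) for f in files]
            with open(out_path, 'w', newline='') as out:
                writer = csv.writer(out)
                # Assumes the sort column is numeric: compare as float, not as text.
                for row in heapq.merge(*readers, key=lambda r: float(r[key_idx])):
                    writer.writerow(row)
        finally:
            for f in files:
                f.close()
```

Usage: external_sort('big.csv', 'big_sorted.csv'). Values are written back the way pandas formats them, so the text of the output may differ slightly from the input even though the numbers are the same.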
#2
Pandas may not be the right tool for this. Personally, I would use SQL: load the data into a table, run a SELECT query that orders by the second column, and write out the result set.

Just an idea.
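The SQL route can be sketched with Python's built-in sqlite3 module (SQLite is my choice here; any SQL database would do). The CSV rows are streamed into a table, SQLite performs the ORDER BY (its sorter can spill to temporary files when the data does not fit in its cache), and the sorted rows are streamed back out. The function name sql_sort, the table and column names (data, col1 to col4) and the default database path are hypothetical.

```python
import csv
import sqlite3


def sql_sort(in_path, out_path, db_path='sort_tmp.db'):
    """Load a headerless 4-column CSV into SQLite and write it back ordered by col2."""
    con = sqlite3.connect(db_path)
    try:
        con.execute('DROP TABLE IF EXISTS data')
        # REAL affinity makes SQLite store numeric-looking text as numbers,
        # so ORDER BY col2 sorts numerically rather than alphabetically.
        con.execute('CREATE TABLE data (col1 REAL, col2 REAL, col3 REAL, col4 REAL)')
        with open(in_path, newline='') as f:
            # executemany consumes the reader row by row instead of building a list.
            con.executemany('INSERT INTO data VALUES (?, ?, ?, ?)', csv.reader(f))
        con.commit()

        # Iterating the cursor fetches the sorted rows lazily.
        cursor = con.execute('SELECT col1, col2, col3, col4 FROM data ORDER BY col2')
        with open(out_path, 'w', newline='') as out:
            csv.writer(out).writerows(cursor)
    finally:
        con.close()
```

Usage: sql_sort('big.csv', 'big_sorted.csv'). Point db_path at a disk with enough free space, since the table holds a full copy of the data.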