Python Forum
Merging sorted dataframes using Pandas
#1
I have a large array (N×4, >10 GB) that I need to sort on column 2.

I am reading my data in chunks and sorting each chunk with Pandas, but I am unable to combine the sorted chunks into a final N×4 array that is sorted on column 2. I also want this process to be as fast as possible. Here is what I have tried so far:

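import pandas as pd

# ifile[0] is the path to the input CSV (defined elsewhere)
chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                     names=['col-1', 'col-2', 'col-3', 'col-4'])

for df in chunks:
    # Each chunk is sorted on its own; the sorted chunks are not merged yet
    df = df.sort_values(by='col-2', kind='mergesort')
    print(df)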
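
The piece missing from the loop above is the merge of the sorted chunks. One possible way to do it is an external merge sort: write each sorted chunk to a temporary file, then stream all of those files through heapq.merge so that only one row per file is held in memory at a time. This is only a sketch, not a tuned solution. The function name external_sort and its parameters are made up for illustration, and it assumes a headerless CSV whose col-2 is numeric with no missing values.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

COLUMNS = ['col-1', 'col-2', 'col-3', 'col-4']


def external_sort(in_path, out_path, chunksize=50000):
    """Sort a headerless 4-column CSV on 'col-2' (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []

        # Pass 1: sort each chunk in memory and write it out as a sorted run.
        chunks = pd.read_csv(in_path, names=COLUMNS, chunksize=chunksize)
        for i, df in enumerate(chunks):
            run_path = os.path.join(tmp_dir, f'run_{i}.csv')
            df = df.sort_values(by='col-2', kind='mergesort')
            df.to_csv(run_path, header=False, index=False)
            run_paths.append(run_path)

        # Pass 2: k-way merge of the runs, reading them row by row.
        files = [open(p, newline='') for p in run_paths]
        try:
            readers = [csv.reader(f) for f in files]
            # Assumption: col-2 is numeric, so the merge key is float(row[1]).
            merged = heapq.merge(*readers, key=lambda row: float(row[1]))
            with open(out_path, 'w', newline='') as out:
                csv.writer(out).writerows(merged)
        finally:
            for f in files:
                f.close()
```

Calling external_sort('input.csv', 'sorted.csv') would write the fully sorted file without ever holding all N rows in memory. Every run file stays open during the merge, so a larger chunksize (fewer runs) may be needed to stay under the operating system's open-file limit.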
#2
Pandas may not be the right tool for that. Personally, I would use SQL: load the data into a table, run a SELECT query with ORDER BY on the second column, and write out the result set.

Just an idea.
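
For what it's worth, the SQL route could look roughly like the sketch below, using SQLite from the standard library so no database server is needed. The function name sort_via_sqlite, the table name data and the db_path argument are all made up for illustration, and it assumes db_path points to a new, empty database file. Whether this beats a hand-written merge on a >10 GB file is something you would have to measure.

```python
import sqlite3

import pandas as pd

COLUMNS = ['col-1', 'col-2', 'col-3', 'col-4']


def sort_via_sqlite(in_path, out_path, db_path, chunksize=50000):
    """Load a headerless CSV into SQLite and write it back sorted on 'col-2'."""
    con = sqlite3.connect(db_path)
    try:
        # Load the CSV into a table chunk by chunk (table name is hypothetical).
        for df in pd.read_csv(in_path, names=COLUMNS, chunksize=chunksize):
            df.to_sql('data', con, if_exists='append', index=False)

        # Let the database do the ordering; the hyphenated column name
        # has to be double-quoted in the SQL text.
        query = 'SELECT * FROM data ORDER BY "col-2"'

        # Stream the ordered result set back out in chunks.
        first = True
        for df in pd.read_sql_query(query, con, chunksize=chunksize):
            df.to_csv(out_path, mode='w' if first else 'a',
                      header=False, index=False)
            first = False
    finally:
        con.close()
```

Usage would be something like sort_via_sqlite('input.csv', 'sorted.csv', 'sort.db'). The database file ends up at least as large as the input, so it needs that much free disk space.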