Python Forum
Merging sorted dataframes using Pandas



Merging sorted dataframes using Pandas - Robotguy - Aug-12-2020

I have a large array (N x 4, >10 GB) that I need to sort by its second column (col-2).

I am reading the data in chunks and sorting each chunk with Pandas, but I have not been able to combine the sorted chunks into one final N x 4 array that is sorted on col-2. I also want the process to be as fast as possible. Here is what I have tried so far:

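import pandas as pd

# ifile[0] is the path to the input CSV (defined elsewhere)
chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                     names=['col-1', 'col-2', 'col-3', 'col-4'])

for df in chunks:
    # Each chunk is sorted on its own; the sorted chunks are not merged yet
    df = df.sort_values(by='col-2', kind='mergesort')
    print(df)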
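
One possible way to combine sorted chunks like these is an external merge sort: write each sorted chunk to its own temporary file, then do a k-way merge of those files with heapq.merge from the standard library. The following is only a rough sketch under some assumptions: the input is a headerless CSV with four columns, col-2 is numeric in every chunk, and the names external_sort, COLS, and the run_*.csv files are made up for illustration. Merging row by row in Python keeps memory use low but is unlikely to be the fastest option for 10 GB.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

COLS = ['col-1', 'col-2', 'col-3', 'col-4']  # column names taken from the post


def external_sort(in_path, out_path, key='col-2', chunksize=50000):
    """Sort a large headerless CSV on `key` without loading it all at once.

    Hypothetical helper, not part of pandas.
    """
    tmp_dir = tempfile.mkdtemp()
    run_paths = []

    # Pass 1: sort each chunk and write it to its own temporary file (a "run").
    reader = pd.read_csv(in_path, chunksize=chunksize, names=COLS)
    for i, df in enumerate(reader):
        run_path = os.path.join(tmp_dir, f'run_{i}.csv')
        df = df.sort_values(by=key, kind='mergesort')
        df.to_csv(run_path, index=False, header=False)
        run_paths.append(run_path)

    def rows(path):
        # Stream one sorted run back in small pieces, yielding plain tuples.
        for part in pd.read_csv(path, chunksize=10000, names=COLS):
            yield from part.itertuples(index=False, name=None)

    # Pass 2: k-way merge of the sorted runs, streamed straight to the output.
    key_idx = COLS.index(key)
    merged = heapq.merge(*(rows(p) for p in run_paths),
                         key=lambda r: r[key_idx])
    with open(out_path, 'w', newline='') as out:
        csv.writer(out).writerows(merged)

    # Clean up the temporary runs.
    for p in run_paths:
        os.remove(p)
    os.rmdir(tmp_dir)
```

Calling external_sort('big.csv', 'sorted.csv') would write the sorted rows to sorted.csv; both file names here are placeholders.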



RE: Merging sorted dataframes using Pandas - jefsummers - Aug-12-2020

Pandas may not be the right tool for that. Personally, I would use SQL: load the data into a table, run a SELECT query with ORDER BY on the second column, and write out the result set.

Just an idea.
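
To make the SQL idea concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The reply does not name a database, so SQLite is an assumption, as are the function name sort_via_sqlite, the table name data, and the db_path argument (which should point to a new, empty database file). It loads the CSV chunk by chunk with DataFrame.to_sql, lets the database do the ORDER BY, and streams the result to a new CSV.

```python
import csv
import sqlite3

import pandas as pd

COLS = ['col-1', 'col-2', 'col-3', 'col-4']  # column names taken from the thread


def sort_via_sqlite(in_path, out_path, db_path, chunksize=50000):
    """Load a headerless CSV into SQLite, then write it back sorted on col-2.

    Hypothetical helper; db_path is assumed to be a fresh database file.
    """
    con = sqlite3.connect(db_path)
    try:
        # Append each chunk to the table so memory use stays bounded.
        for df in pd.read_csv(in_path, chunksize=chunksize, names=COLS):
            df.to_sql('data', con, if_exists='append', index=False)

        # Let the database sort, and fetch the result in batches.
        cur = con.execute('SELECT * FROM data ORDER BY "col-2"')
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            while True:
                batch = cur.fetchmany(10000)
                if not batch:
                    break
                writer.writerows(batch)
    finally:
        con.close()
```

Whether this is faster than sorting chunks and merging them depends on the data and the disk, so it is worth timing both on a sample before committing to one.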