Merging sorted dataframes using Pandas - Python Forum (https://python-forum.io), Python Coding > Data Science (/thread-28983.html)
Merging sorted dataframes using Pandas - Robotguy - Aug-12-2020

I have a large (Nx4, >10 GB) array that I need to sort by column 2. I read the data in chunks and sort each chunk with Pandas, but I cannot combine the sorted chunks into a single Nx4 array sorted on column 2. I also want the process to be as fast as possible. Here is what I have tried so far:

```python
import pandas as pd

# ifile: the poster's list of input file paths (defined elsewhere in the script)
chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                     names=['col-1', 'col-2', 'col-3', 'col-4'])
for df in chunks:
    # Stable sort of this chunk only; each sorted chunk is printed, then discarded,
    # so nothing here combines the chunks into one sorted result.
    df = df.sort_values(by='col-2', kind='mergesort')
    print(df)
```

RE: Merging sorted dataframes using Pandas - jefsummers - Aug-12-2020

Pandas may not be the right tool for that. Personally, I would use SQL: load all the data into a table, run a SELECT query that orders by the second column, and write out the result set. Just an idea.
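One way to finish the chunked approach from the question is an external merge sort: sort each chunk in memory, write it to disk as a sorted "run", then stream a k-way merge of all runs with Python's `heapq.merge`, so rows flow through one at a time instead of the whole file being loaded. The sketch below assumes a headerless CSV with four columns, a numeric `col-2` with no missing values, and enough free disk space for a second copy of the data; `external_sort_csv` is a hypothetical helper name, not a Pandas API.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

NAMES = ["col-1", "col-2", "col-3", "col-4"]


def external_sort_csv(in_path, out_path, key_col="col-2", chunksize=50_000):
    """Sort a headerless 4-column CSV by key_col without loading it all into RAM.

    Assumption: key_col holds numeric values with no NaNs.
    """
    key_idx = NAMES.index(key_col)
    tmp_dir = tempfile.mkdtemp()
    run_paths = []

    # Phase 1: sort each chunk in memory and write it out as a sorted run.
    for i, chunk in enumerate(pd.read_csv(in_path, chunksize=chunksize,
                                          header=None, names=NAMES)):
        run = chunk.sort_values(by=key_col, kind="mergesort")
        path = os.path.join(tmp_dir, f"run_{i}.csv")
        run.to_csv(path, header=False, index=False)
        run_paths.append(path)

    # Phase 2: k-way merge of the sorted runs; heapq.merge keeps only the
    # current front row of each run in its heap while streaming the output.
    files = [open(p, newline="") for p in run_paths]
    try:
        readers = [csv.reader(f) for f in files]
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerows(
                heapq.merge(*readers, key=lambda row: float(row[key_idx]))
            )
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
        os.rmdir(tmp_dir)
```

For example, `external_sort_csv(ifile[0], 'sorted.csv')`. The merge keeps one file open per run, so for a 10 GB input a larger `chunksize` (as large as RAM comfortably allows) keeps the number of runs below the operating system's open-file limit and reduces merge overhead.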
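The SQL route suggested in the reply can be sketched with Python's built-in `sqlite3` module: append the CSV to an on-disk table chunk by chunk, then let SQLite execute `ORDER BY` and stream the result back out in batches. This assumes SQLite (the reply does not name a database) and a headerless four-column CSV; `sql_sort_csv` and the `sort.db` file name are hypothetical. The hyphenated column names must be double-quoted in the query, otherwise SQLite parses `col-2` as `col` minus 2.

```python
import csv
import sqlite3

import pandas as pd

NAMES = ["col-1", "col-2", "col-3", "col-4"]


def sql_sort_csv(in_path, out_path, db_path="sort.db", chunksize=50_000):
    """Load a headerless CSV into SQLite in chunks, then write it out ORDER BY col-2."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("DROP TABLE IF EXISTS data")
        # Load phase: append each chunk to a single on-disk table.
        for chunk in pd.read_csv(in_path, chunksize=chunksize,
                                 header=None, names=NAMES):
            chunk.to_sql("data", con, if_exists="append", index=False)
        con.commit()
        # Query phase: the quoted identifier refers to the column named col-2.
        cur = con.execute('SELECT * FROM data ORDER BY "col-2"')
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            # fetchmany keeps only one batch of result rows in Python memory.
            while True:
                rows = cur.fetchmany(chunksize)
                if not rows:
                    break
                writer.writerows(rows)
    finally:
        con.close()
```

SQLite can spill a large sort to temporary files instead of holding it all in RAM, so the main cost of this approach is disk space for the database plus the sort's temporary files.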