Python Forum
Sorting a large CSV file
#1
I have a large CSV file with 707,000 rows. Column 9 of 10, called 'individual-local-identifier', holds an ID such as EG47213 or EB53955; there are probably about 700 distinct IDs. I need to separate out all the rows with a given ID. By looking through the first rows by hand, I found that EG47213 started at row 3844 and continued for another 4127 rows.

I then tried:

import pandas as pd

# Read the data into a DataFrame called pacto
path = "F:/Carrier Bag F/NAV Padget/Padget.csv"
pacto = pd.read_csv(path)

# Keep only the rows whose ID is EG47213, wherever they are in the file
frutj = pacto[pacto['individual-local-identifier'] == "EG47213"]
print(frutj.shape)  # (number of matching rows, number of columns)

To my surprise, this reported 8627 such rows, not 4127. When I searched by hand, I found the missing rows in two separate locations. (This took me a long time and made me mad.)
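A boolean mask like the one above selects every matching row no matter where it sits in the file, which is why the count came out higher than the hand count. If it helps to see where those rows actually are, the following is a minimal sketch that finds each contiguous run of one ID. It assumes a tiny invented CSV in place of Padget.csv: only the 'individual-local-identifier' column name comes from the post, while the 'event-id' column and all values are made up. The printed numbers are DataFrame index positions, where 0 is the first data row after the header.

```python
import io

import pandas as pd

# Invented sample data standing in for Padget.csv ('event-id' is hypothetical).
csv_text = """event-id,individual-local-identifier
1,EG47213
2,EG47213
3,EB53955
4,EB53955
5,EG47213
6,EB53955
7,EG47213
8,EG47213
"""

df = pd.read_csv(io.StringIO(csv_text))
col = "individual-local-identifier"

mask = df[col] == "EG47213"
# A new run starts wherever the mask changes value; cumsum numbers the runs.
run_id = mask.ne(mask.shift(fill_value=False)).cumsum()

# For each run of matching rows: first index, last index, number of rows.
runs = (
    df.index.to_series()[mask]
    .groupby(run_id[mask])
    .agg(["first", "last", "count"])
)
print(runs)
```

On the sample data this reports three separate blocks (rows 0 to 1, row 4, and rows 6 to 7), the same kind of scattering described above.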

There must be a better way to do this. I had hoped to find something in the pandas documentation like a FOR loop, plus some way of writing IF ... THEN to break out of it, or something similar.

Maybe I missed something.
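Something close to the FOR loop asked about here does exist in pandas: iterating over a groupby object yields one ID at a time together with all of its rows, so no IF ... THEN logic is needed to detect where a block ends. The sketch below assumes the same invented sample CSV as a stand-in for the real file and writes each ID to its own file in a temporary directory; the output folder and the per-ID file names are hypothetical choices, not anything from the thread.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Invented sample data standing in for Padget.csv ('event-id' is hypothetical).
csv_text = """event-id,individual-local-identifier
1,EG47213
2,EG47213
3,EB53955
4,EB53955
5,EG47213
6,EB53955
7,EG47213
8,EG47213
"""

df = pd.read_csv(io.StringIO(csv_text))
col = "individual-local-identifier"

# Hypothetical output folder; a temporary directory keeps the sketch self-contained.
out_dir = Path(tempfile.mkdtemp())

rows_per_id = {}
for ident, rows in df.groupby(col):
    # `rows` is a DataFrame with every row for this ID,
    # regardless of where those rows appear in the original file.
    rows.to_csv(out_dir / f"{ident}.csv", index=False)
    rows_per_id[ident] = len(rows)

print(rows_per_id)
```

With roughly 700 IDs in the real file, this would produce roughly 700 small CSV files, one per ID.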
#2
(Oct-31-2019, 12:15 PM) DavidTheGrockle wrote: I need to separate out all the rows with a given id.

What do you need to do? Do you need to separate out the rows with a given ID, or do you need to sort by the ID? You appear to already have the rows for a particular ID. To sort by the ID column you would use sort_values, which returns a new, sorted DataFrame:

pacto = pacto.sort_values(by='individual-local-identifier')
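One detail worth knowing when sorting: sort_values uses quicksort by default, which is not guaranteed to keep rows with equal keys in their original order. Passing kind="mergesort" selects a stable sort, so each ID ends up as one contiguous block with its rows still in file order. The sketch below assumes the same invented sample CSV as a stand-in for the real data.

```python
import io

import pandas as pd

# Invented sample data standing in for Padget.csv ('event-id' is hypothetical).
csv_text = """event-id,individual-local-identifier
1,EG47213
2,EG47213
3,EB53955
4,EB53955
5,EG47213
6,EB53955
7,EG47213
8,EG47213
"""

pacto = pd.read_csv(io.StringIO(csv_text))
col = "individual-local-identifier"

# kind="mergesort" is stable: rows sharing an ID keep their original relative order.
pacto_sorted = pacto.sort_values(by=col, kind="mergesort")
print(pacto_sorted)
```

Writing the result back out with pacto_sorted.to_csv(...) would give a file in which every ID forms a single block.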
You may also want to look at the groupby method.
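As a rough illustration of what groupby offers here, the sketch below counts the rows for every ID in one pass, counts the distinct IDs, and pulls out all rows for one ID with get_group. It assumes the same invented sample CSV in place of Padget.csv.

```python
import io

import pandas as pd

# Invented sample data standing in for Padget.csv ('event-id' is hypothetical).
csv_text = """event-id,individual-local-identifier
1,EG47213
2,EG47213
3,EB53955
4,EB53955
5,EG47213
6,EB53955
7,EG47213
8,EG47213
"""

pacto = pd.read_csv(io.StringIO(csv_text))
col = "individual-local-identifier"

groups = pacto.groupby(col)

# Number of rows for every ID, without any counting by hand.
rows_per_id = groups.size()
print(rows_per_id)

# Number of distinct IDs in the column.
n_ids = pacto[col].nunique()
print(n_ids)

# All rows for one ID; the same rows the boolean filter selects.
eg = groups.get_group("EG47213")
print(eg.shape)
```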
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.