Posts: 10
Threads: 3
Joined: Nov 2018
Hi,
I'm trying to sort a large txt file on 2 columns: date and then time: I've enclosed a small sample of this:
1,2,3,4,5,6,7,8,9,10
20190218,17:12:57,1903,1,0,,,,,20190217
20190218,18:8:34,1903,1,0,,,,,20190217
20190218,0:0:37,1903,1,0,,,,,20190218
20190218,11:46:48,1903,1,0,,,,,20190218
The list above is in the correct order I am looking for... ie sorted first by column 10 (date) and then by column 2 (time)
the actual data files are approx 20MB
i am struggling with the sorted() method particularly with converting the time from a string to a time.
any help would be great.
thanks
Posts: 12,026
Threads: 485
Joined: Sep 2016
Mar-27-2019, 10:31 AM
(This post was last modified: Mar-27-2019, 10:31 AM by Larz60+.)
This is tested with attached csv file I purposely messed up order of input file to test.
place file in same directory as script
import csv
import operator
import os
def sort_csv(filename, key1, key2):
with open(filename) as fp:
crdr = csv.reader(fp)
# set key1 and key2 to column numbers of keys
filedata = sorted(crdr, key=lambda row: (row[key1], row[key2]))
return filedata
if __name__ == '__main__':
# Assure in right directory
os.chdir(os.path.abspath(os.path.dirname(__file__)))
filename = 'st01_al_cou.csv'
# these are column numbers zero based
first_key = 2
second_key = 3
mylist = sort_csv(filename, first_key, second_key)
for line in mylist:
print(line)
Posts: 10
Threads: 3
Joined: Nov 2018
Wow - thank you so much that works perfectly
Posts: 1,950
Threads: 8
Joined: Jun 2018
Mar-27-2019, 10:57 AM
(This post was last modified: Mar-27-2019, 10:57 AM by perfringo.)
Alternative approach, not so good as Larz60+ but still might be usable (more akin to OP-s request: 'method particularly with converting the time from a string to a time'). Takes advantage of built-in datetime module, for demonstration purposes I use list:
from datetime import datetime
lst = [
'20190218,18:8:34,1903,1,0,,,,,20190217',
'20190218,0:0:37,1903,1,0,,,,,20190218',
'20190218,11:46:48,1903,1,0,,,,,20190218',
'20190218,17:12:57,1903,1,0,,,,,20190217'
]
def by_datetime(row):
get_datetime = ','.join(row.split(',')[:2])
return datetime.strptime(get_datetime, "%Y%m%d,%H:%M:%S")
sorted(lst, key=by_datetime)
['20190218,0:0:37,1903,1,0,,,,,20190218',
'20190218,11:46:48,1903,1,0,,,,,20190218',
'20190218,17:12:57,1903,1,0,,,,,20190217',
'20190218,18:8:34,1903,1,0,,,,,20190217']
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 10
Threads: 3
Joined: Nov 2018
thanks again - will give that a try. Always nice to have alternatives
|