Python Forum
Sorting csv or txt file on date and time - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Sorting csv or txt file on date and time (/thread-17068.html)



Sorting csv or txt file on date and time - AshBax - Mar-27-2019

Hi,

I'm trying to sort a large txt file on 2 columns: date and then time: I've enclosed a small sample of this:

1,2,3,4,5,6,7,8,9,10

20190218,17:12:57,1903,1,0,,,,,20190217
20190218,18:8:34,1903,1,0,,,,,20190217
20190218,0:0:37,1903,1,0,,,,,20190218
20190218,11:46:48,1903,1,0,,,,,20190218

The list above is in the correct order I am looking for... ie sorted first by column 10 (date) and then by column 2 (time)

the actual data files are approx 20MB

i am struggling with the
sorted()
method particularly with converting the time from a string to a time.


any help would be great.
thanks


RE: Sorting csv or txt file on date and time - Larz60+ - Mar-27-2019

This is tested with attached csv file I purposely messed up order of input file to test.
place file in same directory as script
import csv
import operator
import os

def sort_csv(filename, key1, key2):
    with open(filename) as fp:
        crdr = csv.reader(fp)
        # set key1 and key2 to column numbers of keys
        filedata = sorted(crdr, key=lambda row: (row[key1], row[key2]))
    return filedata

if __name__ == '__main__':
    # Assure in right directory
    os.chdir(os.path.abspath(os.path.dirname(__file__)))
    filename = 'st01_al_cou.csv'
    # these are column numbers zero based
    first_key = 2
    second_key = 3
    mylist = sort_csv(filename, first_key, second_key)
    for line in mylist:
        print(line)



RE: Sorting csv or txt file on date and time - AshBax - Mar-27-2019

Wow - thank you so much that works perfectly


RE: Sorting csv or txt file on date and time - perfringo - Mar-27-2019

Alternative approach, not so good as Larz60+ but still might be usable (more akin to OP-s request: 'method particularly with converting the time from a string to a time'). Takes advantage of built-in datetime module, for demonstration purposes I use list:

from datetime import datetime

lst = [
       '20190218,18:8:34,1903,1,0,,,,,20190217',
       '20190218,0:0:37,1903,1,0,,,,,20190218',
       '20190218,11:46:48,1903,1,0,,,,,20190218',
       '20190218,17:12:57,1903,1,0,,,,,20190217'
      ]

def by_datetime(row):
    get_datetime = ','.join(row.split(',')[:2])
    return datetime.strptime(get_datetime, "%Y%m%d,%H:%M:%S")

sorted(lst, key=by_datetime)

['20190218,0:0:37,1903,1,0,,,,,20190218',
 '20190218,11:46:48,1903,1,0,,,,,20190218',
 '20190218,17:12:57,1903,1,0,,,,,20190217',
 '20190218,18:8:34,1903,1,0,,,,,20190217']



RE: Sorting csv or txt file on date and time - AshBax - Mar-27-2019

thanks again - will give that a try. Always nice to have alternatives