Python Forum

Full Version: python sort date
I am new to Python and need to sort the contents of a text file by timestamp in reverse order. Below is the content of the text file (in.txt):

2020/10/31:09:05:01 734691 445750 384860 557946
2020/10/31:15:05:01 734691 366500 315620 554140
2020/10/31:21:05:01 705959 177500 153041 513408

I wrote the code below, but it raises the error shown after it.

from datetime import datetime

with open('in.txt') as f:
     sorted_lines = sorted([l.rstrip() for l in f.readlines()],
                          key=lambda line: datetime.strptime(line.split(" ")[0], "%Y/%m/%d:%H:%M:%S"))
                            "%Y/%m/%d:%H:%M:%S"),reverse=True)
     for line in sorted_lines:
        print(line)
Error
key=lambda line: datetime.strptime(line.split(" ")[0], "%Y/%m/%d:%H:%M:%S"),reverse=True)
File "/usr/lib64/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data '' does not match format '%Y/%m/%d:%H:%M:%S'

I am unable to determine why the error is happening. Could you please help?
From the error message, it looks like you have empty line(s), probably at the end of the file.
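A minimal sketch of one way to work around that, assuming every non-blank line starts with a timestamp in the format from the question. The helper name sort_lines_by_timestamp and the sample list are made up for illustration; the idea is simply to drop whitespace-only lines before strptime ever sees them.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"


def sort_lines_by_timestamp(lines):
    """Return the non-blank lines, newest timestamp first."""
    # Strip line endings and skip lines that contain only whitespace.
    cleaned = [line.strip() for line in lines if line.strip()]
    return sorted(
        cleaned,
        key=lambda line: datetime.strptime(line.split()[0], DATE_FMT),
        reverse=True,
    )


# Hypothetical sample standing in for the contents of in.txt.
sample = [
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
    "\n",  # a trailing blank line like the one behind the ValueError
]
for line in sort_lines_by_timestamp(sample):
    print(line)
```

With a real file, the open file object can be passed in place of sample.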
from datetime import datetime


def parse_datetime(line):
    date_str, _ = line.split(maxsplit=1)
    date_fmt = "%Y/%m/%d:%H:%M:%S"
    return datetime.strptime(date_str, date_fmt)


with open("in.txt") as fd:
    for line in sorted(fd, key=parse_datetime, reverse=True):
        # line is still a str
        print(line.strip())
        # print(line, end="")
This will still fail with corrupt data.
You can catch these errors and return datetime.min, which is the earliest possible date represented by datetime.


def parse_datetime(line):
    date_fmt = "%Y/%m/%d:%H:%M:%S"
    try:
        date_str, _ = line.split(maxsplit=1)
        return datetime.strptime(date_str, date_fmt)
    except ValueError:
        # invalid format
        return datetime.min
        # datetime.min == datetime.datetime(1, 1, 1, 0, 0)
        # used as minimum value
        # you can't mix if you sort, so all elements must be a datetime
  • fd (the file object) is an iterator. Iterating over fd yields the file one line at a time.
  • sorted(fd) without a key sorts all lines of the file in lexicographical order.
  • The key function passed to sorted returns a datetime object for each line.
  • Sorting requires comparison. Python is strongly typed, so you can't, for example, compare an int with a str. You can work around this with a key function that always returns the same type.

    If you want to put the data into a data structure (e.g. a dict or namedtuple), I would do the parsing first, put the data into a list, and sort the list once everything has been parsed.
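A rough Python 3 sketch of that parse-first approach. The Record type, its field names and the parse_line helper are assumptions for illustration; the question does not say what the four numbers mean, so they are kept as a plain list of ints.

```python
from collections import namedtuple
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"

# Hypothetical record type; the field names are not from the question.
Record = namedtuple("Record", ["timestamp", "values"])


def parse_line(line):
    """Turn one line of text into a Record."""
    date_str, *numbers = line.split()
    return Record(datetime.strptime(date_str, DATE_FMT), [int(n) for n in numbers])


lines = [
    "2020/10/31:09:05:01 734691 445750 384860 557946",
    "2020/10/31:15:05:01 734691 366500 315620 554140",
    "2020/10/31:21:05:01 705959 177500 153041 513408",
]

# Parse everything first, then sort the finished list.
records = [parse_line(line) for line in lines]
records.sort(key=lambda record: record.timestamp, reverse=True)

for record in records:
    print(record.timestamp.isoformat(), record.values)
```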
I get a syntax error working with your code. Is there a paste error in your post? Why does the datetime pattern appear twice?
Since the dates are nearly in iso form (with YYYY/MM/DD etc.), you may simply sort the lexicographic form.

sorted_lines = sorted(f, reverse=True)
(Nov-04-2020, 05:23 PM)chrischarley Wrote: [ -> ]Since the dates are nearly in iso form (with YYYY/MM/DD etc.), you may simply sort the lexicographic form.

sorted_lines = sorted(f, reverse=True)
But do we know for sure it is MM and DD?
(Nov-04-2020, 05:30 PM)deanhystad Wrote: [ -> ]
(Nov-04-2020, 05:23 PM)chrischarley Wrote: [ -> ]Since the dates are nearly in iso form (with YYYY/MM/DD etc.), you may simply sort the lexicographic form.

sorted_lines = sorted(f, reverse=True)
But do we know for sure it is MM and DD?

That is true, I didn't consider that possibility. But his hours, minutes and seconds were zero-padded when less than 10.
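A small sketch of why the padding matters. The timestamps below are made up for illustration; the unpadded ones are not from the original file. Plain string sorting agrees with datetime sorting only when every field is zero-padded.

```python
from datetime import datetime

# Made-up timestamps: one list zero-padded, one not.
padded = ["2020/10/31:09:05:01", "2020/09/05:08:00:00"]
unpadded = ["2020/10/31:09:05:01", "2020/9/5:8:00:00"]


def by_datetime(stamps):
    # strptime accepts both padded and unpadded numeric fields.
    return sorted(stamps, key=lambda s: datetime.strptime(s, "%Y/%m/%d:%H:%M:%S"))


# Zero-padded: string order and chronological order agree.
print(sorted(padded) == by_datetime(padded))
# Unpadded: "2020/10" sorts before "2020/9" as text, so the orders differ.
print(sorted(unpadded) == by_datetime(unpadded))
```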
(Nov-04-2020, 05:18 PM)DeaD_EyE Wrote: [ -> ]
Hi

I got the error below using Python version 2.7.5. Could that be due to the version?
date_str, _ = line.split(maxsplit=1)
TypeError: split() takes no keyword arguments

So I changed it as below, with no arguments for split, but that didn't help either.
date_str, _ = line.split()
ValueError: too many values to unpack
(Nov-04-2020, 05:23 PM)deanhystad Wrote: [ -> ]I get a syntax error working with your code. Is there a paste error in your post? Why does the datetime pattern appear twice?


No, I tested it again. It works with Python 3.9 and should work with older Python 3 versions.
Maybe the comments are confusing the REPL if you use copy & paste.

parse_datetime was written twice to show:
  1. How to split a task into smaller, easier tasks, which is better for testing. For example, you can use this function to test each line; no file is needed for testing at all.
  2. How to catch exceptions and handle them by returning a default value for sorting.

To understand the sorting part, try this:

sorted([1, 2, 3, 4, "a"])
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f654efc1df6a> in <module>
----> 1 sorted([1, 2, 3, 4, "a"])

TypeError: '<' not supported between instances of 'str' and 'int'
Same with datetime objects.
This works because datetime is comparable with datetime:
sorted([datetime.min, datetime.max, datetime(2020,1,1), datetime(1990,1,1)])
But this won't work:
sorted([datetime(2020,1,1), 1])
Error:
TypeError: '<' not supported between instances of 'int' and 'datetime.datetime'
This is why a key function for sorting should always return the same type.
You could remove lines with the wrong format before you sort.
Or you may have a situation where you still want to keep the wrongly formatted lines and sort them as well.
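A minimal sketch of the first option, removing badly formatted lines before sorting. The helper name split_valid_invalid and the sample lines are assumptions for illustration; the rejected lines are kept in a separate list so they can still be reported or handled later.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"


def split_valid_invalid(lines):
    """Return (valid, invalid); valid holds (timestamp, line) pairs."""
    valid, invalid = [], []
    for line in lines:
        line = line.strip()
        try:
            stamp = datetime.strptime(line.split()[0], DATE_FMT)
        except (IndexError, ValueError):
            # IndexError: empty line; ValueError: timestamp does not match.
            invalid.append(line)
        else:
            valid.append((stamp, line))
    return valid, invalid


# Made-up sample with one blank line and one corrupt line.
sample = ["2020/10/31:09:05:01 1 2", "", "not-a-date 3 4", "2020/10/31:21:05:01 5 6"]
valid, invalid = split_valid_invalid(sample)

# Every sort key is a datetime, so the comparison never mixes types.
newest_first = [line for _, line in sorted(valid, reverse=True)]
print(newest_first)
print(invalid)
```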

There is no universal solution that fits every case.
Keep learning the basics before you switch to pandas.
(Nov-04-2020, 05:52 PM)beginner2020 Wrote: [ -> ]I got the below error, using python version 2.7.5. Will that be due to version.
date_str, _ = line.split(maxsplit=1)
TypeError: split() takes no keyword arguments

I saw it too late.

You should avoid using Python 2.7. It has reached end of life.
Python 3.6 is the oldest version that is still supported.
Python 3.5 has also reached end of life.

The ability to pass maxsplit as a keyword argument was introduced with Python 3.3.
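If switching interpreters is not possible right away, a small sketch of the positional form may help. It passes the separator and the split limit by position, which both Python 2.7 and Python 3 accept. It also shows why split() with no arguments failed: the sample line has five fields, which cannot be unpacked into two names.

```python
line = "2020/10/31:09:05:01 734691 445750 384860 557946"

# Five whitespace-separated fields, so "date_str, _ = line.split()" fails
# with "too many values to unpack".
print(len(line.split()))

# Positional form: sep=None (split on any whitespace), maxsplit=1.
date_str, rest = line.split(None, 1)
print(date_str)
print(rest)
```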

Start your REPL or script with python3.
Plain python without the 3 will start the Python 2.7 interpreter on most distributions.