Posts: 6
Threads: 2
Joined: Oct 2020
I am new to Python and have a requirement to sort the contents of a text file by timestamp in reverse order. Below is the content of the text file (in.txt):
2020/10/31:09:05:01 734691 445750 384860 557946
2020/10/31:15:05:01 734691 366500 315620 554140
2020/10/31:21:05:01 705959 177500 153041 513408
I wrote the code below, but I am getting the error that follows.
from datetime import datetime
with open('in.txt') as f:
    sorted_lines = sorted([l.rstrip() for l in f.readlines()],
                          key=lambda line: datetime.strptime(line.split(" ")[0], "%Y/%m/%d:%H:%M:%S"), reverse=True)
for line in sorted_lines:
    print(line)
Error:
    key=lambda line: datetime.strptime(line.split(" ")[0], "%Y/%m/%d:%H:%M:%S"),reverse=True)
  File "/usr/lib64/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '' does not match format '%Y/%m/%d:%H:%M:%S'
I am unable to determine why the error is happening. Could you please help?
Posts: 8,157
Threads: 160
Joined: Sep 2016
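From the error message, it looks like you have one or more empty lines, probably at the end of the file. After rstrip() an empty line becomes '', and ''.split(" ")[0] is also '', which strptime cannot parse.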
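A minimal sketch of that fix, assuming blank lines are the only bad input: filter them out before sorting. A list of strings stands in for the file object so the snippet runs on its own; with a real file you would iterate over f instead of raw_lines. The snippet runs unchanged on Python 2.7 and 3.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"

# A list of strings stands in for open('in.txt'); the trailing "\n" is the
# blank line that produced "time data '' does not match format"
raw_lines = [
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "2020/10/31:15:05:01 734691 366500 315620 554140\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
    "\n",
]

# Strip trailing whitespace and keep only lines that are not blank
lines = [l.rstrip() for l in raw_lines if l.strip()]

sorted_lines = sorted(
    lines,
    key=lambda line: datetime.strptime(line.split(" ")[0], DATE_FMT),
    reverse=True,
)
for line in sorted_lines:
    print(line)
```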
Posts: 2,125
Threads: 11
Joined: May 2017
Nov-04-2020, 05:18 PM
(This post was last modified: Nov-04-2020, 05:32 PM by DeaD_EyE.)
from datetime import datetime

def parse_datetime(line):
    date_str, _ = line.split(maxsplit=1)
    date_fmt = "%Y/%m/%d:%H:%M:%S"
    return datetime.strptime(date_str, date_fmt)

with open("in.txt") as fd:
    for line in sorted(fd, key=parse_datetime, reverse=True):
        # line is still a str
        print(line.strip())
        # print(line, end="")

This will still fail with corrupt data. You can catch these errors and return datetime.min, which is the earliest possible date represented by datetime.
def parse_datetime(line):
    date_fmt = "%Y/%m/%d:%H:%M:%S"
    try:
        date_str, _ = line.split(maxsplit=1)
        return datetime.strptime(date_str, date_fmt)
    except ValueError:
        # invalid format
        # datetime.min == datetime.datetime(1, 1, 1, 0, 0)
        # used as minimum value
        # you can't mix types if you sort, so all keys must be datetime objects
        return datetime.min

fd (the file object) is an iterator. Iterating over fd yields the lines of the file one by one.
sorted(fd) would sort all lines of the whole file in lexicographical order.
- The key function passed to sorted returns a datetime object.
- Sorting requires comparison. Python is strongly typed, so you can't, for example, compare an int with a str. But you can work around this with a key function that always returns the same type.
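A quick way to see the datetime.min fallback at work: this sketch reuses the second parse_datetime (with the format string moved to a module-level DATE_FMT constant) on a hypothetical in-memory list containing a malformed line and a blank line. Because the key always returns a datetime, sorted never has to compare mismatched types, and with reverse=True the unparseable lines end up at the bottom. It needs Python 3.3 or later because of the maxsplit keyword.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"


def parse_datetime(line):
    """Return the leading timestamp of a line, or datetime.min if it can't be parsed."""
    try:
        date_str, _ = line.split(maxsplit=1)
        return datetime.strptime(date_str, DATE_FMT)
    except ValueError:
        # Covers blank lines (nothing to unpack) and malformed timestamps
        return datetime.min


# A list of lines stands in for the file object; both yield str lines when iterated
lines = [
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "garbage line\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
    "\n",
    "2020/10/31:15:05:01 734691 366500 315620 554140\n",
]

result = sorted(lines, key=parse_datetime, reverse=True)
for line in result:
    print(line.rstrip())
```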
If you want to put the data into a data structure (e.g. a dict or namedtuple), I would do the parsing first, put the parsed data into a list, and sort the list once everything has been parsed.
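A rough sketch of that parse-first approach using a namedtuple. The Record type and its field names are made up for illustration, since the thread doesn't say what the four numeric columns mean. It targets Python 3 (print is used as a function with unpacking).

```python
from collections import namedtuple
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"

# Hypothetical record type: the meaning of the numeric columns is unknown
Record = namedtuple("Record", ["timestamp", "values"])


def parse_record(line):
    """Split one line into a datetime and a tuple of ints."""
    fields = line.split()
    return Record(datetime.strptime(fields[0], DATE_FMT),
                  tuple(int(v) for v in fields[1:]))


lines = [
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "2020/10/31:15:05:01 734691 366500 315620 554140\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
]

# Parse everything first (skipping blank lines), then sort the list in one go
records = [parse_record(line) for line in lines if line.strip()]
records.sort(key=lambda r: r.timestamp, reverse=True)

for rec in records:
    print(rec.timestamp.strftime(DATE_FMT), *rec.values)
```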
Posts: 6,780
Threads: 20
Joined: Feb 2020
Nov-04-2020, 05:23 PM
(This post was last modified: Nov-04-2020, 05:23 PM by deanhystad.)
I get a syntax error with your code. Is there a paste error in your post? Why does the datetime pattern appear twice?
Posts: 3
Threads: 0
Joined: Nov 2020
Since the dates are nearly in ISO form (YYYY/MM/DD etc.), you can simply sort the lines lexicographically.
sorted_lines = sorted(f, reverse=True)
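A small check of this idea on the sample lines from the question, comparing a plain reverse sort with a datetime-keyed one:

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"

lines = [
    "2020/10/31:15:05:01 734691 366500 315620 554140\n",
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
]

# Plain string comparison, no parsing at all
lexicographic = sorted(lines, reverse=True)

# Parsing the timestamp first, for comparison
by_datetime = sorted(
    lines,
    key=lambda line: datetime.strptime(line.split(" ")[0], DATE_FMT),
    reverse=True,
)

print(lexicographic == by_datetime)  # True for this zero-padded sample
```

As a side effect, the plain sort does not crash on blank lines: a lone "\n" compares below any digit, so with reverse=True it simply ends up last.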
Posts: 6,780
Threads: 20
Joined: Feb 2020
(Nov-04-2020, 05:23 PM) chrischarley Wrote: Since the dates are nearly in ISO form (YYYY/MM/DD etc.), you can simply sort the lines lexicographically.
sorted_lines = sorted(f, reverse=True)
But do we know for sure it is MM and DD?
Posts: 3
Threads: 0
Joined: Nov 2020
Nov-04-2020, 05:52 PM
(This post was last modified: Nov-04-2020, 11:33 PM by chrischarley.)
(Nov-04-2020, 05:30 PM) deanhystad Wrote: (Nov-04-2020, 05:23 PM) chrischarley Wrote: Since the dates are nearly in ISO form (YYYY/MM/DD etc.), you can simply sort the lines lexicographically.
sorted_lines = sorted(f, reverse=True)
But do we know for sure it is MM and DD?
That is true; I didn't consider that possibility. But his hours, minutes and seconds were zero-padded when less than 10.
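To make the padding concern concrete, here is a sketch with hypothetical lines whose month is not zero-padded. strptime's %m accepts 9 as well as 09, so the datetime key still orders them correctly, while plain string comparison treats 2020/9/5 as later than 2020/10/31 because the character '9' sorts above '1'.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"

# Hypothetical lines where the month is NOT zero-padded (9 instead of 09)
lines = [
    "2020/9/5:08:00:00 1\n",
    "2020/10/31:21:05:01 2\n",
]

lexicographic = sorted(lines, reverse=True)
by_datetime = sorted(
    lines,
    key=lambda line: datetime.strptime(line.split(" ")[0], DATE_FMT),
    reverse=True,
)

print(lexicographic[0].split()[0])  # 2020/9/5:08:00:00 (wrong: '9' > '1' as characters)
print(by_datetime[0].split()[0])    # 2020/10/31:21:05:01
```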
Posts: 6
Threads: 2
Joined: Oct 2020
Nov-04-2020, 05:52 PM
(This post was last modified: Nov-04-2020, 06:01 PM by beginner2020. Edit Reason: Tried one more option)
(Nov-04-2020, 05:18 PM) DeaD_EyE Wrote: [...]
Hi,
I got the error below using Python 2.7.5. Could that be due to the version?
date_str, _ = line.split(maxsplit=1)
TypeError: split() takes no keyword arguments
So I changed it as below, with no arguments to split, but that didn't help either.
date_str, _ = line.split()
ValueError: too many values to unpack
Posts: 2,125
Threads: 11
Joined: May 2017
(Nov-04-2020, 05:23 PM) deanhystad Wrote: I get a syntax error with your code. Is there a paste error in your post? Why does the datetime pattern appear twice?
No, I tested it again. It works with Python 3.9 and should work with older Python 3 versions (3.3 and later).
Maybe the comments are confusing the REPL if you copy and paste the code.
The parse_datetime function appears twice to show:
- How to split a task into smaller, easier tasks, which is better for testing. For example, you can use this function to test each line on its own; no file is needed for testing at all.
- How to catch exceptions and handle them by returning a default value for sorting.
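For example, the second parse_datetime can be checked with a few plain strings, no file involved. The test inputs below are invented for illustration.

```python
from datetime import datetime


def parse_datetime(line):
    date_fmt = "%Y/%m/%d:%H:%M:%S"
    try:
        date_str, _ = line.split(maxsplit=1)
        return datetime.strptime(date_str, date_fmt)
    except ValueError:
        return datetime.min


# No file needed: call the function on individual lines
assert parse_datetime("2020/10/31:09:05:01 734691 445750 384860 557946\n") == datetime(2020, 10, 31, 9, 5, 1)
assert parse_datetime("\n") == datetime.min                  # blank line
assert parse_datetime("31/10/2020 1 2 3\n") == datetime.min  # hypothetical wrong date layout
print("all checks passed")
```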
To understand the sorting part, try this:
sorted([1, 2, 3, 4, "a"])
Error:
TypeError Traceback (most recent call last)
<ipython-input-4-f654efc1df6a> in <module>
----> 1 sorted([1, 2, 3, 4, "a"])
TypeError: '<' not supported between instances of 'str' and 'int'
The same applies to datetime objects.
This works, because datetime objects can be compared with each other:
sorted([datetime.min, datetime.max, datetime(2020, 1, 1), datetime(1990, 1, 1)])
But this won't work:
sorted([datetime(2020, 1, 1), 1])
Error:
TypeError: '<' not supported between instances of 'int' and 'datetime.datetime'
This is why a key function for sorting should always return the same type.
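The same rule applies to key functions. In this sketch, the hypothetical key_mixed returns None for a line it can't parse, so sorted ends up comparing None with a datetime and raises TypeError; key_uniform returns datetime.min instead and sorts fine. Both assume every line has at least one field (a blank line would raise IndexError at line.split()[0]).

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"
lines = ["2020/10/31:21:05:01 a\n", "not a date\n", "2020/10/31:09:05:01 b\n"]


def key_mixed(line):
    """Hypothetical bad key: returns None instead of a datetime on failure."""
    try:
        return datetime.strptime(line.split()[0], DATE_FMT)
    except ValueError:
        return None


def key_uniform(line):
    """Always returns a datetime, using datetime.min as the fallback."""
    try:
        return datetime.strptime(line.split()[0], DATE_FMT)
    except ValueError:
        return datetime.min


try:
    sorted(lines, key=key_mixed)
except TypeError as exc:
    print("mixed key types:", exc)

print(sorted(lines, key=key_uniform, reverse=True))
```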
You could remove lines with the wrong format before you sort.
Or you may have a situation where you still want to keep the badly formatted lines and sort them as well.
There is no single universal solution.
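For the first option, one possible sketch is to filter before sorting, so the key function only ever sees lines that parse. try_parse is a hypothetical helper name; it uses split(None, 1), the positional form of maxsplit, so it also runs on Python 2.7.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"


def try_parse(line):
    """Return the line's leading timestamp, or None if it has none."""
    fields = line.split(None, 1)
    if not fields:
        return None  # blank line
    try:
        return datetime.strptime(fields[0], DATE_FMT)
    except ValueError:
        return None  # first field is not a timestamp in this format


lines = [
    "2020/10/31:09:05:01 734691 445750 384860 557946\n",
    "corrupt\n",
    "2020/10/31:21:05:01 705959 177500 153041 513408\n",
    "\n",
]

# Drop badly formatted lines first; the key then only ever returns datetime objects
valid = [line for line in lines if try_parse(line) is not None]
result = sorted(valid, key=try_parse, reverse=True)
for line in result:
    print(line.rstrip())
```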
Keep learning the basics before you switch to pandas.
Posts: 2,125
Threads: 11
Joined: May 2017
(Nov-04-2020, 05:52 PM)beginner2020 Wrote: I got the below error, using python version 2.7.5. Will that be due to version.
date_str, _ = line.split(maxsplit=1)
TypeError: split() takes no keyword arguments
I saw it too late.
You should avoid Python 2.7; it has reached end of life.
Python 3.6 is the oldest version that is still supported.
Python 3.5 has also reached end of life.
Support for passing maxsplit as a keyword argument was introduced in Python 3.3.
Start your REPL or script with python3.
Plain python, without the 3, starts the Python 2.7 interpreter on most distributions.
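If upgrading isn't possible right away, a version-neutral sketch: pass maxsplit positionally, as split(None, 1), where None means "split on any run of whitespace", the same as a bare split(). The same snippet shows why the plain split() attempt raised "too many values to unpack": each sample line has five whitespace-separated fields.

```python
from datetime import datetime

DATE_FMT = "%Y/%m/%d:%H:%M:%S"
line = "2020/10/31:09:05:01 734691 445750 384860 557946\n"

# With no limit, split() returns all five fields, hence "too many values to unpack"
print(len(line.split()))  # 5

# Passing maxsplit positionally works on Python 2.7 and 3.x
date_str, rest = line.split(None, 1)
print(datetime.strptime(date_str, DATE_FMT))  # 2020-10-31 09:05:01
```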