Python Forum
where is a pattern?
#1
i have a string pattern (such as '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9][0-9]') that may be in another string. i'm still baffled by the re module (i guess i just can't get into a perl frame of mind). i want to know the start and end position where the matched substring is. but the .start and .end methods in a match object want an argument that makes no sense to me. who knows how to use this?

even better would be a function that can find a date and time substring in any date and time format (even if it is ambiguous between day and month) and return the part before the date and time, the date and time substring, and the part after the date and time.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
So you need a better strptime function (string parse time function)?

https://docs.python.org/3.9/library/date...e-behavior
#3
strptime() is a poor implementation, even in C. it can't deal well with a chance format and it can't find the date and time in a string (inside text in that string).
#4
What does your pattern mean?
The first part looks like a date, but what is the second part?

[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9][0-9]
2019-06-12.123
#5
Skaperen Wrote:strptime() is a poor implementation, even in C. it can't deal well with a chance format and it can't find the date and time in a string (inside text in that string).
You may want to try the dateutil module's time parser.
#6
(Jun-07-2019, 06:55 AM)heiner55 Wrote: What does your pattern mean?
The first part looks like a date, but what is the second part?

[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9][0-9]
2019-06-12.123

it's part of a time. it would be 6 digits but the existence of 3 is all that is needed to be sure it's not one of the other patterns.
#7
Ok, I understood.
#8
That pattern is way longer than it needs to be. Have you looked into basic regex before?
Quick test.
>>> import re
>>> 
>>> text = "foo 2019-06-12.123 bar"
>>> r = re.search(r"\d{4}-\d{2}-\d{2}\.\d{3}", text).group()
>>> r
'2019-06-12.123'
If it's a common, valid date format, you can parse it with dateutil, as mentioned by @Gribouillis.
I think pendulum is the best (and most correct) date tool made for Python in recent years.
>>> import pendulum
>>> 
>>> d = '2019-06-12'
>>> pendulum.parse(d)
DateTime(2019, 6, 12, 0, 0, 0, tzinfo=Timezone('UTC'))
Automatic parsing fails on 2019-06-12.123, but you can supply your own format string with from_format().
>>> import pendulum
>>> 
>>> dt = pendulum.from_format('2019-06-12.123', 'YYYY-DD-MM.hms')
>>> dt
DateTime(2019, 12, 6, 1, 2, 3, tzinfo=Timezone('UTC'))
As you can see, pendulum does this much better than strptime().
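On the original question about positions: the argument to .start() and .end() is an optional group number, and it defaults to 0, which means the whole match, so both can be called with no argument. Here is a minimal sketch using only the re module. The helper name split_around is made up for illustration; it returns the part before the match, the matched substring, and the part after it.

```python
import re

# Hypothetical helper (not from this thread): returns (before, matched, after),
# or None when the pattern is not found in the text.
def split_around(pattern, text):
    m = re.search(pattern, text)
    if m is None:
        return None
    # With no argument, start() and end() refer to group 0, the whole match.
    start, end = m.start(), m.end()
    return text[:start], text[start:end], text[end:]

text = "foo 2019-06-12.123 bar"
m = re.search(r"\d{4}-\d{2}-\d{2}\.\d{3}", text)
print(m.start(), m.end())  # 4 18
print(m.span())            # (4, 18), the same pair as one tuple
print(split_around(r"\d{4}-\d{2}-\d{2}\.\d{3}", text))
```

The end position is exclusive, so text[m.start():m.end()] is exactly the matched substring.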
#9
i had match bugs with that pattern and rewrote the code this afternoon using a custom pattern format just for this case. if the pattern had a '0' it tested the character with .isdecimal(), else it compared the character to the pattern character. i had about 3100 files with a name that had the date+time on the end, followed by some other stuff in many cases, with the original name at the front. i wanted to transpose each file's original name and date+time to be date+time then original name. some had various time formats, just to complicate things more. some had original names with numbers in them, even dates. but it's done now.
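the actual code was not posted, so the following is only a rough sketch of the kind of matcher described above: a '0' in the template stands for any decimal digit, and every other template character must match literally. the names matches_at and find_template and the sample file name are assumptions made up for illustration.

```python
def matches_at(template, text, pos):
    """Return True if text at offset pos fits template ('0' = any decimal digit)."""
    if pos + len(template) > len(text):
        return False
    for t, c in zip(template, text[pos:pos + len(template)]):
        if t == "0":
            # A '0' in the template accepts any decimal digit.
            if not c.isdecimal():
                return False
        elif t != c:
            # Any other template character must match exactly.
            return False
    return True

def find_template(template, text):
    """Return (start, end) of the first place the template fits, or None."""
    for pos in range(len(text) - len(template) + 1):
        if matches_at(template, text, pos):
            return pos, pos + len(template)
    return None

name = "report 2019-06-12.123 extra"  # made-up example name
span = find_template("0000-00-00.000", name)
print(span)  # (7, 21)
```

with the start and end in hand, name[:start], name[start:end] and name[end:] give the three pieces needed to put the date+time in front of the original name.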