Python Forum

Greetings!
I’d like to match strings in files, it seems simple but I’m failing to do this…
It has multiple white spaces before the word Start Time or End Time and the Time and Date of the event

String “                                                                    Start Time  2/28/2024 8:34:34 AM ”

I tried :

  
if re.search("\s+\Start\s\Time",el) : #  < ------------- el is a line from the file ,,,
    print(f" START LN {el}")

And got an error message “ bad escape \T at position 11”
Then I tried:

if re.search("\s+\Start\s\",el) : #  < ------------- el is a line from the file ,,,
    print(f" START LN {el}")

This one prints tons of other lines I do not care about.

I was sure that using “\s+” would filter the line I wanted, but it does not.
Would you help me with this?

Thank you.

Quote:
if re.search("\s+\Start\s\",el) : #  < ------------- el is a line from the file ,,,
    print(f" START LN {el}")
This one prints tons of other lines I do not care about.

No, there is a syntax error that would prevent the program from running. You cannot have a single backslash at the end of a string literal.

You need to protect against "\" being interpreted as the start of an escape sequence. I would use raw strings.

I don't think you fully understand what \ does in a regex pattern. Why are you using \Start in your pattern? \S is "match any non-whitespace character". \T doesn't have a special meaning in a re pattern. That's why you got an error.
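To see both effects in isolation, here is a small sketch. The string "xtart" is made up for illustration, and the error check assumes Python 3.7 or later, where the re module rejects unknown escapes made of ASCII letters such as \T.

```python
import re

# In a raw string the backslash reaches the re module unchanged.
assert r"\s" == "\\s"

# \S means "any non-whitespace character", so \Start also matches "xtart".
matches_xtart = re.search(r"\Start", "xtart") is not None

# \T is not a known escape, so compiling the original pattern fails.
try:
    re.compile(r"\s+\Start\s\Time")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(matches_xtart)
print(error_message)
```

Dropping the backslashes in front of Start and Time removes both problems.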

Quote: I was sure that using “\s+” would filter the line I wanted, but it does not.

Putting \s+ at the start of the pattern only requires at least one whitespace character before Start Time. To reject lines that contain your pattern along with other text, anchor the pattern to the start (^) and end ($) of the string. You might want to use "match" instead of "search". match only succeeds if the pattern matches at the very beginning of the string (fullmatch is the one that requires the entire string to match). search is happy if it finds your pattern anywhere in the string.
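A quick way to compare search, match and fullmatch is to run them on the same line. This is only a sketch: the sample line and its "trailing junk" suffix are invented for the comparison, and the last pattern is one possible way to anchor both ends.

```python
import re

# Hypothetical line with extra text after the timestamp.
line = "   Start Time  2/28/2024 8:34:34 AM  trailing junk"

# search finds the text anywhere in the line.
found_by_search = re.search(r"Start Time", line) is not None

# match only tries at position 0, where the line has spaces.
found_by_match = re.match(r"Start Time", line) is not None

# Allowing leading whitespace lets match succeed, even with junk at the end.
found_by_match_ws = re.match(r"\s*Start Time", line) is not None

# fullmatch requires the pattern to cover the whole line.
found_by_fullmatch = re.fullmatch(r"\s*Start Time", line) is not None

# Anchoring both ends with ^ and $ rejects the trailing junk.
anchored = re.search(r"^\s*Start Time\s+\S+\s+\S+\s+[AP]M\s*$", line) is not None

print(found_by_search, found_by_match, found_by_match_ws, found_by_fullmatch, anchored)
```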

This might work:

import re

with open("test.txt", "r") as file:
    for index, line in enumerate(file):
        # [AP]M matches AM or PM; [A|PM] would match a single A, |, P or M
        result = re.match(r"\s*?(Start Time.*?[AP]M)\s*?$", line)
        if result:
            print(f"{index:3}: ({result.start()}, {result.end()}) {result.groups()[0]}")

Or you could just strip all the leading and trailing whitespace and assume any line that starts with "Start Time" is a line you are looking for.
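The strip approach needs no regex at all. Here is a minimal sketch; the helper name is_start_line and the sample lines (including the End Time value) are made up for illustration.

```python
def is_start_line(line: str) -> bool:
    """Return True if the line, ignoring surrounding whitespace, begins with 'Start Time'."""
    return line.strip().startswith("Start Time")


# Hypothetical lines standing in for the contents of the file.
sample_lines = [
    "      Start Time  2/28/2024 8:34:34 AM \n",
    "      End Time  2/28/2024 9:01:02 AM \n",
    "Restart Time unknown\n",
]

start_lines = [ln.strip() for ln in sample_lines if is_start_line(ln)]
print(start_lines)
```

This does not check that a valid date and time follow, so it is only suitable if nothing else in the file begins with "Start Time".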

Always use RAW strings when you create patterns for the re module, for example

if re.search(r"^\s+Start\s+Time", el):  # <-- note the r" syntax

By the way, \T is not a valid escape in the re syntax, and \S matches any non-whitespace character rather than the letter S alone, so there should be no backslash before Start or Time.

If you want to be really picky.

import re

with open("test.txt", "r") as file:
    for line in file:
        result = re.match(r"\s*(Start Time {1,2}(\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} [AP]M))\s*$", line)
        if result:
            print(result.groups())

match() forces pattern to start at the start of line.
r"" makes the pattern a raw string. Don't have to worry about escape sequences.
\s* matches any number of whitespace characters.
() creates groups. This pattern has a group for the "Start Time...PM" part, the date part and the time part.
Start Time matches Start Time.
{1,2} matches one or two spaces.
\d{1,2} matches 1 or 2 digits.
/ matches /.
: matches :.
[AP]M matches AM or PM.
\s*$ matches whitespace up to the end of the line.
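Once the date and time groups are captured, they can be turned into a datetime object. The sketch below reuses the pattern above; the function name parse_start_time is hypothetical, and the strptime format assumes US-style month/day/year dates and a locale in which %p recognises AM and PM.

```python
import re
from datetime import datetime

PATTERN = re.compile(
    r"\s*(Start Time {1,2}(\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{1,2}:\d{1,2} [AP]M))\s*$"
)


def parse_start_time(line):
    """Return the timestamp of a 'Start Time' line, or None if the line does not match."""
    result = PATTERN.match(line)
    if result is None:
        return None
    # Group 2 is the date part, group 3 is the time part including AM/PM.
    date_part, time_part = result.group(2), result.group(3)
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %I:%M:%S %p")


sample = "        Start Time  2/28/2024 8:34:34 AM \n"
print(parse_start_time(sample))
print(parse_start_time("Start Time tomorrow"))
```

Compiling the pattern once avoids re-parsing it for every line of a large file.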

You guys are great! That is what I was looking for: some code and the explanation...

(Mar-04-2024, 09:07 PM) tester_V wrote: Greetings!

It seems like you're encountering issues with your regular expression syntax. Here's how you can correct it:

import re

# Sample line from the file
el = "                                                                    Start Time  2/28/2024 8:34:34 AM"

# Use raw string literal to avoid escaping issues
if re.search(r"\s+Start\s+Time", el):
    print(f"START LN {el}")

In this corrected version:

I used a raw string literal (r"...") for the regular expression to avoid issues with backslashes.
I adjusted the regular expression to \s+Start\s+Time, which matches one or more whitespace characters before "Start" and between "Start" and "Time".
I hope this correctly filters the lines containing "Start Time" as you intended.

Best Regards
Danish Hafeez | QA Assistant

tester_V

deanhystad

Gribouillis

deanhystad

tester_V

Danishhafeez