Python Forum

Do regular expressions still need raw strings?
Hello everyone (1st post!)

When I started using regular expressions back in Python 2, it seemed that best practice was to use raw strings in order to avoid having escape characters all over the place.

But recently, I've written a few patterns that I thought would need "escaped escapes" and the like, but they seemed to work fine as normal text strings.

Has the regular expression syntax been streamlined, or did I just not make a regex that needed a raw string?
(sorry, no, I don't remember the specific pattern at the moment.)
(May-02-2024, 03:43 PM) bobmon wrote: Has the regular expression syntax been streamlined, or did I just not make a regex that needed a raw string?
Not much has changed. You made a regex that did not need a raw string.
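As a rough illustration of patterns that happen not to need a raw string, here is a small sketch. The patterns and variable names are made up for this example and are not the original poster's. A pattern without backslashes is the same string either way, and "\n" works in both forms because the regex engine also understands \n.

```python
import re

# No backslashes at all: the raw and ordinary literals are the same string.
print(r"colou?r" == "colou?r")  # True

# "\n" is converted to a real newline by Python; r"\n" stays as
# backslash + n and is turned into a newline by the regex engine instead.
newline_plain = "a\nb"
newline_raw = r"a\nb"
subject = "a\nb"

print(newline_plain == newline_raw)               # False
print(len(newline_plain), len(newline_raw))       # 3 4
print(bool(re.search(newline_plain, subject)))    # True
print(bool(re.search(newline_raw, subject)))      # True
```

The two literals differ as strings but match the same text, which is why such a pattern appears to "just work" without the r prefix.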

Now let's see this from a higher point of view: regexes in Python need to be written in the language of Python regexes, which is described in the documentation of the re module. The problem is that this language uses the backslash as a special character, and it turns out that the backslash is ALSO a special character in Python's LITERAL strings.

The consequence is that when you write a regex in a literal string, a backslash from the regex language must be escaped whenever it would otherwise be ambiguous. For example, the regex \b, which matches a word boundary (the beginning or end of a word), must be written in a literal string as "\\b" or r"\b", because \b in an ordinary literal string means an ASCII backspace character, which is quite different.
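The \b case can be checked directly. The following is a minimal sketch; the sample text and variable names are only illustrative.

```python
import re

text = "a cat sat"

# Raw string: backslash + b reach the regex engine, which reads a word boundary.
raw_pattern = r"\bcat\b"
# Ordinary string with doubled backslashes: exactly the same characters.
escaped_pattern = "\\bcat\\b"
# Ordinary string with single backslashes: Python turns each \b into an
# ASCII backspace (0x08) before the regex engine ever sees the pattern.
backspace_pattern = "\bcat\b"

print(raw_pattern == escaped_pattern)             # True
print(len(raw_pattern), len(backspace_pattern))   # 7 5
print(re.search(raw_pattern, text).span())        # (2, 5)
print(re.search(backspace_pattern, text))         # None
```

The last search fails because the text contains no backspace characters around "cat".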

So strictly speaking, regexes DON'T need raw strings, but raw strings in this context are a handy tool to avoid incorrect interpretation of literal strings. For your safety, use raw strings to write literal regexes.

Can you match an ASCII backspace in a regex written as a raw literal string? Yes: use \x08, which the regex engine itself interprets as the character with hex code 08.
>>> import re
>>> r = re.compile(r'spa\x08m')
>>> r.search('hello spa\bm')
<re.Match object; span=(6, 11), match='spa\x08m'>
>>> 
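To see why the raw form works, it may help to compare it with the non-raw form. This is a small sketch with illustrative variable names: in the raw literal the regex engine resolves \x08, while in the ordinary literal Python has already done so. Inside a character class, the re module also reads \b as a backspace.

```python
import re

raw_pattern = r"spa\x08m"     # 8 characters: s p a \ x 0 8 m
plain_pattern = "spa\x08m"    # 5 characters: s p a <backspace> m
class_pattern = r"spa[\b]m"   # inside [...], \b means backspace to the re module
subject = "hello spa\bm"

print(len(raw_pattern), len(plain_pattern))        # 8 5
print(re.search(raw_pattern, subject).span())      # (6, 11)
print(re.search(plain_pattern, subject).span())    # (6, 11)
print(re.search(class_pattern, subject).span())    # (6, 11)
```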
Hmm. Okay, thanks!
Yes, you can match an ASCII backspace character (\x08) in a regular expression written as a raw literal string. Here's how you can do it:

import re

# Define the regular expression pattern using a raw string
pattern = r'spa\x08m'

# Compile the regular expression
regex = re.compile(pattern)

# Search for the pattern in the input string
match = regex.search('hello spa\bm')

# Print the match
print(match)

This code will output:

<re.Match object; span=(6, 11), match='spa\x08m'>