(Aug-26-2020, 06:26 AM)millpond Wrote: With regex, at least the PCRE I am used to \. is always a period, and . means 'any character'.
The regex in python is the same here. "." is any character, and "\." is only the period.
>>> re.sub(".", "X", "hi.")
'XXX'
>>> re.sub("\.", "X", "hi.")
'hiX'
Quote:In python it seems r'.' means any character.
Either '.' or r'.' is a way of creating a string that contains a single period. The regex engine, on receiving it, interprets it as "any character".
Quote:Lets do regex strings:
a = "fee...fi....fo.....fum"
b = re.sub("\.","\-",a)
Perl had a rule that all valid backslash sequences in the regex engine (like \n being a newline character) were alphanumeric characters. Therefore, you could always add a backslash before a symbol and it would be interpreted as just the raw symbol. Both - and \- would be interpreted as a dash (when outside of a character class). Python doesn't apply that rule to replacement strings. As \- isn't a valid escape sequence there, both characters are kept as they are during the replacement.
re.sub documentation Wrote:repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Other unknown escapes such as \& are left alone.
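A small sketch of the three cases the documentation describes, in case it helps to see them side by side. The sample string "a.b" and the variable names are just made up for illustration, and the error case assumes Python 3.7 or later.

```python
import re

# Known escape in the replacement: backslash + n becomes one newline character.
with_newline = re.sub(r"\.", r"\n", "a.b")
print(repr(with_newline))

# Backslash + dash is not a known escape, so both characters are left alone.
with_backslash = re.sub(r"\.", r"\-", "a.b")
print(with_backslash)

# An unknown escape made of an ASCII letter is treated as an error.
try:
    re.sub(r"\.", r"\q", "a.b")
    letter_escape_failed = False
except re.error as exc:
    letter_escape_failed = True
    print("error:", exc)
```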
Quote:b = re.sub(r'.',r'_',a)
-> _____________________
The entire string is wiped out.
r'.' is NOT a raw character, at least with the re class.
There's no such thing as a raw character. An "r-string" formulation does only one thing: it stops python from interpreting the backslash before handing the string off to the regex engine. As "." has no backslashes, there is no difference between "." and r".". The "r-string" doesn't have anything to do with the regex engine. It's just setting a slightly different rule for how python strings are constructed.
>>> print("-->\t<--") #This string has a tab character inside.
-->     <--
>>> print(r"-->\t<--") #This string has the two character sequence of a backslash and a letter t inside.
-->\t<--
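To check this without involving the regex engine at all, you can compare the string objects directly. This is only a quick sketch; the variable names are mine.

```python
# No backslash involved: the raw and normal literals build the same string.
same_without_backslash = (r"." == ".")

# With a backslash, the raw literal keeps it as a separate character.
raw_tab = r"\t"     # two characters: backslash and the letter t
plain_tab = "\t"    # one character: a tab
print(len(raw_tab), len(plain_tab))

# A raw regex literal is the same string as the doubled-backslash normal one.
same_escaped_period = (r"\." == "\\.")
print(same_without_backslash, same_escaped_period)
```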
Quote:And escaping '-' (\-)is not working as expected in regex mode.
"-" doesn't need escaping. It has no special meaning in a replacement string. It has no special meaning in a regex outside a character class, and you can't supply a character class in a replacement. Both python and perl will behave the same when "-" is used there, but only perl lets you also use "\-".
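So for the string in the question, something like the following should give the dashes you were after. It's a minimal sketch: only the period in the pattern is escaped, and the replacement is a plain dash.

```python
import re

a = "fee...fi....fo.....fum"

# Escape the period in the pattern; leave the dash in the replacement alone.
b = re.sub(r"\.", "-", a)
print(b)  # fee---fi----fo-----fum
```

Since the pattern is a fixed single character, a.replace(".", "-") would give the same result without the regex engine.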
Quote:In Perl I would typically use something like:
x =~ s/^.*(fee).+(fi).+(fo).+(fum).*$/$1,$3,$2,$4/ -> ...fee...fo...fi...fum
Though re apparently uses \1 instead of the ancient \$1 format.
Seems about the same in python (although neither perl nor python will print the periods in the replaced string).
>>> a = "fee...fi....fo.....fum"
>>> re.sub("^.*(fee).+(fi).+(fo).+(fum).*$", r"\1,\3,\2,\4", a)
'fee,fo,fi,fum'
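As a side note on the group references, python also accepts a \g<N> form in the replacement, which can help when a group number is followed directly by a digit. A rough sketch using the same string (the variable names are mine):

```python
import re

a = "fee...fi....fo.....fum"
pattern = r"^.*(fee).+(fi).+(fo).+(fum).*$"

# \g<1> refers to the same group as \1.
swapped = re.sub(pattern, r"\g<1>,\g<3>,\g<2>,\g<4>", a)
print(swapped)  # fee,fo,fi,fum

# \g<1>0 is group 1 followed by a literal 0; \10 would be read as group 10.
tagged = re.sub(pattern, r"\g<1>0", a)
print(tagged)  # fee0
```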