Python Forum
File Name Parsing
#6
(Aug-26-2020, 06:26 AM)millpond Wrote: With regex, at least the PCRE I am used to \. is always a period, and . means 'any character'.

Python's regex behaves the same way here: "." matches any character, and "\." matches only a literal period.
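>>> import re
>>> re.sub(".", "X", "hi.")    #Pattern is "any character", so everything is replaced.
'XXX'
>>> re.sub(r"\.", "X", "hi.")  #Pattern is a literal period; the r prefix avoids Python's invalid-escape warning.
'hiX'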
Quote:In python it seems r'.' means any character.

Either '.' or r'.' creates the same string: a single period. When the regex engine receives it, it interprets it as "any character".
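As a quick check of that claim, here is a minimal sketch (the variable names are just for illustration) comparing the two literals and what the regex engine does with each:

```python
import re

plain = "."
raw = r"."

# Both literals build the same one-character string.
same_string = plain == raw

# So the regex engine sees the same pattern and replaces every character.
plain_result = re.sub(plain, "X", "hi.")
raw_result = re.sub(raw, "X", "hi.")

print(same_string, plain_result, raw_result)  # True XXX XXX
```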

Quote:Lets do regex strings:
a = "fee...fi....fo.....fum"
b = re.sub("\.","\-",a)

Perl has a rule that every valid backslash sequence in the regex engine (like \n for a newline character) uses an alphanumeric character. Therefore, you can always put a backslash before a symbol and it will be interpreted as just the literal symbol. Both - and \- are interpreted as a dash (when outside a character class).

Python doesn't apply that rule to replacement strings. As \- isn't a recognized escape sequence, both characters are kept in the replacement. From the re.sub documentation:

DOCUMENTATION Wrote:repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Other unknown escapes such as \& are left alone.
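The three cases in that passage can be tried directly. This is only a sketch, assuming Python 3.7 or later (where unknown letter escapes in a replacement raise re.error); the variable names are made up for the example:

```python
import re

a = "fee...fi....fo.....fum"

# "\-" is not a known escape in a replacement string,
# so the backslash and the dash are both left in the result.
kept = re.sub(r"\.", r"\-", a)
print(kept)  # fee\-\-\-fi\-\-\-\-fo\-\-\-\-\-fum

# "\n" is a known escape, so it becomes a single newline character.
newline = re.sub(r"\.", r"\n", "a.b")
print(repr(newline))  # 'a\nb'

# An unknown escape made of an ASCII letter is treated as an error.
try:
    re.sub(r"\.", r"\q", a)
    letter_escape_fails = False
except re.error:
    letter_escape_fails = True
print(letter_escape_fails)  # True
```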


Quote:b = re.sub(r'.',r'_',a)
-> _____________________
The entire string is wiped out.


r'.' is NOT a raw character, at least with the re class.


There's no such thing as a raw character. An "r-string" does only one thing: it stops Python from interpreting backslashes itself, so they reach the regex engine untouched. As "." has no backslashes, there is no difference between "." and r".". The "r-string" has nothing to do with the regex engine; it just sets a slightly different rule for how Python builds the string.

>>> print("-->\t<--")  #This string has a tab character inside.
-->	<--
>>> print(r"-->\t<--") #This string has the two character sequence of a backslash and a letter t inside.
-->\t<--
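To see the same thing without printing, a small sketch (names are illustrative only) can compare lengths and show that a raw string is just another spelling of an ordinary string:

```python
plain_tab = "-->\t<--"   # Python turns \t into one tab character
raw_tab = r"-->\t<--"    # the backslash and the t stay as two characters

lengths = (len(plain_tab), len(raw_tab))
print(lengths)  # (7, 8)

# r"\." and "\\." are the same two-character string: a backslash and a period.
same = r"\." == "\\."
print(same)  # True
```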
Quote:And escaping '-' (\-)is not working as expected in regex mode.

- doesn't need escaping. It has no special meaning in a replacement string, and no special meaning in a regex outside a character class (and you can't supply a character class in a replacement). Python and Perl behave the same when a plain - is used in the replacement, but only Perl also lets you write \- there.
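Applied to the string from the question, a sketch of the unescaped version might look like this. The second and third calls are extra illustrations, not from the original post: one shows a dash inside a character class in the pattern, the other shows that \- is accepted on the pattern side.

```python
import re

a = "fee...fi....fo.....fum"

# A plain dash in the replacement needs no backslash.
dashed = re.sub(r"\.", "-", a)
print(dashed)  # fee---fi----fo-----fum

# Inside a character class a dash can form a range, so placing it
# first (or last) keeps it a literal dash.
underscored = re.sub(r"[-.]", "_", "a-b.c")
print(underscored)  # a_b_c

# In the pattern (not the replacement), Python does accept \- as a dash.
plus = re.sub(r"\-", "+", "a-b")
print(plus)  # a+b
```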

Quote:In Perl I would typically use something like:
x =~ s/^.*(fee).+(fi).+(fo).+(fum).*$/$1,$3,$2,$4/ -> ...fee...fo...fi...fum


Though re apparently uses \1 instead of the ancient \$1 format.

It works much the same in Python (although neither Perl nor Python will print the periods in the replaced string, because the .+ between the groups consumes them and the replacement doesn't put them back).
>>> a = "fee...fi....fo.....fum"
>>> re.sub("^.*(fee).+(fi).+(fo).+(fum).*$", r"\1,\3,\2,\4", a)
'fee,fo,fi,fum'
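If the periods should survive the swap, one option is to capture them as well. This is only a sketch of that idea, assuming the separators are always runs of periods; the pattern and group numbering are mine, not from the original post:

```python
import re

a = "fee...fi....fo.....fum"

# Groups 1, 3, 5, 7 are the words; groups 2, 4, 6 are the runs of periods.
pattern = r"(fee)(\.+)(fi)(\.+)(fo)(\.+)(fum)"

# Swap the words "fi" (group 3) and "fo" (group 5), keeping every separator in place.
swapped = re.sub(pattern, r"\1\2\5\4\3\6\7", a)
print(swapped)  # fee...fo....fi.....fum
```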


Messages In This Thread
File Name Parsing - by millpond - Aug-25-2020, 07:28 AM
RE: File Name Parsing - by ndc85430 - Aug-25-2020, 07:32 AM
RE: File Name Parsing - by millpond - Aug-26-2020, 04:57 AM
RE: File Name Parsing - by bowlofred - Aug-25-2020, 08:41 AM
RE: File Name Parsing - by millpond - Aug-26-2020, 06:26 AM
RE: File Name Parsing - by bowlofred - Aug-26-2020, 08:04 AM
