Python Forum
Python re.sub text manipulation on matched contents before substituting
#1
Hi, I have the following code below:

import re

t = '<img data src="some-thing">'
pattern = '(<img ).*?(src=")(.*?)(">)'
u = re.sub(pattern, '\\1\\2\\3\\4', t)
print(u)
The output is <img src="some-thing">. Is there a way to do some text manipulation on group 3 before substituting, so that the result is <img src="something">? I can't think of a good approach right now. This is just a basic example of what I am trying to do; the content I am replacing is more complex than just removing a dash. Thanks.

Found a solution by passing a function as the replacement argument: https://docs.python.org/3/library/re.html#text-munging
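For anyone who lands here later, this is roughly what that looks like for the example above. It is only a minimal sketch: re.sub accepts a callable as the replacement and calls it with each match object, and the name clean_src is just a hypothetical helper made up for illustration.

```python
import re

t = '<img data src="some-thing">'
pattern = r'(<img ).*?(src=")(.*?)(">)'

def clean_src(match):
    # Hypothetical helper: re.sub calls this once per match and
    # substitutes whatever string it returns.
    src = match.group(3).replace('-', '')  # manipulate group 3 here
    # Rebuild the tag from the kept groups plus the modified group 3.
    return match.group(1) + match.group(2) + src + match.group(4)

u = re.sub(pattern, clean_src, t)
print(u)  # <img src="something">
```

Any string manipulation can go inside the function, so the same shape should work when the change is more involved than removing a dash.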
#2
If you just want to replace the dash in the string after you've captured it, use string replace.

>>> "some-thing".replace("-","")
'something'
But it sounds like you want to modify data inside an HTML document, retaining the document. Trying to do that with regular expressions is tedious, and likely to break as soon as the html gets a bit wonky. I'd use an HTML parser instead. It's a bit more overhead than doing a teeny regex, but it's much more reliable and flexible. Here I used beautifulsoup4.

import bs4

t = '<img data src="some-thing">'
soup = bs4.BeautifulSoup(t, features="html.parser")
soup.find('img')['src'] = soup.find('img')['src'].replace('-','')
print(soup)
Output:
Before -> <img data src="some-thing">
After -> <img data="" src="something"/>
#3
(May-16-2020, 05:04 AM)bowlofred Wrote: But it sounds like you want to modify data inside an HTML document, retaining the document. Trying to do that with regular expressions is tedious, and likely to break as soon as the html gets a bit wonky. I'd use an HTML parser instead. It's a bit more overhead than doing a teeny regex, but it's much more reliable and flexible. Here I used beautifulsoup4.

Thanks. I have run across Beautiful Soup when learning Python. I'm currently using selenium with xpath selectors. Maybe I'll look into it in the future. But right now the substitution I'm doing is a bit more complex than just finding img src attributes and changing them. I think beautifulsoup can do it, but I'd have to spend time rewriting everything.