Python Forum
rewrite_title
#11
How do I group the title tags in a regular expression?
I need to keep the same title tags, as the requirement says.

If a regular expression is grouped like this, how do I replace only part 2 and keep the rest of the match as is?

r'(<title>)(.+?)(</title>)'
  \1          \2      \3
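
To make the numbering concrete, here is a small sketch of how the three groups line up with the backreferences. The sample string and the NEW placeholder are made up for illustration only.

```python
import re

# Hypothetical sample input, used only for illustration.
sample = 'blah<title>old</title>blah'

match = re.search(r'(<title>)(.+?)(</title>)', sample)

# groups() returns the captured text in order: group 1, group 2, group 3.
print(match.groups())

# In a replacement string, \1 and \3 put the tags back unchanged,
# so only the text captured by group 2 is swapped out.
result = re.sub(r'(<title>)(.+?)(</title>)', r'\1NEW\3', sample)
print(result)
```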

My result should look like this:
blah<title>Bar title</title>second title<title>Bar title</title>blah

The function must:
  1. Return a new string, where the content found between <title> and
  </title> (*case insensitive*) is replaced with the function's TITLE argument.
  All other text should remain unchanged, including the <title> opening and
  closing tags. All occurrences of <title> tags should be rewritten, if there is
  more than one.
  2. The function documentation should read:
    Replace the HTML title contents with the given TITLE

import re

def rewrite_title(hb, nt):
    """ Replace the HTML title contents with the given TITLE """
    # My attempt: this replaces the whole match, so the <title> tags are lost
    rhb = re.sub(r'<title>(.+?)</title>', nt, hb, flags=re.IGNORECASE)
    return rhb

hb = 'blah<title>{title}</title>second title<title>{newtitle}</title>blah'
nt = 'Bar title'
print(rewrite_title(hb, nt))
#12
It's unfortunate that your task requires you to use regex with HTML.
There are parsers for this that you should learn to use instead.
E.g.:
from bs4 import BeautifulSoup

html = '<html><head><title>new title</title></head></html>'
soup = BeautifulSoup(html, 'html.parser')
find_title = soup.find('title')
find_title.string = 'hello world'
print(soup.prettify())
Output:
<html>
 <head>
  <title>
   hello world
  </title>
 </head>
</html>
#13
Here is the solution:

import re

def rewrite_title(hb,nt):
    """ Replace the HTML title contents with the given TITLE """
    # Keep groups 1 and 3 (the tags) and put the new title between them
    return re.sub(r'(<title>)(.*?)(</title>)', r'\1' + nt + r'\3', hb, flags=re.IGNORECASE)
#hb = 'blah<TITLE>{title}</TiTle>halb'
#nt = 'Bar title'
#rewrite_title(hb,nt)
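
One caveat with concatenating nt into the replacement string: re.sub still interprets backslashes and group references inside it. A title such as '1 Bar' turns r'\1' + nt into '\11 Bar', which is read as a reference to group 11 and raises an error. Below is a minimal sketch of one way around this, passing a function as the replacement so the new title is inserted literally. The name rewrite_title_safe is hypothetical, and re.DOTALL is an added assumption (not part of the stated requirement) so that titles spanning several lines also match.

```python
import re

def rewrite_title_safe(hb, nt):
    """ Replace the HTML title contents with the given TITLE """
    # Only the tags are captured; the old title text is simply discarded.
    pattern = re.compile(r'(<title>).*?(</title>)', re.IGNORECASE | re.DOTALL)
    # The return value of a function replacement is used as-is, so nothing
    # inside nt is treated as an escape sequence or group reference.
    return pattern.sub(lambda m: m.group(1) + nt + m.group(2), hb)

print(rewrite_title_safe('blah<TITLE>{title}</TiTle>halb', 'Bar title'))
```

The original spelling of each tag (for example <TITLE> or </TiTle>) is kept, because the tags are copied back from the match rather than retyped.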