Python Forum

<b> followed by <b> before closing</b>
I am trying to edit some HTML text in Python. I have an HTML file where a <b> (bold) tag is sometimes followed by another <b> before the first one is closed with a </b>, e.g.:

<RF><b>1:4 <b>Hom wat is ... kom:</b>
The second <b> should not be there.

Is it possible to write a regex pattern to find such occurrences and to delete the spurious <b>?
Something like this:

import re

s = "<RF><b>1:4 <b>Hom wat is ... kom:</b>"

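def repl(match, count=[0]):
    # The mutable default list acts as a counter that persists between calls.
    x, = count
    count[0] += 1
    if x > 0:
        return ''  # drop every <b> after the first one
    return '<b>'  # keep the first <b>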


print(re.sub('<b>', repl, s))
Output:
<RF><b>1:4 Hom wat is ... kom:</b>
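
Note that the snippet above removes every <b> after the first one in the string, including legitimate ones that follow a </b>, and its counter is not reset between calls. A possible variation, only a sketch, is to match both <b> and </b> and remember whether a bold tag is currently open. The function name drop_nested_b and the longer test string are made up for illustration, and the sketch assumes the tags are written as plain <b> and </b> with no attributes:

```python
import re

def drop_nested_b(html):
    """Remove any <b> that appears while an earlier <b> is still open."""
    is_open = False  # True between a kept <b> and its closing </b>

    def repl(match):
        nonlocal is_open
        tag = match.group(0)
        if tag.lower() == "</b>":
            is_open = False
            return tag
        if is_open:
            return ""  # spurious opening tag inside an open <b>: drop it
        is_open = True
        return tag

    # Match opening and closing bold tags, in either letter case.
    return re.sub(r"</?b>", repl, html, flags=re.IGNORECASE)

# Hypothetical input: the original line plus a second, valid bold span.
s = "<RF><b>1:4 <b>Hom wat is ... kom:</b> text <b>next</b>"
print(drop_nested_b(s))
```

Because the state lives inside the function, each call starts fresh, and a <b> that follows a </b> is kept. For heavily malformed HTML a real parser is probably safer than a regex.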