Python Forum

I am trying to edit some HTML text in Python. I have an HTML file in which a <b> (bold) tag is sometimes followed by a second <b> before the first one has been closed with </b>, e.g.:

<RF><b>1:4 <b>Hom wat is ... kom:</b>
The second <b> should not be there.

Is it possible to write a regex pattern to find such occurrences and delete the spurious <b>?

Something like this:

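import re

s = "<RF><b>1:4 <b>Hom wat is ... kom:</b>"

def make_repl():
    # Use a fresh counter per call to re.sub; a mutable default
    # argument would keep its state across calls.
    seen = 0
    def repl(match):
        nonlocal seen
        seen += 1
        # Keep the first <b> and drop every later one.
        return '<b>' if seen == 1 else ''
    return repl


print(re.sub('<b>', make_repl(), s))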

Output:
<RF><b>1:4 Hom wat is ... kom:</b>
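
Note that the snippet above keeps only the very first <b> in the whole string, so input with several bold spans would lose every later opening tag. A possible variant is sketched below: it matches both <b> and </b> and drops a <b> only while an earlier one is still open. The function name drop_nested_bold is made up for this example, and the sketch assumes plain lowercase <b> and </b> tags without attributes. For arbitrary HTML a real parser would be safer than a regex.

```python
import re

def drop_nested_bold(html):
    """Remove any <b> that appears while an earlier <b> is still open."""
    is_open = False  # True between a kept <b> and the next </b>

    def repl(match):
        nonlocal is_open
        tag = match.group(0)
        if tag == "</b>":
            is_open = False
            return tag
        if is_open:
            return ""  # spurious <b> inside an already open bold span
        is_open = True
        return tag

    # Visit opening and closing bold tags in document order.
    return re.sub(r"</?b>", repl, html)

print(drop_nested_bold("<RF><b>1:4 <b>Hom wat is ... kom:</b> and <b>more</b>"))
```

Output:
<RF><b>1:4 Hom wat is ... kom:</b> and <b>more</b>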

WJSwan

Axel_Erfurt