That link you posted, @nilmao, is about not using regex to parse XML/HTML.
lxml is an XML and HTML parser.
lxml Wrote:lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
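To show the difference, here is a minimal sketch of pulling links out of HTML with lxml's parser instead of a regex. The markup in `snippet` is made up for illustration, and it assumes lxml is installed (`pip install lxml`).

```python
from lxml import html

# Hypothetical HTML fragment, used only for this illustration.
snippet = '<div><a href="/one">First</a><a href="/two">Second</a></div>'

# Parse the markup into an element tree rather than pattern-matching on text.
tree = html.fromstring(snippet)

# Collect (href, link text) pairs for every <a> element below the root.
links = [(a.get("href"), a.text_content()) for a in tree.xpath(".//a")]
print(links)  # [('/one', 'First'), ('/two', 'Second')]
```

Because the parser builds a real tree, this keeps working when attributes are reordered or tags are nested, which is where regex approaches usually break.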