Python Forum

Hello,

Does anyone know how to find the "</body>" closing bit in an HTML file?

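from bs4 import BeautifulSoup as bs

# Parse the file's contents (passing the string "file.html" would parse the name itself)
with open("file.html", encoding="utf-8") as f:
    soup = bs(f, "html.parser")

# How to find </body>?
element = soup.body.previous_sibling
if element is None:
    print("Nothing")
else:
    print("Found:", element)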

Thank you.

You are not finding opening or closing tags. BeautifulSoup parses the document into a tree of objects, and when you search it you [maybe] find the whole tag and get an instance of bs4.element.Tag.
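A small sketch of that point, using made-up sample markup: the parsed tree has a Tag for the whole <body> element, but no text node containing "</body>". The closing tag only shows up again when the element is serialized.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample markup, made up for this illustration
html = "<html><head></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

body = soup.body
print(isinstance(body, Tag))  # True: one object for the whole <body>...</body>

# Closing tags are not stored as text, so a string search finds nothing
print(soup.find_all(string="</body>"))  # []

# The closing tag reappears only when the element is turned back into markup
print(str(body))  # <body><p>Hi</p></body>
```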

If you want to search for the string </body>, then a regex may be the tool you need, but that is NOT parsing HTML. Have a look at this famous answer on Stack Overflow.
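If the position of </body> in the source is what's actually needed, one parser-based option (a different tool from BeautifulSoup, named here plainly) is the standard library's event-based html.parser.HTMLParser, which calls handle_endtag() for every end tag. A minimal sketch; the class name BodyEndFinder and the sample markup are hypothetical. It records the (line, column) that getpos() reports when </body> is reached.

```python
from html.parser import HTMLParser


class BodyEndFinder(HTMLParser):
    """Hypothetical helper: remembers where the last </body> end tag starts."""

    def __init__(self):
        super().__init__()
        self.body_end = None  # (line, column): line is 1-based, column 0-based

    def handle_endtag(self, tag):
        # Tag names arrive lowercased, so </BODY> matches as well
        if tag == "body":
            self.body_end = self.getpos()


html = "<html>\n<body>\n<p>Hi</p>\n</body>\n</html>\n"
finder = BodyEndFinder()
finder.feed(html)
finder.close()
print(finder.body_end)  # (4, 0)
```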

Thanks. Indeed, it looks like using a regex would be simpler in this case.
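A minimal regex sketch of that idea, assuming the goal is only the character offset of the closing tag in the raw text. The pattern tolerates upper case and whitespace before the ">", and the function takes the last match as a rough heuristic, since an earlier "</body>" could sit inside a comment or script. The name find_body_close is hypothetical, and as noted above this is text search, not HTML parsing.

```python
import re

# Matches </body>, </BODY>, </body   > and similar variants
BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def find_body_close(html):
    """Return the index where the last closing body tag starts, or None."""
    matches = list(BODY_CLOSE.finditer(html))
    return matches[-1].start() if matches else None


html = "<html><body><p>Hi</p></body></html>"
idx = find_body_close(html)
print(idx)  # 21

# Example use: splice text in just before the closing tag
if idx is not None:
    html = html[:idx] + "<p>Bye</p>" + html[idx:]
print(html)  # <html><body><p>Hi</p><p>Bye</p></body></html>
```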

from bs4 import BeautifulSoup as bs

# Load the HTML file
with open("file.html", "r", encoding="utf-8") as file:
    html_data = file.read()

# Create a BeautifulSoup object
soup = bs(html_data, "html.parser")

# Find the <body> element; the parsed tree has no separate </body> node to search for
body_tag = soup.find("body")

if body_tag is None:
    print("Nothing")
else:
    print("Found:", body_tag)

1. Open the HTML file in read mode and read its contents into the html_data variable.
2. Create a BeautifulSoup object named soup to parse the HTML data.
3. Use soup.find("body") to get the <body> element. BeautifulSoup does not keep closing tags as text, so searching for the string "</body>" (for example with soup.find_all(text="</body>")) finds nothing; the element found here is the one that ends at </body>.
4. If the element exists, print it, which shows the whole <body>...</body>. If it doesn't, print "Nothing".
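If the reason for locating </body> is to add something just before it (an assumption; the thread doesn't say), BeautifulSoup doesn't need the closing tag at all: Tag.append() adds a child at the end of <body>, which is where it lands when the tree is written back out. A sketch with made-up markup and a hypothetical app.js script:

```python
from bs4 import BeautifulSoup

# Made-up sample markup
html = "<html><head></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Build a new element; "app.js" is a hypothetical file name
script = soup.new_tag("script", src="app.js")

# append() makes it the last child of <body>, i.e. right before </body>
soup.body.append(script)

print(soup)
```

Working on the tree this way avoids string surgery on the raw HTML; the serialized output places the new tag immediately before the closing </body>.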

Winfried

buran

Winfried

Gaurav_Kumar