Python Forum
[BeautifulSoup] Find </body>? - Printable Version



[BeautifulSoup] Find </body>? - Winfried - May-15-2023

Hello,

Does anyone know how to find the closing "</body>" tag in an HTML file?

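from bs4 import BeautifulSoup as bs

# Parse the file's contents; bs("file.html") would parse that string as markup
with open("file.html", encoding="utf-8") as f:
	soup = bs(f, "html.parser")

# How to find </body>?
element = soup.body.previous_sibling
if element is None:
	print("Nothing")
else:
	print("Found:", element)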
Thank you.


RE: [BeautifulSoup] Find </body>? - buran - May-15-2023

You are not finding an opening or a closing tag. BeautifulSoup parses the document into a tree, and when you search that tree you get [maybe] the whole element back, as an instance of bs4.element.Tag.
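As a minimal sketch of that point (the sample markup below is made up for illustration), soup.body is a single Tag object covering the whole element. The closing tag only reappears when the element is serialized back to a string; it is not a node you can navigate to:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup, used only for illustration.
html = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

body = soup.body
is_tag = isinstance(body, Tag)               # the whole element, not a start or end tag
serialized = str(body)                       # "</body>" reappears only on serialization
text_nodes = [str(s) for s in body.strings]  # text found inside <body>

print(is_tag)
print(serialized)
print(text_nodes)
```

The text nodes contain only the document's text, so no amount of tree navigation will land on "</body>" itself.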

If you want to search for the literal string </body>, then maybe a regex is the tool you need, but that is NOT parsing HTML. Have a look at this famous answer on Stack Overflow
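A rough sketch of the regex route, working on the raw text rather than on a parsed tree. The sample markup and the pattern are assumptions for illustration; the pattern tolerates upper case and whitespace before the ">", but it will also match "</body>" inside comments or scripts, which is exactly the limitation mentioned above:

```python
import re

# Hypothetical sample markup; in practice this would be the file's contents.
html = "<html>\n<body>\n<p>Hello</p>\n</BODY >\n</html>"

# Match "</body>" case-insensitively, allowing whitespace before ">".
pattern = re.compile(r"</body\s*>", re.IGNORECASE)

# Keep the last match, since the closing tag is expected near the end.
matches = list(pattern.finditer(html))
last = matches[-1] if matches else None

if last is None:
    print("Nothing")
else:
    print("Found at offset", last.start(), ":", last.group(0))
```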


RE: [BeautifulSoup] Find </body>? - Winfried - May-15-2023

Thanks. Indeed, it looks like using a regex would be simpler in this case.


RE: [BeautifulSoup] Find </body>? - Gaurav_Kumar - Jul-21-2023

from bs4 import BeautifulSoup as bs

# Load the HTML file
with open("file.html", "r", encoding="utf-8") as file:
    html_data = file.read()

# Create a BeautifulSoup object
soup = bs(html_data, "html.parser")

# Search the parsed tree for text nodes equal to "</body>"
# (newer Beautiful Soup versions prefer string= over text=)
body_closing_tag = soup.find_all(text="</body>")

if not body_closing_tag:
    print("Nothing")
else:
    print("Found:", body_closing_tag[0].parent)


1.> Open the HTML file in read mode and read its contents into the html_data variable.
2.> Create a BeautifulSoup object named soup to parse the HTML data.
3.> Use soup.find_all(text="</body>") to search the parsed tree for text nodes whose content is exactly </body>. The parser consumes the real closing tag while building the tree, so for ordinary HTML this list is empty.
4.> If there are any matches, print the parent of the first one, which is the element containing that text. If there are none, print "Nothing."
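If the goal is the position of the closing tag while still using a real parser, one possible alternative is the standard library's html.parser.HTMLParser, which calls handle_endtag() for each closing tag it meets. The sketch below is only an illustration: the EndTagLocator class and the sample markup are hypothetical names, not part of BeautifulSoup or of the posts above.

```python
from html.parser import HTMLParser


class EndTagLocator(HTMLParser):
    """Record the (line, column) of every closing tag with a given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.positions = []

    def handle_endtag(self, tag):
        # HTMLParser lowercases tag names before calling this method.
        if tag == self.tag_name:
            # getpos() returns (line number from 1, column offset from 0).
            self.positions.append(self.getpos())


# Hypothetical sample markup; in practice this would be the file's contents.
html = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>"

locator = EndTagLocator("body")
locator.feed(html)
locator.close()

if not locator.positions:
    print("Nothing")
else:
    print("Found </body> at (line, column):", locator.positions[-1])
```

Unlike a plain string search, this ignores "</body>" text that appears inside comments, because the parser reports those through a different callback.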