Python Forum
[BeautifulSoup] Find </body>?
#1
Hello,

Does someone know how to find the "</body>" closing bit in an HTML file?

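from bs4 import BeautifulSoup as bs

# Parse the file's contents; bs("file.html") would parse the literal string "file.html"
with open("file.html", encoding="utf-8") as f:
    soup = bs(f, "html.parser")

# How to find </body>?
element = soup.body.previous_sibling
if element is None:
    print("Nothing")
else:
    print("Found:", element)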
Thank you.
#2
You are not finding an opening or a closing tag. You search the parsed BeautifulSoup object and [maybe] find the whole tag, which you get as an instance of bs4.element.Tag.

If you want to search for the string </body>, then maybe a regex is the tool you need, but that is NOT parsing HTML. Have a look at this famous answer on Stack Overflow.
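
For illustration, a minimal sketch of the regex route, assuming the goal is simply to locate the literal closing tag in the raw text. The sample string and the names html_data and closing_body are made up for this example; in practice html_data would be the text read from file.html.

```python
import re

# Hypothetical sample document standing in for the contents of file.html.
html_data = "<html><body><p>Hello</p></body></html>"

# Match the closing tag case-insensitively, allowing whitespace before ">".
closing_body = re.compile(r"</body\s*>", re.IGNORECASE)

# Keep the last match: an earlier one could sit inside a comment or a script.
matches = list(closing_body.finditer(html_data))
if matches:
    last = matches[-1]
    print("Found:", last.group(), "at offset", last.start())
else:
    print("Nothing")
```

This is plain text searching, so it knows nothing about comments, scripts, or malformed markup; it only reports where the characters occur.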
#3
Thanks. Indeed, it looks like using a regex would be simpler in this case.
#4
from bs4 import BeautifulSoup as bs

# Load the HTML file
with open("file.html", "r", encoding="utf-8") as file:
    html_data = file.read()

# Create a BeautifulSoup object
soup = bs(html_data, "html.parser")

# Search for text nodes equal to "</body>"
body_closing_tag = soup.find_all(text="</body>")

if not body_closing_tag:
    print("Nothing")
else:
    print("Found:", body_closing_tag[0].parent)


1. Open the HTML file in read mode and read its contents into the html_data variable.
2. Create a BeautifulSoup object named soup to parse the HTML data.
3. Use soup.find_all(text="</body>") to search the parsed HTML for text nodes whose content is exactly "</body>". Note that the parser consumes the real closing tag, so this only matches if that literal text appears as text content in the document.
4. If there is a match, print the parent of the first one, i.e. the tag that contains that text. If there is no match, which is the usual result for an ordinary HTML file, print "Nothing."
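
As a rough check of what the parsed tree actually contains, here is a small sketch that assumes a made-up sample string in place of file.html. It uses string=, the current name of the text= argument, and shows that the closing tag is not stored as a text node while the body element itself is available as a Tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for the contents of file.html.
html_data = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html_data, "html.parser")

# The parser consumes closing tags, so no text node equals "</body>".
text_matches = soup.find_all(string="</body>")
print("Text nodes equal to </body>:", len(text_matches))

# The whole body element is available as a bs4.element.Tag.
body = soup.body
print("Element type:", type(body).__name__)

# Serializing the tag regenerates the closing tag at the end of the markup.
print("Ends with </body>:", str(body).endswith("</body>"))
```

So the closing tag only exists in the raw text and in the serialized output; inside the tree there is just the body element.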