[BeautifulSoup] Find </body>? - Python Forum (https://python-forum.io), Forum: Python Coding (https://python-forum.io/forum-7.html) > Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: [BeautifulSoup] Find </body>? (/thread-39985.html)
[BeautifulSoup] Find </body>? - Winfried - May-15-2023

Hello,

Does anyone know how to find the closing "</body>" tag in an HTML file?

```python
from bs4 import BeautifulSoup as bs

# Read the file first: bs("file.html") would parse the string "file.html" itself
with open("file.html", encoding="utf-8") as file:
    soup = bs(file.read(), "html.parser")

# How to find </body>?
element = soup.body.previous_sibling
if element is None:
    print("Nothing")
else:
    print("Found:", element)
```

Thank you.

RE: [BeautifulSoup] Find </body>? - buran - May-15-2023

You don't find opening or closing tags. BeautifulSoup parses the document into a tree of objects, and when you search that tree you [maybe] find the whole element and get back an instance of bs4.element.Tag.

If you want to search for the string </body>, then maybe a regex is the tool you need, but that is NOT parsing HTML. Have a look at this famous answer on Stack Overflow.
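A minimal sketch of the regex route, assuming the goal is only the character offset of the closing tag in the raw text. `find_closing_body` is a hypothetical helper name. It returns the last match because the real closing tag normally comes last; as noted above, this does not parse HTML, so a "</body>" inside a comment or a script string also counts as a match.

```python
import re

# Case-insensitive match for the literal closing tag, allowing optional
# whitespace before ">" (e.g. "</BODY >"). The HTML is treated as plain text.
CLOSING_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def find_closing_body(html: str) -> int:
    """Return the index of the last "</body>" in html, or -1 if absent."""
    last = -1
    for match in CLOSING_BODY.finditer(html):
        last = match.start()
    return last


if __name__ == "__main__":
    page = "<html><body><p>Hello</p></BODY></html>"
    pos = find_closing_body(page)
    if pos == -1:
        print("Nothing")
    else:
        print("Found at index", pos)  # Found at index 24
```

To use it on a file, read the file into a string first (for example with `open("file.html", encoding="utf-8").read()`) and pass that string in.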
RE: [BeautifulSoup] Find </body>? - Winfried - May-15-2023

Thanks. Indeed, it looks like using a regex would be simpler in this case.

RE: [BeautifulSoup] Find </body>? - Gaurav_Kumar - Jul-21-2023

```python
from bs4 import BeautifulSoup as bs

# Load the HTML file
with open("file.html", "r", encoding="utf-8") as file:
    html_data = file.read()

# Create a BeautifulSoup object
soup = bs(html_data, "html.parser")

# The parser does not keep "</body>" as text, so look for the <body> element;
# its closing tag is part of the Tag object that comes back.
body_tag = soup.find("body")

if body_tag is None:
    print("Nothing")
else:
    print("Found:", body_tag)
```

1. Open the HTML file in read mode and read its contents into the html_data variable.
2. Create a BeautifulSoup object named soup to parse the HTML data.
3. Use soup.find("body") to get the <body> element. Searching for the text "</body>" (e.g. soup.find_all(text="</body>")) finds nothing, because the parser turns tags into Tag objects rather than text.
4. If the element exists, print it: that is the whole <body>...</body> element, including its closing tag. Otherwise print "Nothing".
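BeautifulSoup hands back whole elements, so it does not say where the closing tag sits in the file. If that position is what you need while still using a real parser instead of a regex, Python's standard `html.parser` module (the parser behind BeautifulSoup's "html.parser" option) reports every closing tag through its `handle_endtag()` callback. A minimal sketch, assuming you want the last `</body>` as a (line, column) pair; `BodyEndFinder` and `find_body_end` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class BodyEndFinder(HTMLParser):
    """Record where each </body> end tag starts, as (line, column) pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: List[Tuple[int, int]] = []

    def handle_endtag(self, tag: str) -> None:
        # tag arrives lowercased; getpos() gives the (1-based line,
        # 0-based column) where the current end tag begins.
        if tag == "body":
            self.positions.append(self.getpos())


def find_body_end(html: str) -> Optional[Tuple[int, int]]:
    """Return the position of the last </body>, or None if there is none."""
    finder = BodyEndFinder()
    finder.feed(html)
    finder.close()
    return finder.positions[-1] if finder.positions else None


if __name__ == "__main__":
    # The "</body>" inside the comment is reported as a comment, not an end tag.
    page = "<html>\n<body>\n<!-- </body> -->\n<p>Hi</p>\n</body>\n</html>"
    print(find_body_end(page))  # (5, 0)
```

Unlike the plain-text search, a `</body>` inside an HTML comment is not mistaken for the real closing tag here.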