[SOLVED] [BeautifulSoup] How to get this text?
Python Forum (https://python-forum.io), Python Coding > General Coding Help (https://python-forum.io/forum-8.html), thread /thread-37982.html
[SOLVED] [BeautifulSoup] How to get this text? - Winfried - Aug-17-2022

Hello,

Can BeautifulSoup grab what's between the tags ("John Doe") in the following line?

    from bs4 import BeautifulSoup

    with open("input.txt") as fp:
        soup = BeautifulSoup(fp, 'html.parser')

    # <a href="/authorx/john_doe">John Doe</a>

Thank you.

Edit: Found it.

    items = soup.select("a[href*=authorx]")
    for item in items:
        #print(item)
        print(item.string)

RE: [SOLVED] [BeautifulSoup] How to get this text? - snippsat - Aug-17-2022

There is no need to loop to get the text.

    from bs4 import BeautifulSoup

    html = '<a href="/authorx/john_doe">John Doe</a>'
    soup = BeautifulSoup(html, 'html.parser')

    >>> item = soup.find('a')
    >>> item.text
    'John Doe'

Or, if you use a CSS selector, you can use select_one() when you only need this one element.

    >>> item = soup.select_one("a[href*=authorx]")
    >>> item.text
    'John Doe'

RE: [SOLVED] [BeautifulSoup] How to get this text? - Winfried - Aug-17-2022

Even if a book has more than one author, and the page has a bunch of href links that have nothing to do with the authors?

RE: [SOLVED] [BeautifulSoup] How to get this text? - snippsat - Aug-17-2022

(Aug-17-2022, 02:19 PM) Winfried wrote: Even if a book has more than one author, and the page has a bunch of href links that have nothing to do with the authors?

Give an example if you have trouble. There are several ways to get a tag even if there are several similar ones.

    from bs4 import BeautifulSoup

    html = '''\
    <body>
      <a href="/authorx/john_doe">John Doe1</a>
      <a href="/authorx/john_doe">John Doe2</a>
      <a href="/authorx/john_doe">John Doe3</a>
    </body>'''

    soup = BeautifulSoup(html, 'html.parser')

    >>> item = soup.select_one('body > a:nth-child(2)')
    >>> item.text
    'John Doe2'

RE: [SOLVED] [BeautifulSoup] How to get this text? - Winfried - Aug-17-2022

The code above works fine, so I'm happy with it. However, what about this?
    <div class="bi_row">
      <span class="bi_col_title">Publication date</span>
      <span class="bi_col_value">January 1, 1999</span>
    </div>

How could I get the publication date ("January 1, 1999")?

RE: [SOLVED] [BeautifulSoup] How to get this text? - snippsat - Aug-17-2022

    from bs4 import BeautifulSoup

    html = '''\
    <div class="bi_row">
      <span class="bi_col_title">Publication date</span>
      <span class="bi_col_value">January 1, 1999</span>
    </div>'''

    soup = BeautifulSoup(html, 'html.parser')

    # CSS selector
    >>> item = soup.select_one("span.bi_col_value")
    >>> item.text
    'January 1, 1999'

    # With find(), the CSS class keyword takes a single trailing underscore: class_
    >>> item = soup.find(class_="bi_col_value")
    >>> item.text
    'January 1, 1999'

    # Getting attributes looks like this
    >>> item.attrs
    {'class': ['bi_col_value']}
    >>> item.get('class')
    ['bi_col_value']

RE: [SOLVED] [BeautifulSoup] How to get this text? - Winfried - Aug-17-2022

Sorry, I forgot to say that the webpage contains multiple items with identical elements:

    <div class="bi_row">
      <span class="bi_col_title">Publisher</span>
      <span class="bi_col_value">Some publisher Inc</span>
    </div>
    <div class="bi_row">
      <span class="bi_col_title">Publication date</span>
      <span class="bi_col_value">January 1, 1999</span>
    </div>

etc.

Edit: Kludgy, but it works:

    for col in soup.find_all("div", {"class": "bi_row"}):
        if col.find("span", {"class": "bi_col_title"}).text == "Publication date":
            print(col.find("span", {"class": "bi_col_value"}).text)
            break
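
A possible tidier variant of the loop above, offered only as a sketch: collect every title/value pair into a dict once, then look up whichever field is needed. The helper name `row_values` is made up for illustration, and the HTML is the two-row sample from this thread; a real page may nest things differently. The last lines show an alternative that matches the title span by its text and steps to the sibling span.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the thread; a real page may differ.
html = '''\
<div class="bi_row">
  <span class="bi_col_title">Publisher</span>
  <span class="bi_col_value">Some publisher Inc</span>
</div>
<div class="bi_row">
  <span class="bi_col_title">Publication date</span>
  <span class="bi_col_value">January 1, 1999</span>
</div>'''

soup = BeautifulSoup(html, 'html.parser')


def row_values(soup):
    """Hypothetical helper: map each row's title text to its value text."""
    details = {}
    for row in soup.select("div.bi_row"):
        title = row.select_one("span.bi_col_title")
        value = row.select_one("span.bi_col_value")
        # Skip rows that are missing either span instead of raising.
        if title is not None and value is not None:
            details[title.get_text(strip=True)] = value.get_text(strip=True)
    return details


details = row_values(soup)
print(details.get("Publication date"))

# Alternative: find the title span by its text, then take the next sibling span.
title_span = soup.find("span", class_="bi_col_title", string="Publication date")
date_span = title_span.find_next_sibling("span", class_="bi_col_value")
print(date_span.get_text(strip=True))
```

Using `details.get(...)` returns None when a field is absent, which avoids the AttributeError the original loop would raise on a row without a title span.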