[SOLVED] [BeautifulSoup] How to get this text?

Winfried · (This post was last modified: Aug-17-2022, 01:56 PM by Winfried.)

Hello,

Can BeautifulSoup grab the text between the tags ("John Doe") in the following line?

from bs4 import BeautifulSoup

with open("input.txt") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

#<a href="/authorx/john_doe">John Doe</a>

Thank you.

--
Edit: Found it

items = soup.select("a[href*=authorx]")
for item in items:
	#print(item)
	print(item.string)

***snippsat*** · Aug-17-2022, 02:03 PM

There is no need to loop to get the text.

from bs4 import BeautifulSoup

html = '<a href="/authorx/john_doe">John Doe</a>'
soup = BeautifulSoup(html, 'html.parser')

>>> item = soup.find('a')
>>> item.text
'John Doe'

Or, if you prefer a CSS selector, use select_one() when you only need this one element.

>>> item = soup.select_one("a[href*=authorx]")
>>> item.text
'John Doe'
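
One thing that may be worth knowing here: .string and .text are not quite the same. A rough sketch, assuming a made-up variation of the link that has a nested <b> tag (not from the original page), and a made-up "publisherx" selector to show what happens when nothing matches:

```python
from bs4 import BeautifulSoup

# Hypothetical variation of the link, with a nested <b> tag inside the <a>
html = '<a href="/authorx/john_doe"><b>John</b> Doe</a>'
soup = BeautifulSoup(html, "html.parser")

item = soup.select_one("a[href*=authorx]")
string_value = item.string    # None, because the tag has more than one child
text_value = item.get_text()  # text of all nested children joined together

# select_one() returns None when nothing matches, so check before using .text
missing = soup.select_one("a[href*=publisherx]")

print(string_value)  # None
print(text_value)    # John Doe
print(missing)       # None
```

So .text (or get_text()) is probably the safer choice when the link might contain extra markup.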

Winfried · (This post was last modified: Aug-17-2022, 02:20 PM by Winfried.)

Even if a book has more than one author, and the page has a bunch of href links that have nothing to do with the authors?

***snippsat*** · Aug-17-2022, 02:50 PM

(Aug-17-2022, 02:19 PM)Winfried Wrote: Even if a book has more than one author, and the page has a bunch of href links that have nothing to do with the authors?

Give an example if you have trouble.
There are several ways to get a specific tag even when there are several similar ones.

from bs4 import BeautifulSoup

html = '''\
<body>
  <a href="/authorx/john_doe">John Doe1</a>
  <a href="/authorx/john_doe">John Doe2</a>
  <a href="/authorx/john_doe">John Doe3</a>
</body>'''

soup = BeautifulSoup(html, 'html.parser')

>>> item = soup.select_one('body > a:nth-child(2)')
>>> item.text
'John Doe2'
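
For the case asked about (more than one author, plus unrelated links on the page), a possible sketch is to use select() with an attribute selector and collect every match. The HTML below is invented for illustration, including the /home and /publisherx links:

```python
from bs4 import BeautifulSoup

# Made-up page: two author links mixed with unrelated links
html = '''\
<body>
  <a href="/home">Home</a>
  <a href="/authorx/john_doe">John Doe</a>
  <a href="/publisherx/acme">Acme</a>
  <a href="/authorx/jane_roe">Jane Roe</a>
</body>'''

soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every match; ^= means "href starts with"
authors = [a.get_text(strip=True) for a in soup.select('a[href^="/authorx/"]')]
print(authors)  # ['John Doe', 'Jane Roe']
```

Links whose href does not start with /authorx/ are simply never matched, so no extra filtering should be needed.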

Winfried · Aug-17-2022, 03:06 PM

The code above works fine, so I'm happy with it.

However, what about this?

<div class="bi_row">
<span class="bi_col_title">Publication date</span>
<span class="bi_col_value">January 1, 1999</span>
</div>

How could I get the publication date ("January 1, 1999") ?

***snippsat*** · Aug-17-2022, 03:30 PM

from bs4 import BeautifulSoup

html = '''\
<div class="bi_row">
  <span class="bi_col_title">Publication date</span>
  <span class="bi_col_value">January 1, 1999</span>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# CSS selector
>>> item = soup.select_one("span.bi_col_value")
>>> item.text
'January 1, 1999'

# Using find(): add a single trailing _ (class_), because class is a reserved word in Python
>>> item = soup.find(class_="bi_col_value")
>>> item.text
'January 1, 1999'

# Getting attributes would look like this
>>> item.attrs
{'class': ['bi_col_value']}
>>> item.get('class')
['bi_col_value']
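
The same attribute access applies to the author link from the first post. A small sketch, where pulling the last path segment out as a "slug" is only an illustration and not something the thread asked for:

```python
from bs4 import BeautifulSoup

html = '<a href="/authorx/john_doe">John Doe</a>'
soup = BeautifulSoup(html, "html.parser")
item = soup.select_one("a[href*=authorx]")

href = item["href"]              # raises KeyError if the attribute is missing
title = item.get("title")        # .get() returns None instead of raising
slug = href.rsplit("/", 1)[-1]   # last path segment of the href

print(href)   # /authorx/john_doe
print(title)  # None
print(slug)   # john_doe
```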

Winfried · (This post was last modified: Aug-17-2022, 03:58 PM by Winfried.)

Sorry, forgot to say the webpage contains multiple items with identical elements:

<div class="bi_row">
<span class="bi_col_title">Publisher</span>
<span class="bi_col_value">Some publisher Inc</span>
</div>

<div class="bi_row">
<span class="bi_col_title">Publication date</span>
<span class="bi_col_value">January 1, 1999</span>
</div>

etc.

--
Edit: Kludgy but it works:

for col in soup.find_all("div", {"class": "bi_row"}):
	if col.find("span", {"class": "bi_col_title"}).text == "Publication date":
		print(col.find("span", {"class": "bi_col_value"}).text)
		break
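
A possibly less kludgy variant of the loop above, as a sketch only. It assumes each title span holds nothing but its label text and that the value span is a sibling in the same row, as in the snippet posted here:

```python
from bs4 import BeautifulSoup

html = '''\
<div class="bi_row">
  <span class="bi_col_title">Publisher</span>
  <span class="bi_col_value">Some publisher Inc</span>
</div>
<div class="bi_row">
  <span class="bi_col_title">Publication date</span>
  <span class="bi_col_value">January 1, 1999</span>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Option 1: find the title span by its exact text, then step to its sibling
title = soup.find("span", class_="bi_col_title", string="Publication date")
pub_date = title.find_next_sibling("span", class_="bi_col_value").get_text()
print(pub_date)  # January 1, 1999

# Option 2: collect every row into a dict keyed by the title text
details = {}
for row in soup.select("div.bi_row"):
    key = row.select_one("span.bi_col_title")
    value = row.select_one("span.bi_col_value")
    if key is not None and value is not None:
        details[key.get_text(strip=True)] = value.get_text(strip=True)
print(details["Publisher"])  # Some publisher Inc
```

The dict version might be handy if more than one field is needed from the same page, since the rows are only walked once.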
