Getting a specific text inside an html with soup

***snippsat*** · Jul-09-2019, 11:51 AM

Quote: Thanks snippsat. Although it seems to select the 'span' inside the first 'a' only.

It should select all of them. Here is a quick test with the code you posted.

from bs4 import BeautifulSoup

html = '''\
<div class="o-teaser o-teaser--article o-teaser--small o-teaser--has-image js-teaser" data-id="3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<div class="o-teaser__content">
<div class="o-teaser__meta">
<div class="o-teaser__meta-tag">
<a class="o-teaser__tag" data-trackable="teaser-tag" href="/stream/254cd19f-4724-4c89-9230-926e8201a823">Huawei Technologies Co Ltd</a>
</div>
</div>
<div class="o-teaser__heading">
<a class="js-teaser-heading-link" data-trackable="heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>
</a>
</div>
<p class="o-teaser__standfirst">
<a class="js-teaser-standfirst-link" data-trackable="standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2" tabindex="-1">
<span>
...
<mark class="search-item__highlight">Google</mark> has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
</span>
</a></p><div class="o-teaser__timestamp">
<time class="o-teaser__timestamp-date" datetime="2019-06-07T03:36:51+0000">June 7, 2019</time>'''

soup = BeautifulSoup(html, 'lxml')

Test:

>>> span_tag = soup.select('a span')
>>> len(span_tag)
2

>>> span_tag[0]
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>

>>> for tag in span_tag:
...     print(tag.text.strip())
...     
Google warns of US national security risks from Huawei ban
...
Google has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
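
If you only get the first span back, one possible cause (a guess, since the code that gave that result is not shown) is calling select_one() or find() instead of select(). Here is a small sketch of the difference, and of narrowing the selector by class to get one specific text. The html here is a trimmed stand-in for the markup above, 'html.parser' is used only so the snippet runs without lxml, and clean_text is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the teaser markup above (an assumption, not the full page).
html = '''\
<div class="o-teaser__heading">
<a class="js-teaser-heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>
</a>
</div>
<p class="o-teaser__standfirst">
<a class="js-teaser-standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> has warned the Trump administration
</span>
</a>
</p>'''

# 'html.parser' ships with Python; 'lxml' as in the post works the same way here.
soup = BeautifulSoup(html, 'html.parser')


def clean_text(tag):
    """Hypothetical helper: return the tag's text with whitespace collapsed."""
    return ' '.join(tag.get_text().split())


# select() returns a list with every match.
all_spans = soup.select('a span')

# select_one() returns only the first match (or None if nothing matches).
first_span = soup.select_one('a span')

# Narrow the selector by class to pick one specific link.
heading = soup.select_one('a.js-teaser-heading-link span')
standfirst = soup.select_one('a.js-teaser-standfirst-link span')

print(len(all_spans))
print(clean_text(first_span))
print(clean_text(heading))
print(clean_text(standfirst))
```

So select('a span') gives both spans, while select_one('a span') gives only the one under the heading link. If you want just the heading or just the standfirst text, putting the class of the 'a' tag in the selector is usually simpler than indexing into the list.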
