Python Forum
Getting a specific text inside an html with soup
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Getting a specific text inside an html with soup
#7
Quote:Thanks snippsat. Although it seems to be selected the 'span' inside the first 'a' only.
It should select all,quick test with code you posted.
from bs4 import BeautifulSoup

html = '''\
<div class="o-teaser o-teaser--article o-teaser--small o-teaser--has-image js-teaser" data-id="3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<div class="o-teaser__content">
<div class="o-teaser__meta">
<div class="o-teaser__meta-tag">
<a class="o-teaser__tag" data-trackable="teaser-tag" href="/stream/254cd19f-4724-4c89-9230-926e8201a823">Huawei Technologies Co Ltd</a>
</div>
</div>
<div class="o-teaser__heading">
<a class="js-teaser-heading-link" data-trackable="heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>
</a>
</div>
<p class="o-teaser__standfirst">
<a class="js-teaser-standfirst-link" data-trackable="standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2" tabindex="-1">
<span>
...
<mark class="search-item__highlight">Google</mark> has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
</span>
</a></p><div class="o-teaser__timestamp">
<time class="o-teaser__timestamp-date" datetime="2019-06-07T03:36:51+0000">June 7, 2019</time>'''

soup = BeautifulSoup(html, 'lxml')
Test:
>>> span_tag = soup.select('a span')
>>> len(span_tag)
2

>>> span_tag[0]
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>

>>> for tag in span_tag:
...     print(tag.text.strip())
...     
Google warns of US national security risks from Huawei ban
...
Google has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
Reply


Messages In This Thread
RE: Getting a specific text inside an html with soup - by snippsat - Jul-09-2019, 11:51 AM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Soup('A') new_coder_231013 6 2,622 Aug-12-2023, 10:55 AM
Last Post: Pubfonts
  Python Obstacles | Karate | HTML/Scrape Specific Tag and Store it in MariaDB BrandonKastning 8 3,229 Nov-22-2021, 01:38 AM
Last Post: BrandonKastning
  How to get specific TD text via Selenium? euras 3 8,895 May-14-2021, 05:12 PM
Last Post: snippsat
  HTML multi select HTML listbox with Flask/Python rfeyer 0 4,695 Mar-14-2021, 12:23 PM
Last Post: rfeyer
  Any way to remove HTML tags from scraped data? (I want text only) SeBz2020uk 1 3,511 Nov-02-2020, 08:12 PM
Last Post: Larz60+
  Help: Beautiful Soup - Parsing HTML table ironfelix717 2 2,733 Oct-01-2020, 02:19 PM
Last Post: snippsat
  Beautiful Soup (suddenly) doesn't get full webpage html j.crater 8 17,171 Jul-11-2020, 04:31 PM
Last Post: j.crater
  Requests-HTML vs Beautiful Soup - How to Choose? robin73 0 3,853 Jun-23-2020, 02:53 PM
Last Post: robin73
  Python3 + BeautifulSoup4 + lxml (HTML -> CSV) - How to loop to next HTML/new CSV Row BrandonKastning 0 2,398 Mar-22-2020, 06:10 AM
Last Post: BrandonKastning
  How to get the href value of a specific word in the html code julio2000 2 3,243 Mar-05-2020, 07:50 PM
Last Post: julio2000

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020