Jul-09-2019, 11:51 AM
Quote: Thanks snippsat. Although it seems to be selecting the 'span' inside the first 'a' only.
It should select all of them. Here is a quick test with the code you posted.
from bs4 import BeautifulSoup

html = '''\
<div class="o-teaser o-teaser--article o-teaser--small o-teaser--has-image js-teaser" data-id="3bbb6fec-88c5-11e9-a028-86cea8523dc2">
  <div class="o-teaser__content">
    <div class="o-teaser__meta">
      <div class="o-teaser__meta-tag">
        <a class="o-teaser__tag" data-trackable="teaser-tag" href="/stream/254cd19f-4724-4c89-9230-926e8201a823">Huawei Technologies Co Ltd</a>
      </div>
    </div>
    <div class="o-teaser__heading">
      <a class="js-teaser-heading-link" data-trackable="heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
        <span> <mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban </span>
      </a>
    </div>
    <p class="o-teaser__standfirst">
      <a class="js-teaser-standfirst-link" data-trackable="standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2" tabindex="-1">
        <span> ... <mark class="search-item__highlight">Google</mark> has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing... </span>
      </a></p><div class="o-teaser__timestamp">
      <time class="o-teaser__timestamp-date" datetime="2019-06-07T03:36:51+0000">June 7, 2019</time>'''

soup = BeautifulSoup(html, 'lxml')

Test:
>>> span_tag = soup.select('a span')
>>> len(span_tag)
2
>>> span_tag[0]
<span> <mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban </span>
>>> for tag in span_tag:
...     print(tag.text.strip())
...
Google warns of US national security risks from Huawei ban
... Google has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
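
If you are still getting only the first 'span', one possible cause (just a guess, since I can't see your code) is using select_one() or find() instead of select(). A minimal sketch of the difference, using a trimmed-down, made-up version of the markup and 'html.parser' so it runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the teaser markup above
html = '''\
<div class="o-teaser__heading">
  <a class="js-teaser-heading-link" href="/content/a"><span>First headline</span></a>
</div>
<p class="o-teaser__standfirst">
  <a class="js-teaser-standfirst-link" href="/content/a"><span>Second, longer text</span></a>
</p>'''

soup = BeautifulSoup(html, 'html.parser')

# select_one() returns only the first match (or None if nothing matches)
first_only = soup.select_one('a span')

# select() returns a list with every match
all_spans = soup.select('a span')
texts = [tag.get_text(strip=True) for tag in all_spans]

print(first_only.get_text(strip=True))
print(texts)
```

So if you want every 'span', use select() and loop over the result, as in the test above.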