Jul-08-2019, 01:19 PM
Hi, I apologize for the question, but I am new to scraping in Python and I am struggling to access the text inside an HTML element. I passed the article HTML through BeautifulSoup, but I haven't succeeded in getting the full text (the part in bold). I tried children, comments, and different types of NavigableString, but the best I could get was "Google" when using the code shown after the HTML.
The HTML code is below:
<div class="o-teaser o-teaser--article o-teaser--small o-teaser--has-image js-teaser" data-id="3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<div class="o-teaser__content">
<div class="o-teaser__meta">
<div class="o-teaser__meta-tag">
<a class="o-teaser__tag" data-trackable="teaser-tag" href="/stream/254cd19f-4724-4c89-9230-926e8201a823">Huawei Technologies Co Ltd</a>
</div>
</div>
<div class="o-teaser__heading">
<a class="js-teaser-heading-link" data-trackable="heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>
</a>
</div>
<p class="o-teaser__standfirst">
<a class="js-teaser-standfirst-link" data-trackable="standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2" tabindex="-1">
<span>
...
<mark class="search-item__highlight">Google</mark> has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
</span>
</a></p><div class="o-teaser__timestamp">
<time class="o-teaser__timestamp-date" datetime="2019-06-07T03:36:51+0000">June 7, 2019</time>
link = soup.find_all('p')[i]
article_body.append(link.string)
Thanks in advance for the help. Any suggestions would be very much appreciated.
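One possible approach, offered as a sketch rather than a definitive fix: in Beautiful Soup, `.string` returns `None` when a tag has more than one child, which is the case for the `<p>` and heading `<div>` here because the text is split around the `<mark>` element. `get_text()` concatenates every nested string instead. In the example below, the `html` string is a trimmed copy of the snippet from the post, `tag_text` is a hypothetical helper name, and the choice of the built-in `html.parser` is an assumption.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the teaser HTML from the post (heading and standfirst only).
html = """
<div class="o-teaser__heading">
<a class="js-teaser-heading-link" data-trackable="heading-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2">
<span>
<mark class="search-item__highlight">Google</mark> warns of US national security risks from Huawei ban
</span>
</a>
</div>
<p class="o-teaser__standfirst">
<a class="js-teaser-standfirst-link" data-trackable="standfirst-link" href="/content/3bbb6fec-88c5-11e9-a028-86cea8523dc2" tabindex="-1">
<span>
...
<mark class="search-item__highlight">Google</mark> has warned the Trump administration it risks compromising US national security if it pushes ahead with sweeping export restrictions on Huawei, as the technology group seeks to continue doing...
</span>
</a></p>
"""

# Assumption: the standard-library parser is good enough for this markup.
soup = BeautifulSoup(html, "html.parser")


def tag_text(tag):
    """Hypothetical helper: all nested text of a tag, with whitespace collapsed."""
    # get_text() joins the text of every descendant, including <mark>;
    # split() followed by join() removes the newlines left over from the markup.
    return " ".join(tag.get_text().split())


# Headline text, including the highlighted word inside <mark>.
heading = tag_text(soup.find("div", class_="o-teaser__heading"))

# Same idea applied to every <p>, replacing link.string from the post.
article_body = [tag_text(p) for p in soup.find_all("p")]

print(heading)
print(article_body[0])
```

Collapsing whitespace with `split()` and `join()` is a design choice for readable output; `get_text(" ", strip=True)` is another option, though it can leave the spacing around the `<mark>` text slightly different.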