Aug-07-2021, 01:05 AM
Hi all,
I'm still pretty new to web scraping, and in trying to challenge myself with different scenarios, I've come across one where there are two pieces of text inside a single element.
I'm not sure I've explained that correctly, so let me first show you the HTML (please note: I've copied this snippet of the site's HTML into a local file to practice extracting various elements).
Here's the practice HTML that I've placed in a text file:
<div class="links" id="links642584"><ul class="links"><li><i class="fa fa-comment"></i> 40</li><li><span class="tag"><i class="fa fa-tag"></i> <a href="/cat/electrical-electronics">Electrical & Electronics</a></span></li><li><span class="nodeexpiry"><i class="fa fa-calendar"></i> 12 Aug <span class="marker">6 days left</span> </span></li></ul></div>
If I get the text of the span with class "nodeexpiry":

    from bs4 import BeautifulSoup

    # Parse the local copy of the HTML snippet
    with open("C:/Users/test_html_data.html", encoding="utf8") as fp:
        soup = BeautifulSoup(fp, 'html.parser')

    # The third <li> holds the expiry span
    litest = soup('li')[2]
    test1 = litest.find('span', {'class': 'nodeexpiry'}).text

I get:
Output: 12 Aug 6 days left
I know that if I search for the span with class "marker" instead, I get just the "6 days left" part. So my question is: how would I go about extracting only the "12 Aug", please?
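
One possible approach, offered as a sketch rather than the definitive answer: the "12 Aug" is a text node sitting directly inside the "nodeexpiry" span, while "6 days left" is inside the nested "marker" span. So keeping only the direct text children of the outer span should leave just the date. The example below inlines the relevant part of the HTML instead of reading the local file, and the names direct_text and expiry_date are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# Relevant part of the practice HTML, inlined so the example runs on its own
html = (
    '<li><span class="nodeexpiry"><i class="fa fa-calendar"></i> 12 Aug '
    '<span class="marker">6 days left</span> </span></li>'
)

soup = BeautifulSoup(html, "html.parser")
expiry = soup.find("span", {"class": "nodeexpiry"})

# Keep only the text nodes that are direct children of the span;
# text inside nested tags (such as the "marker" span) is skipped.
direct_text = "".join(
    child for child in expiry.children if isinstance(child, NavigableString)
)

# Trim the surrounding whitespace left over from the markup
expiry_date = direct_text.strip()
print(expiry_date)  # 12 Aug
```

This relies on the date staying a direct child of the span. If the site wraps it in its own tag later, the filter would need adjusting.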