May-26-2021, 04:03 AM
Hi there,
I'm a beginner trying to get some practice with the little I know by scraping a website.
To try and make my challenge clear, I'll start off by showing you all the data and then I'll explain what I'm trying to extract from it.
Output:
[<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
I've extracted this data using the following code:
import requests
from bs4 import BeautifulSoup
url = '[my practice url]'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
# Get the Seller Interview tag
seller_interview = soup.findAll("div", {"class": "col col-1 seller-interview"})
print(seller_interview)
My goal is to extract the text between the quotation marks after "i class=". In other words, the text returned should be either icon-v5 cross or icon-v5 check-blue for each line item.
To extract the text inside links or spans, I've been able to use something like object.a.text or object.span.text, but as this value sits inside the class attribute of an 'i' tag, I'm not sure how to extract it.
I've read up on the re (regex) module, and I tried something like:
matches = re.findall(r"icon-v5 cross", seller_interview)
but that gives me:
Error:
TypeError: expected string or bytes-like object
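The TypeError most likely comes from the fact that re.findall expects a single string, whereas findAll returns a list-like collection of tags. A minimal sketch of one workaround, assuming each tag is converted to text first (for example with str(tag)); the strings in sample_divs are hypothetical stand-ins modeled on the output above, not data from the real page:

```python
import re

# Hypothetical stand-ins for str(tag) of each item in seller_interview,
# modeled on the printed output in the post.
sample_divs = [
    '<div class="col col-1 seller-interview">\n<i class="icon-v5 cross"></i>\n</div>',
    '<div class="col col-1 seller-interview">\n<i class="icon-v5 check-blue"></i>\n</div>',
]

# Passing the whole list to re.findall reproduces the error from the post.
try:
    re.findall(r"icon-v5 cross", sample_divs)
except TypeError as exc:
    error_message = str(exc)

# Searching each string on its own works; the group captures the value
# between the quotes after `<i class=`.
pattern = re.compile(r'<i class="([^"]*)"')
icon_classes = [match for html in sample_divs for match in pattern.findall(html)]
print(icon_classes)
```

With the real results, the same idea would be to loop over seller_interview and call pattern.findall(str(tag)) on each item.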
Could someone please help me extract the text that's inside the quotes after "i class" on each line?
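One possible approach that avoids regex is to read the class attribute of each 'i' tag directly, since BeautifulSoup exposes tag attributes like dictionary entries. This is only a sketch: the html string is a hypothetical stand-in for page.text, modeled on the output shown in the post, and icon_classes is a name made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text, modeled on the output shown in the post.
html = """
<div class="col col-1 seller-interview">
<i class="icon-v5 cross"></i>
</div>
<div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
seller_interview = soup.find_all("div", {"class": "col col-1 seller-interview"})

icon_classes = []
for div in seller_interview:
    icon = div.find("i")  # first <i> tag inside this div, or None if absent
    if icon is not None:
        # class is a multi-valued attribute, so bs4 returns a list such as
        # ['icon-v5', 'cross']; join it back into one string.
        icon_classes.append(" ".join(icon.get("class", [])))

print(icon_classes)
```

The .text property returns only what sits between an opening and closing tag, which is empty for these 'i' tags; the class value is an attribute, so it has to be read with icon["class"] or icon.get("class").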