Extracting the text between each "i class"

knight2000 · May-26-2021, 04:03 AM

Hi there,

I'm a beginner trying to get some practice scraping a website with the little I know.

To try and make my challenge clear, I'll start off by showing you all the data and then I'll explain what I'm trying to extract from it.

Output:
[<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 cross"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>
</div>, <div class="col col-1 seller-interview">
<i class="icon-v5 check-blue"></i>

I've extracted this data using the following code:

import requests
from bs4 import BeautifulSoup

url = '[my practice url]'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# Get the Seller Interview tags (find_all is the current name for findAll)
seller_interview = soup.find_all("div", {"class": "col col-1 seller-interview"})
print(seller_interview)

My goal is to extract the text between the quotes after "i class=". In other words, the text returned should be either icon-v5 cross or icon-v5 check-blue for each line item.

To extract text inside links or spans, I've been able to use something like object.a.text or object.span.text, but as this value sits inside an 'i class', I'm not sure how to extract it.

I've read up on using the Regex module, and I tried something like:

matches = re.findall(r"icon-v5 cross", seller_interview)

but that gives me:

Error:
TypeError: expected string or bytes-like object

Could someone please help me extract the text inside the "i class" on each line?

**perfringo** · May-26-2021, 05:39 AM

Anybody who is trying to put together HTML and re should for starters read the famous and opinionated "You can't parse [X]HTML with regex".
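
As for the TypeError itself: re.findall expects a string (or bytes), while find_all returns a ResultSet of Tag objects. A minimal sketch of that, assuming a hard-coded HTML snippet modelled on the output in the first post (the html string and the second regex pattern are made up for illustration):

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the output shown in the question (illustrative only).
html = """
<div class="col col-1 seller-interview"><i class="icon-v5 cross"></i></div>
<div class="col col-1 seller-interview"><i class="icon-v5 check-blue"></i></div>
"""

soup = BeautifulSoup(html, "html.parser")
seller_interview = soup.find_all("div", class_="seller-interview")

# Passing the ResultSet straight to re.findall reproduces the error.
error_message = None
try:
    re.findall(r"icon-v5 cross", seller_interview)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Converting to str first lets the call run, but it depends on how the
# tags happen to be serialised, so reading the attribute is sturdier.
matches = re.findall(r"icon-v5 (?:cross|check-blue)", str(seller_interview))
print(matches)
```

So the regex route can be made to run, but it only works as long as the markup is printed in exactly this shape.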

knight2000 · May-26-2021, 06:47 AM

(May-26-2021, 05:39 AM)perfringo Wrote: Anybody who is trying to put together HTML and re should for starters read the famous and opinionated "You can't parse [X]HTML with regex".

Thanks perfringo.

Funny read! After reading that, it appears that using re with HTML isn't the way to do it. Is there another function, or what else would you suggest?

**perfringo** · May-26-2021, 07:49 AM

Not tested, but something along those lines could work:

for record in seller_interview:
    for i in record.find_all('i'):
        print(i.get('class'))
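
A note on what this prints: BeautifulSoup treats class as a multi-valued attribute, so i.get('class') gives back a list of class names rather than one string. A minimal sketch, assuming a hard-coded HTML snippet modelled on the output in the first post (the html string and the class_strings name are made up for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the output shown in the question (illustrative only).
html = """
<div class="col col-1 seller-interview"><i class="icon-v5 cross"></i></div>
<div class="col col-1 seller-interview"><i class="icon-v5 check-blue"></i></div>
"""

soup = BeautifulSoup(html, "html.parser")
seller_interview = soup.find_all("div", class_="seller-interview")

class_strings = []
for record in seller_interview:
    for i in record.find_all("i"):
        classes = i.get("class")                 # a list such as ['icon-v5', 'cross']
        class_strings.append(" ".join(classes))  # rejoin into 'icon-v5 cross'

print(class_strings)
```

Joining the list with a space gives the attribute value as it appears in the page source.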

knight2000 · May-26-2021, 09:55 AM

(May-26-2021, 07:49 AM)perfringo Wrote: Not tested, but something along those lines could work:
for record in seller_interview:
    for i in record.find_all('i'):
        print(i.get('class'))

Thank you perfringo. A loop nested inside another loop, yep, I wouldn't have thought about that!

I tried what you gave me and it almost worked. An example of the output was:

Output:
['icon-v5', 'cross']

So I grabbed the second string with:

for record in seller_interview:
    for i in record.find_all("i"):
        s_interview = i.get("class")
        result = s_interview[1]
        print(result)

I'm sure it's not the cleanest and could probably be done better, but it worked.

Thank you for your help and direction with this, much appreciated.
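
For anyone landing here later, one possible tidier variant uses a CSS selector instead of two loops. This is only a sketch, assuming sample markup like the output in the first post; the html string and the results name are hypothetical, and taking the last class name with [-1] instead of [1] is a swapped-in choice so that a tag with a single class does not raise IndexError:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the output shown in the question (illustrative only).
html = """
<div class="col col-1 seller-interview"><i class="icon-v5 cross"></i></div>
<div class="col col-1 seller-interview"><i class="icon-v5 check-blue"></i></div>
<div class="col col-1 seller-interview"><i class="icon-v5 cross"></i></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <i> inside a div that carries the seller-interview class.
icons = soup.select("div.seller-interview i")

# Keep the last class name of each icon; skip any <i> without a class.
results = [icon["class"][-1] for icon in icons if icon.has_attr("class")]
print(results)
```

The list comprehension keeps the results in one place, which is handy if they later need to go into a CSV or a DataFrame column.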
