For Loop Returning 3 Results When There Should Be 1

knight2000 · Sep-26-2021, 03:11 AM

Hi Guys,

After trying to figure this one out for over 8 hours, I thought I would get a fresh perspective from someone.

I'm practicing some web scraping and I've got a scenario with a pretty easy goal: find an object and, if it exists, extract some data from it (shipping information); if it doesn't exist, enter something like " " instead. (I'm going to be using pandas, so I need a placeholder when it can't find the object, else I know I'll get the "ValueError Arrays Must be All Same Length" error.)

I've tried many things to do this, but I'm unable to successfully:
1) capture where the object doesn't exist; and
2) accurately get data from when the object does exist.

My current iteration of the code is:

from bs4 import BeautifulSoup

with open("out_of_stock2.html", encoding="utf8") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for item in soup:
        mt2 = soup.find('span', {'class': 'w_A w_C w_B mr1 mt1 ph1'})
        if mt2 is None:
            print('There is no record')
        else:
            print (mt2)

When I run this, I get:

Output:
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>

I'm not sure why I'm getting 3 instances of this when the data only contains 1? (The object I'm looking for is "w_A w_C w_B mr1 mt1 ph1")

Additionally, there is one record in the dataset that doesn't contain the object, but the output never shows my 'There is no record' message.

Could someone please shed some light on what I'm doing incorrectly?

Thank you.

SamHobbs · Sep-26-2021, 03:56 AM

Can you use the following?

from bs4 import BeautifulSoup
 
with open(r"out_of_stock2.html", encoding="utf8") as fp:
	soup = BeautifulSoup(fp, 'html.parser')
	print(len(soup))
	mt2 = soup.find('span', {'class': 'w_A w_C w_B mr1 mt1 ph1'})
	if mt2 is None:
		print('There is no record')
	else:
		print (mt2)

knight2000

Hi Sam,

Thanks for chiming in.

I tried your code and got:

Output:
3
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>

It's still not reporting the record where the object is missing (it should find one instance of the object and one record without it), so it's still failing at:

if mt2 is None:
    print('There is no record')

The reason I used a loop is the real file contains about 40 records (I've just taken a sample of two records to troubleshoot), so I thought a loop would be required to go through each and look for that object?

SamHobbs · Sep-26-2021, 05:29 AM

I assume that

len(soup)

being 3 explains why you are getting 3 when you expect 1 but the BeautifulSoup documentation is not clear about what it is.
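
For what it's worth, len(soup) counts the top-level nodes in soup.contents, and a for loop over soup visits those same nodes. A minimal sketch, assuming an inline HTML string (a hypothetical stand-in, not the poster's file) made of a doctype, a newline, and an html element:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for out_of_stock2.html; the real file is not shown in the thread.
html = """<!DOCTYPE html>
<html><body><span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# len(soup) is the number of top-level children, the same nodes a for loop visits.
top_level = list(soup)
print(len(soup), [type(node).__name__ for node in top_level])

# soup.find() searches the whole document, so calling it once per top-level
# node returns the same span every time.
hits = [soup.find("span", class_="w_A w_C w_B mr1 mt1 ph1") for _ in soup]
print(len(hits))
```

So the number of repeated lines in the original output follows the number of top-level nodes, not the number of matching spans.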

snippsat · (This post was last modified: Sep-26-2021, 06:36 AM by snippsat.)

You should not loop over the soup object, knight2000; it's not needed and can give unwanted results.
The length depends on the parser used, so if I use lxml (recommended) as the parser, the length will be one.

from bs4 import BeautifulSoup

with open(r"out_of_stock2.html", encoding="utf8") as fp:
    soup = BeautifulSoup(fp, 'lxml')
    print(len(soup))
    mt2 = soup.find('span', class_="w_A w_C w_B mr1 mt1 ph1")
    if mt2 is None:
        print('There is no record')
    else:
        print (mt2)

Output:
1
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>

It's easier to use class_="w_A w_C w_B mr1 mt1 ph1" than to make it a dictionary call.
Then you can just copy the CSS class from the website and add one _ after class.
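
To illustrate the two spellings side by side, here is a small sketch that assumes an inline HTML snippet rather than the real page. It shows the dictionary form and the class_ keyword finding the same element. It also shows a CSS selector via select_one, which is not mentioned in the thread but matches on individual classes regardless of their order:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the poster's file.
html = '<div><span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Dictionary form used in the original question.
by_dict = soup.find("span", {"class": "w_A w_C w_B mr1 mt1 ph1"})

# Keyword form: class_ has a trailing underscore because class is a Python keyword.
by_keyword = soup.find("span", class_="w_A w_C w_B mr1 mt1 ph1")

# CSS selector form: each class is listed separately, so order does not matter.
by_selector = soup.select_one("span.ph1.w_A")

print(by_dict.get_text(), by_keyword.get_text(), by_selector.get_text())
```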

deanhystad · Sep-26-2021, 07:41 AM

You are doing something similar to this:

soup = {'A':1, 'B':2, 'C':3}
class_ = 'B'
for item in soup:
    mt2 = soup.get(class_)
    if mt2:
        print(mt2)
    else:
        print('There is no record')

Output:
2
2
2

In this example and yours you will get a different item each time you iterate through soup, but soup either contains "class_" or not, and that is independent of the current item.

How you fix this depends on what you want to get from soup. From your description I think you would find all span and iterate through those items, comparing the item's class against your pattern. Something like this:

from bs4 import BeautifulSoup
 
with open("out_of_stock2.html", encoding="utf8") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for item in soup.find('span'):
        if item['class_'] == "w_A w_C w_B mr1 mt1 ph1":
            print(item)
        else:
            print ('No match')
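
A note on the snippet above: soup.find('span') returns a single tag, so the for loop walks that tag's children (here, its text), and indexing a text node with 'class_' raises a TypeError. A possible corrected sketch, assuming inline sample HTML and assuming the intent was to compare every span's class attribute, uses find_all and the real attribute name class, which Beautiful Soup returns as a list of class names:

```python
from bs4 import BeautifulSoup

# Hypothetical sample with one matching span and one non-matching span.
html = """
<span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>
<span class="other">Out of stock</span>
"""
soup = BeautifulSoup(html, "html.parser")

# The class attribute is multi-valued, so compare against a list of names.
wanted = "w_A w_C w_B mr1 mt1 ph1".split()

results = []
for item in soup.find_all("span"):
    # item.get("class") is a list such as ['w_A', 'w_C', ...], or None if absent.
    if item.get("class") == wanted:
        results.append(item.get_text())
    else:
        results.append("No match")

print(results)
```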

knight2000 · Sep-26-2021, 10:48 AM

You're spot on Sam. After replying to you, I was mulling over it and realized that the 3 from your code definitely gave a clue as to why I was getting 3 results.

(Sep-26-2021, 05:29 AM)SamHobbs Wrote: I assume that
len(soup)
being 3 explains why you are getting 3 when you expect 1 but the BeautifulSoup documentation is not clear about what it is.

knight2000 · Sep-26-2021, 11:05 AM

Hi snippsat,

Thank you for your advice about not looping over soup. I had tried over 30 different methods to get this data and most of them didn't loop over soup, but by the end of all those failures I tried soup, and of course that didn't work either! But good to know never to use it for looping.

Also, thank you for teaching me the easier way to call a class. That's so much easier than what I've always done. I have seen your method before, but as I'm still learning, I didn't want to try to learn too many variations and confuse myself more.

With regards to parser, I've only ever used one: html.parser.

So I followed your suggestion to use lxml and tried the following code:

from bs4 import BeautifulSoup

with open(r"out_of_stock2.html", encoding="utf8") as fp:
    soup = BeautifulSoup(fp, 'lxml')
    ph1 = soup.find_all('div', class_ ='h-100 pb1-xl pr4-xl pv1 ph1')
    for item in ph1:
        mt1_ph1 = item.find('span', class_ = 'w_A w_C w_B mr1 mt1 ph1')
        if mt1_ph1 is None:
            print('No data')
        else:
            print(mt1_ph1.text)

The result it returned:

Output:
No data
1-day shipping
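
Looping per record like this also gives the equal-length data the original question needed for pandas. A minimal sketch, assuming inline sample HTML with two record divs (the real file is not shown) and a hypothetical column name shipping. It uses html.parser to show that, at least for this sample, the per-record loop rather than the parser is what produces the placeholder row:

```python
from bs4 import BeautifulSoup

# Hypothetical two-record sample: the first record has no shipping span.
html = """
<div class="h-100 pb1-xl pr4-xl pv1 ph1"><p>Item A</p></div>
<div class="h-100 pb1-xl pr4-xl pv1 ph1">
  <p>Item B</p><span class="w_A w_C w_B mr1 mt1 ph1">1-day shipping</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for record in soup.find_all("div", class_="h-100 pb1-xl pr4-xl pv1 ph1"):
    # Search inside this record only, not the whole document.
    span = record.find("span", class_="w_A w_C w_B mr1 mt1 ph1")
    # Always append exactly one value per record so the column stays aligned.
    value = span.get_text(strip=True) if span is not None else " "
    rows.append({"shipping": value})

print(rows)
```

The list can then be passed to pandas.DataFrame(rows); because every record contributes exactly one entry, the column lengths cannot drift apart.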

You fixed it! Thank you so much. I've been banging my head against the wall for 2 days trying to figure it out, and honestly probably wouldn't have thought of trying your option. Really appreciate it.

knight2000 · Sep-26-2021, 11:14 AM

Hi deanhystad,

Thanks a lot for explaining it to me. I've read your reply a few times to try to understand it.

I tried your code but I seem to have got an error:

Error:
if item['class_'] == "w_A w_C w_B mr1 mt1 ph1":
TypeError: string indices must be integers

To be honest, I'm not too sure what that means, but I seem to have had success with the code after changing the parser from html.parser to lxml.

Thank you for the time you invested in helping me.

Have a great one.

SamHobbs · Sep-26-2021, 04:38 PM

(Sep-26-2021, 11:05 AM)knight2000 Wrote: So I followed your suggestion to use lxml and tried the following code:

In your fixed code you first find the relevant div elements and then look for the relevant span element inside each one; I think the requirement for the div elements was not in the original question. You were saying the code does not determine when there is no match, and I could not understand what that meant.
