Python Forum
Where There's A Space In An Object
#1
Hi all,

In attempting to challenge myself by extracting different objects with web scraping and Beautiful Soup, I've come across a small problem that I bet is easy to fix, but I have no idea what to search for in Google to find the solution.

If an object has two words, for example in this code:
tag = item.find('div', {'class': 'tagger expired'})
When running the code, I get:
Output:
None
I've figured out that it's not referencing the 'tagger expired' because it's two words. If the object was one word, like 'tagger' or 'expired', it would run fine.

I tried doing something like:
tag = item.find('div', {'class': 'tagger%expired'})
but that returns 'None' also.

Could someone please enlighten me on how to deal with an object that has a space between two or more words?

Thanks a lot.
#2
(Aug-05-2021, 09:02 AM)knight2000 Wrote: I've figured out that it's not referencing the 'tagger expired' because it's two words.
I don't think this is the reason...

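from bs4 import BeautifulSoup

# Toy markup: one div whose class attribute holds two values
html = """
<div class="tagger expired">spam</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# Matching the exact string value of the class attribute works fine
div = soup.find('div', {'class':'tagger expired'})
print(div.text)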
Output:
spam
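To see why the space isn't the problem: Beautiful Soup treats `class` as a multi-valued attribute and parses it into a list. A search can match the exact attribute string or any single class, but the same classes written in a different order won't match as a string. A minimal sketch using the same toy markup as above:

```python
from bs4 import BeautifulSoup

html = '<div class="tagger expired">spam</div>'
soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div')
# class is multi-valued, so it is parsed into a list of class names
print(div['class'])

# These match: the exact attribute string, or any single class name
print(soup.find('div', {'class': 'tagger expired'}) is not None)
print(soup.find('div', class_='expired') is not None)

# The same classes in a different order do NOT match as a string
print(soup.find('div', class_='expired tagger'))
```

So `'tagger expired'` only matches when the page writes the classes in exactly that order; matching a single class (`class_='expired'`) is more forgiving.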

#3
Firstly, you're using the wrong terminology in several places. You're searching for tags (or elements) whose class attribute has multiple values. Did you try CSS selectors, as described in the last example here (assuming you're using a recent version of Beautiful Soup)? CSS selectors aren't specific to Beautiful Soup, but the library supports them.
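For reference, a sketch of the CSS-selector approach using Beautiful Soup's `select()` and `select_one()` (available in recent versions). The markup is made up for illustration. Chaining `.tagger.expired` requires an element to carry both classes, in any order:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes in different orders,
# plus a div that carries only one of them
html = """
<div class="tagger expired">first</div>
<div class="expired tagger">second</div>
<div class="tagger">third</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# div.tagger.expired = a div with BOTH classes, regardless of order
matches = soup.select('div.tagger.expired')
print([d.text for d in matches])

# select_one() returns the first match, or None if nothing matches
first = soup.select_one('div.tagger.expired')
if first is not None:
    print(first.text)
```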
#4
Hi Buran,

Thank you for testing that out (I'll be using that method to check things in future). I've spent a few more hours today staring at the HTML code, and after breaking it down a lot further, I realized that it contained a small variation that I hadn't picked up on. So it wasn't the space between the words after all (as you correctly pointed out).

From this, I've learnt to spend more time understanding the structure of someone's HTML code rather than running with what 'appears' obvious.
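One way to check the structure before writing a `find()` call is to print what Beautiful Soup actually parsed for each element that has a class. This is a sketch with hypothetical markup; the real page and the variation it contained aren't shown in the thread:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a scraped page
html = """
<div class="tagger expired">one</div>
<div class="tagger-expired">two</div>
<span class="tagger expired">three</span>
"""
soup = BeautifulSoup(html, 'html.parser')

# class_=True matches any element that has a class attribute at all
found = [(el.name, el.get('class')) for el in soup.find_all(class_=True)]

# Printing tag names next to parsed class lists makes small differences
# (a different tag, a hyphen instead of a space) easy to spot
for name, classes in found:
    print(name, classes)
```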

#5
Thanks for chiming in, ndc85430.

As a bit of a newbie, even after reading a lot of the documentation, I still sometimes forget what's what!

The only thing I used was something very similar to what Buran used in his example. And, as he proved with that example, the code should work, so something else was wrong... and he was spot on.

After staring at the code for a long time, I made an assumption that turned out to be wrong.

Thanks again for your pointers though.

