Python Forum
Extract something when you have multiple tags
#1
Hey all,

I am practicing webscraping and I've come across a scenario where I'm a little stuck.

First, here's a snapshot of the code (which works up to this point):

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = ('mytesturl')

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
voteup = (soup.find('span', {'class': 'nvb voteup'}))
print(voteup)
This gives me the following result:
Output:
<span class="nvb voteup"><i class="fa fa-plus"></i><span>1</span></span>
So what I'm trying to do is navigate through this result and extract the "1" near the end, between the inner 'span' tags. I've looked through some of the BeautifulSoup documentation and searched online, although I'm not sure what to search for in Google and may have missed it in the BS documentation too.

Would anyone be able to enlighten me on how to navigate through the current results to then go on to extract the 1 between the span tags please?

Thanks a lot.
#2
After this line:
voteup = (soup.find('span', {'class': 'nvb voteup'}))
add:
splist = voteup.find_all('span')

You can then extract your data from splist.
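A minimal sketch of that last step, assuming the markup printed in post #1 is pasted in as a string rather than fetched from the real URL (html_snippet and values are illustrative names, not from the thread):

```python
from bs4 import BeautifulSoup

# Assumption: the output shown in post #1, pasted in as a string
html_snippet = '<span class="nvb voteup"><i class="fa fa-plus"></i><span>1</span></span>'

soup = BeautifulSoup(html_snippet, 'html.parser')
voteup = soup.find('span', {'class': 'nvb voteup'})

# find_all() searches only the descendants of voteup,
# so the outer span itself is not included in the result
splist = voteup.find_all('span')

# Pull the text out of each matching tag
values = [span.get_text(strip=True) for span in splist]
print(values)
```

On a real page, soup.find() returns None when nothing matches, so it is probably worth checking voteup before calling find_all() on it.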
#3
(Aug-03-2021, 09:10 PM)Larz60+ Wrote: After this line:
voteup = (soup.find('span', {'class': 'nvb voteup'}))
add:
splist = voteup.find_all('span')

You can then extract your data from splist.

Hi Larz60+,

Thank you very much for your assistance. I tried and failed at so many variations. I could have sworn I also tried the find_all approach and it didn't work for me. Clearly I didn't execute it properly, and now it looks so simple!

Appreciate your time in helping me out.

Have a great day.
#4
(Aug-03-2021, 11:51 AM)knight2000 Wrote: Would anyone be able to enlighten me on how to navigate through the current results to then go on to extract the 1 between the span tags please?
from bs4 import BeautifulSoup

html = '''\
<span class="nvb voteup"><i class="fa fa-plus"></i><span>1</span></span>'''
soup = BeautifulSoup(html, 'lxml')
Usage with CSS selector
>>> soup.select_one('body > span > span')
<span>1</span>
>>> soup.select_one('body > span > span').text
'1'
If there are several matching tags, use select() instead; it returns a list of tags that you can loop over.
html = '''\
<span class="nvb voteup"><i class="fa fa-plus"></i><span>1</span></span>
<span class="nvb voteup"><i class="fa fa-plus"></i><span>2</span></span>
<span class="nvb voteup"><i class="fa fa-plus"></i><span>3</span></span>'''
# Re-parse so that soup reflects the new html string
soup = BeautifulSoup(html, 'lxml')
Usage.
>>> tag = soup.select('body > span > span')
>>> tag
[<span>1</span>, <span>2</span>, <span>3</span>]
>>> for span in tag:
...     print(span.text)
...     
1
2
3

>>> [span.text for span in tag]
['1', '2', '3']
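Note that 'body > span > span' works here because the lxml parser wraps the fragment in html and body tags. On a full page, a selector anchored on the class names is likely to be more robust. A rough sketch, assuming the same three-span markup; html.parser is used only to avoid the lxml dependency, and votes is an illustrative name:

```python
from bs4 import BeautifulSoup

html = '''\
<span class="nvb voteup"><i class="fa fa-plus"></i><span>1</span></span>
<span class="nvb voteup"><i class="fa fa-plus"></i><span>2</span></span>
<span class="nvb voteup"><i class="fa fa-plus"></i><span>3</span></span>'''
soup = BeautifulSoup(html, 'html.parser')

# span.nvb.voteup matches spans that carry both classes;
# "> span" then selects the span that is a direct child of each match
votes = [int(span.text) for span in soup.select('span.nvb.voteup > span')]
print(votes)
```

Converting with int() assumes every inner span holds a plain number; if the site shows something like "1.2k", that conversion would need adjusting.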

