Python Forum
trying to scrape a span inside a div using beautifulsoup
#1
Hello,
I'm having a problem scraping a website with BeautifulSoup.
I'm trying to find a <span> that is nested inside several <div> elements, but I can't find anything deeper than the very first div.

Here's my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'https://www.newegg.com/p/pl?d=graphicscard'

# Download the raw HTML (urllib does not run any JavaScript)
with urlopen(my_url) as client:
    page_html = client.read()

page_soup = BeautifulSoup(page_html, "html.parser")
# Search the whole document for <span class="fs-13">
containers = page_soup.find_all("span", {"class": "fs-13"})
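A side note on the download step: urllib identifies itself with its own default User-Agent (Python-urllib/3.x), while reply #3 sends a browser-like one. If you want to stay with urllib, a header can be attached through urllib.request.Request. This is a minimal sketch; the helper names build_request and fetch_html are hypothetical, and a custom header on its own will not make JavaScript-rendered content show up in the downloaded HTML.

```python
from urllib.request import Request, urlopen

# Browser-like User-Agent string, copied from reply #3 (any realistic UA works)
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Hypothetical helper: a urllib Request that sends a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Hypothetical helper: download the raw HTML (no JavaScript is executed)."""
    with urlopen(build_request(url), timeout=10) as response:
        return response.read()


# Usage (needs network access):
# page_html = fetch_html('https://www.newegg.com/p/pl?d=graphicscard')
```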
And here's what I get in the console:

>>> containers = page_soup.findAll("div",{"id":"app"})
>>> len(containers)
1
>>> containers
[<div id="app"></div>]
>>> containers = page_soup.findAll("span",{"class":"fs-11"})
>>> len(containers)
0
As you can see, <div id="app"> is the very first div, and it comes back empty, yet when I inspect the webpage in the browser there's a whole bunch of stuff inside it. If I try to find <span class="fs-11"> with findAll, I get nothing.

If I call page_soup.body, I get this result:

>>> page_soup.body
<body>
<div id="app"></div>
<div id="modal"></div>
<script>
    if (window.location.port !== '80') window.__env__ = 'dev';
  </script>
<script>
    window.appHash = 'b0b815fdc589074946ba';
  </script>
<script src="https://polyfill.io/v3/polyfill.min.js"></script>
<script src="https://cdn.polyfill.io/v......(cut for the sake of brevity)
So my question is: how do I scrape a <span> that is nested inside several <div> elements on a website?
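Nesting depth is not the obstacle here. find_all() searches every descendant by default (recursive=True), so a <span> buried several <div>s deep is found as long as it is present in the HTML you parsed. The minimal sketch below uses made-up markup to contrast the page as the browser's inspector shows it with the empty mount point that urllib actually downloads; the helper names and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what the browser's inspector shows AFTER JavaScript ran
rendered_html = """
<div id="app">
  <div class="outer">
    <div class="inner">
      <span class="fs-11">nested text</span>
    </div>
  </div>
</div>
"""

# What a plain HTTP download returns: the empty mount point seen in post #1
raw_html = '<div id="app"></div><div id="modal"></div>'


def find_spans(html, css_class="fs-11"):
    """Return the text of every <span> with the given class, at any depth."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() walks all descendants, not just direct children
    return [s.get_text(strip=True) for s in soup.find_all("span", class_=css_class)]


def mount_is_empty(html, mount_id="app"):
    """True if <div id=mount_id> exists but contains no child tags."""
    soup = BeautifulSoup(html, "html.parser")
    mount = soup.find("div", id=mount_id)
    # find(True) matches any tag, so None means there are no element children
    return mount is not None and mount.find(True) is None


print(find_spans(rendered_html))  # ['nested text']
print(find_spans(raw_html))       # []
print(mount_is_empty(raw_html))   # True
```

An empty mount div in the downloaded HTML is a strong hint that the page is built client-side, which is what reply #2 points out.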
#2
Perhaps it would be better to describe exactly what you are trying to scrape on this page.
There's a lot of JavaScript here, and BeautifulSoup only parses the HTML the server sends (it does not execute JavaScript), so it may not be able to get what you want.

You may have to use Selenium.
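A minimal sketch of that approach, assuming Selenium 4 and a local Chrome install: the browser runs the page's JavaScript, the script waits until a matching element exists, and the rendered DOM (driver.page_source) is then handed to BeautifulSoup. The selector span.fs-11 comes from post #1 and may not exist on the live page; the helper names fetch_rendered_html and parse_spans are hypothetical. The Selenium imports sit inside the function so the parsing helper can be used without Selenium installed.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css="span.fs-11", timeout=15):
    """Hypothetical helper: load url in Chrome, wait for wait_css, return the DOM."""
    # Imported lazily so parse_spans() works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is installed locally
    try:
        driver.get(url)
        # Block until the JavaScript app has inserted a matching element
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source  # the DOM after JavaScript has run
    finally:
        driver.quit()


def parse_spans(html, css="span.fs-11"):
    """Hypothetical helper: text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css)]


# Usage (needs a browser and network access):
# html = fetch_rendered_html('https://www.newegg.com/p/pl?d=graphicscard')
# print(parse_spans(html))
```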
#3
When I look, I can't find a class="fs-11" on that site.
Have a look at this test; note that I use Requests and not urllib.
This extracts the product info and price for the first graphics card:
import requests
from bs4 import BeautifulSoup

url = 'https://www.newegg.com/p/pl?d=graphicscard'
# Send a browser-like User-Agent instead of the Requests default
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')  # 'lxml' needs the lxml package installed
# First <a> that is a direct child of a <div class="item-info">
prod_info = soup.select_one('div.item-info > a')
print(prod_info.text)
print('-' * 25)
# First <li class="price-current"> that is a direct child of a <ul>
price = soup.select_one('ul > li.price-current')
print(price.text)
Output:
XFX Radeon RX 580 DirectX 12 RX-580P8DFD6 XXX Edition 8GB 256-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Card
-------------------------
$696.98 (4 Offers)–
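The code above uses select_one(), which returns only the first match and returns None when nothing matches (so .text would raise AttributeError). Below is a sketch that collects every card instead, reusing the two selectors from this reply and pairing titles with prices by position. That pairing is an assumption: if one card lacks a price, the lists fall out of step, and zip() silently stops at the shorter list. It uses Python's built-in html.parser so lxml isn't required; the function name extract_products is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_products(html):
    """Return (title, price) tuples for every product found in html.

    Uses the selectors from reply #3. ASSUMPTION: titles and prices appear
    in the same order, so they can be paired by position.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a (possibly empty) list, never None
    titles = [a.get_text(strip=True) for a in soup.select("div.item-info > a")]
    prices = [li.get_text(strip=True) for li in soup.select("ul > li.price-current")]
    # zip() stops at the shorter list if the counts differ
    return list(zip(titles, prices))


# Usage with the response from the code above:
# for title, price in extract_products(response.content):
#     print(title, "|", price)
```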