Python Forum

Trying to scrape a span inside a div using BeautifulSoup
Hello,
I'm having a problem scraping a website with BeautifulSoup.
I'm trying to find a <span> nested inside several <div> elements, but I can't find anything deeper than the very first div.

Here's my code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/p/pl?d=graphicscard'

# Download the raw HTML of the search-results page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every <span class="fs-13">
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("span", {"class": "fs-13"})
And here's what I get in my console:

>>> containers = page_soup.findAll("div",{"id":"app"})
>>> len(containers)
1
>>> containers
[<div id="app"></div>]
>>> containers = page_soup.findAll("span",{"class":"fs-11"})
>>> len(containers)
0
As you can see, <div id="app"> is the very first div, but there's a whole bunch of content inside it. I can see that content when I inspect the page in the browser, but when I try to find <span class="fs-11"> with findAll I get nothing.

If I look at page_soup.body, I get this:

>>> page_soup.body
<body>
<div id="app"></div>
<div id="modal"></div>
<script>
    if (window.location.port !== '80') window.__env__ = 'dev';
  </script>
<script>
    window.appHash = 'b0b815fdc589074946ba';
  </script>
<script src="https://polyfill.io/v3/polyfill.min.js"></script>
<script src="https://cdn.polyfill.io/v......(cut for the sake of brevity)
So my question is: how do I scrape a <span> that is nested inside several <div> elements?
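Nesting depth on its own is not what stops findAll. A minimal sketch on made-up markup (the class names item-container and item-info and the text are illustrative, not copied from Newegg) shows that find_all() and a CSS selector both reach a span several divs down:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <span> buried three <div>s deep, similar in shape
# to what the browser's inspector shows inside <div id="app">.
html = """
<div id="app">
  <div class="item-container">
    <div class="item-info">
      <span class="fs-11">deeply nested</span>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() searches every descendant by default (recursive=True),
# so the number of enclosing <div>s does not matter.
nested = [s.get_text() for s in soup.find_all("span", class_="fs-11")]
print(nested)  # ['deeply nested']

# The same lookup written as a CSS selector scoped to the app container.
via_css = [s.get_text() for s in soup.select("div#app span.fs-11")]
print(via_css)  # ['deeply nested']

# Only recursive=False limits the search to direct children.
app = soup.find("div", id="app")
direct = app.find_all("span", recursive=False)
print(direct)  # []
```

So if the span really were in page_html, either call would find it.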
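Perhaps it would be better to describe exactly what you are trying to scrape on this page.
There's a lot of JavaScript here, so BeautifulSoup may not be able to get what you want. BeautifulSoup only parses the HTML the server sends back; it does not run scripts, so anything that JavaScript inserts into <div id="app"> after the page loads will show up in the browser's inspector but not in page_html.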
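A small offline illustration of that point, using a hypothetical, trimmed-down page (not Newegg's actual HTML) whose only content for the app container comes from a script:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down version of what a server might send: the app
# container is empty and a script is supposed to fill it in the browser.
served_html = """
<body>
  <div id="app"></div>
  <script>
    document.getElementById('app').textContent = 'rendered by JavaScript';
  </script>
</body>
"""

soup = BeautifulSoup(served_html, "html.parser")
app = soup.find("div", id="app")

# BeautifulSoup only parses the text; it never executes the <script>,
# so the container stays empty and there is nothing to find inside it.
print(len(app.contents))                      # 0
print(soup.find_all("span", class_="fs-11"))  # []
```

This matches the empty <div id="app"></div> in the console output above.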

You may have to use Selenium.
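A minimal sketch of the Selenium route, assuming Selenium 4.6+ and a local Chrome install: the browser runs the page's JavaScript, driver.page_source returns the resulting DOM, and the usual BeautifulSoup search is applied to that. The helper names rendered_html and spans_with_class and the wait selector are hypothetical.

```python
from bs4 import BeautifulSoup


def spans_with_class(html, css_class):
    """Return the stripped text of every <span> whose class list contains css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.find_all("span", class_=css_class)]


def rendered_html(url, wait_selector, timeout=10):
    """Open url in Chrome, wait until wait_selector matches, and return the rendered DOM."""
    # Imported here so spans_with_class() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium 4.6+ and a local Chrome install
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted a matching element
        # (selenium raises TimeoutException after `timeout` seconds otherwise).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        # page_source is the DOM after the scripts ran, unlike urlopen().read()
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like spans_with_class(rendered_html(my_url, "div#app span"), "fs-11"). Waiting for a selector instead of sleeping a fixed time returns as soon as the content exists and fails loudly if it never appears.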
When I look, I can't find class="fs-11" anywhere on that site.
Take a look at this test; note that I use Requests rather than urllib, and send a browser User-Agent header.
This will pull out the product info and price for the first graphics card:
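import requests
from bs4 import BeautifulSoup

url = 'https://www.newegg.com/p/pl?d=graphicscard'
# Send a browser-like User-Agent; some sites return different HTML to (or block) the default python-requests one
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"}
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail early on a 4xx/5xx response
# 'lxml' requires the lxml package; "html.parser" works without it
soup = BeautifulSoup(response.content, 'lxml')

# Product title: the <a> that is a direct child of the first div.item-info
prod_info = soup.select_one('div.item-info > a')
print(prod_info.text)
print('-' * 25)
# Current price: the first <li class="price-current"> directly inside a <ul>
price = soup.select_one('ul > li.price-current')
print(price.text)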
Output:
XFX Radeon RX 580 DirectX 12 RX-580P8DFD6 XXX Edition 8GB 256-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Card
-------------------------
$696.98 (4 Offers)–
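If you'd rather keep urllib as in the original question, here is a minimal sketch of sending the same browser-like User-Agent with urllib.request.Request. It assumes (the thread doesn't confirm this) that the header is part of what made the difference; build_request and fetch_html are hypothetical helper names.

```python
from urllib.request import Request, urlopen

# Same browser-like string as the requests example above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Wrap url in a Request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Download url with the browser-like User-Agent and return the raw bytes."""
    # The with-block closes the connection even if read() raises.
    with urlopen(build_request(url)) as resp:
        return resp.read()
```

fetch_html(my_url) could then stand in for the uReq(...)/read()/close() lines, with the result passed to soup(..., "html.parser") as before.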