Python Forum
trying to scrape a span inside a div using beautifulsoup




trying to scrape a span inside a div using beautifulsoup - CompleteNewb - Jan-28-2021

Hello,
I have a problem scraping a website with BeautifulSoup.
I'm trying to find a <span> nested inside several <div> elements, but I can't find anything deeper than the very first div.

Here's my code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/p/pl?d=graphicscard'
# Download the raw HTML (the connection is closed when the with-block ends)
with uReq(my_url) as uClient:
    page_html = uClient.read()

# Parse it and look for every <span class="fs-13">
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("span", {"class": "fs-13"})
And here's the result I get in my console:

>>> containers = page_soup.findAll("div",{"id":"app"})
>>> len(containers)
1
>>> containers
[<div id="app"></div>]
>>> containers = page_soup.findAll("span",{"class":"fs-11"})
>>> len(containers)
0
As you can see, <div id="app"> is the very first div, but there's a whole bunch of stuff inside it. I can see it when I inspect the webpage, but if I try to find the <span class="fs-11"> with findAll, I get nothing.

If I call page_soup.body, I get this result:

>>> page_soup.body
<body>
<div id="app"></div>
<div id="modal"></div>
<script>
    if (window.location.port !== '80') window.__env__ = 'dev';
  </script>
<script>
    window.appHash = 'b0b815fdc589074946ba';
  </script>
<script src="https://polyfill.io/v3/polyfill.min.js"></script>
<script src="https://cdn.polyfill.io/v......(cut for the sake of brevity)
So my question is: how do I scrape a <span> that is nested inside multiple <div> elements?
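For what it's worth, the nesting itself is probably not the obstacle: find_all() and CSS selectors search every descendant, however deep it sits. The sketch below uses hypothetical markup and the built-in "html.parser" (so lxml isn't needed). It contrasts a page where the span is present with an empty <div id="app"></div> like the one in the body output above; in the second case there is simply nothing to find.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: roughly what a browser's inspector might show after
# JavaScript has filled in <div id="app">. The span is three levels deep.
rendered_html = """
<div id="app">
  <div class="outer">
    <div class="inner">
      <span class="fs-11">nested text</span>
    </div>
  </div>
</div>
"""

# Shaped like what the server sent in this thread: an empty mount point.
served_html = '<div id="app"></div><div id="modal"></div>'


def find_spans(html):
    """Return the text of every <span class="fs-11">, at any depth."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() searches all descendants, not only direct children.
    return [s.get_text(strip=True) for s in soup.find_all("span", class_="fs-11")]


def find_spans_css(html):
    """Same search with a CSS selector; a space means 'descendant at any depth'."""
    soup = BeautifulSoup(html, "html.parser")
    return [s.get_text(strip=True) for s in soup.select("div#app span.fs-11")]


print(find_spans(rendered_html))      # ['nested text']
print(find_spans(served_html))        # []
print(find_spans_css(rendered_html))  # ['nested text']
```

So when the downloaded <div id="app"> is empty, searching harder won't help; the question becomes how to get HTML that actually contains the span.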


RE: trying to scrape a span inside a div using beautifulsoup - Larz60+ - Jan-28-2021

Perhaps it would be better to describe exactly what you are trying to scrape on this page.
There's a lot of JavaScript here, so BeautifulSoup may not be able to get what you want.

You may have to use Selenium.
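Before switching to Selenium, a quick check is whether the HTML you downloaded contains the content at all. This is a minimal sketch assuming the page uses a mount element like the <div id="app"> from the original post. The helper name and the rule "an empty mount element means JavaScript renders the page" are hypothetical heuristics, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def looks_client_rendered(html, mount_id="app"):
    """Rough heuristic (hypothetical helper, not part of BeautifulSoup).

    Returns True when the page has a mount element such as <div id="app">
    with no child tags and no text, which often means JavaScript fills it
    in after the page loads. Returns False if there is no such element.
    """
    soup = BeautifulSoup(html, "html.parser")
    mount = soup.find(id=mount_id)
    if mount is None:
        return False
    has_child_tags = mount.find(True) is not None  # any descendant tag
    has_text = bool(mount.get_text(strip=True))
    return not (has_child_tags or has_text)


# Shaped like the <body> output in the original post
served = '<body><div id="app"></div><div id="modal"></div><script src="x.js"></script></body>'
print(looks_client_rendered(served))  # True
```

If this returns True for the page you fetched, the span isn't in that HTML, and a browser-driven tool such as Selenium is one way to see the page after its JavaScript has run.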


RE: trying to scrape a span inside a div using beautifulsoup - snippsat - Jan-28-2021

When I look, I can't find class=fs-11 on that site.
Have a look at this test; note that I use Requests, not urllib.
This extracts the product info and price for the first graphics card:
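import requests
from bs4 import BeautifulSoup

url = 'https://www.newegg.com/p/pl?d=graphicscard'
# Send a browser-like User-Agent header with the request
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')  # the 'lxml' parser must be installed separately

# Product name: the <a> that is a direct child of <div class="item-info">
prod_info = soup.select_one('div.item-info > a')
print(prod_info.text)
print('-' * 25)
# Current price: an <li class="price-current"> directly inside a <ul>
price = soup.select_one('ul > li.price-current')
print(price.text)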
Output:
XFX Radeon RX 580 DirectX 12 RX-580P8DFD6 XXX Edition 8GB 256-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Card
-------------------------
$696.98 (4 Offers)–
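Two details from this answer carry over if you'd rather keep urllib from the original post. First, the same browser-like User-Agent header can be set on a urllib.request.Request. Second, select_one() returns None when nothing matches, so calling .text on the result directly raises AttributeError. This is a minimal sketch: the helper names and the sample markup are hypothetical, and no network request is made here.

```python
from urllib.request import Request

from bs4 import BeautifulSoup

# Same browser-like User-Agent string used in the answer above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36")


def build_request(url):
    """Build a urllib Request that sends a browser-like User-Agent header.
    Pass the result to urllib.request.urlopen() to actually fetch the page."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def first_text(html, selector):
    """Return the stripped text of the first element matching a CSS selector,
    or None if nothing matches (select_one returns None in that case)."""
    element = BeautifulSoup(html, "html.parser").select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup shaped like the selectors in the answer above.
sample = """
<div class="item-info"><a href="#">Example Video Card 8GB</a></div>
<ul><li class="price-current">$100.00</li></ul>
"""
print(first_text(sample, "div.item-info > a"))     # Example Video Card 8GB
print(first_text(sample, "ul > li.price-current"))  # $100.00
print(first_text(sample, "span.fs-11"))             # None
```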