Python Forum

Full Version: Web Scraping with BS and Requests --> Help me, please!
Hello mates,

I'm new to Python and have been working for some time on a web scraping project that combines 'requests' and 'BeautifulSoup'. Despite my best efforts, I still haven't found a solution to one problem.

The "source" site I want to pull data from has, at a certain point, a structure like this:

Quote:
<div class="im-mainFeatures">
<ul class="nd-list nd-list--pipe"><li class="nd-list__item im-mainFeatures__price"><div class="im-mainFeatures__title"> Variabile_1</div></li><li class="nd-list__item"><span class="im-mainFeatures__value" >3
<svg viewBox="0 0 16 16" width="16" height="16" class="nd-icon im-mainFeatures__symbol " ><use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#planimetry"></use></svg></span><span class="im-mainFeatures__label">locali</span></li><li class="nd-list__item"><span class="im-mainFeatures__value" ><span>&nbsp;</span>100<span class="im-mainFeatures__symbol">m²</span></span><span class="im-mainFeatures__label">Variabile_2</span></li><li class="nd-list__item"><span class="im-mainFeatures__value" >2
<svg viewBox="0 0 16 16" width="16" height="16" class="nd-icon im-mainFeatures__symbol " ><use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#bathroom"></use></svg></span><span class="im-mainFeatures__label">Viariabile_3</span></li><li class="nd-list__item"><span class="im-mainFeatures__value" data-text="3">3
<svg viewBox="0 0 16 16" width="16" height="16" class="nd-icon im-mainFeatures__symbol " ><use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#stairs"></use></svg></span><span class="im-mainFeatures__label">Variabile_4</span></li></ul></div>

I am interested in extracting only the number highlighted in red (2), but it shares its class ('im-mainFeatures__value') with other elements, so I can't isolate it. With the following script:

X = soup.find('span', attrs = {'class':'im-mainFeatures__value'})
Y = X.text
Y = Y.replace(" ","").replace("\n","")
print (Y)
the output is the blue value (3), probably only because it comes first in the markup. I have tried combining various BS methods, such as 'find_all' or 'find_next', but I only get errors or 'None'. Can you suggest a way to print exactly that value?

Thank you very much everyone for the answers that I am sure you will be able to give me!

Alex

P.S.
I left a greeting in this thread!
html = """<div class="im-mainFeatures">
 <ul class="nd-list nd-list--pipe">
  <li class="nd-list__item im-mainFeatures__price">
   <div class="im-mainFeatures__title">
    Variabile_1
   </div>
  </li>
  <li class="nd-list__item">
   <span class="im-mainFeatures__value">
    3
    <svg class="nd-icon im-mainFeatures__symbol" height="16" viewbox="0 0 16 16" width="16">
     <use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#planimetry">
     </use>
    </svg>
   </span>
   <span class="im-mainFeatures__label">
    locali
   </span>
  </li>
  <li class="nd-list__item">
   <span class="im-mainFeatures__value">
    <span>
    </span>
    100
    <span class="im-mainFeatures__symbol">
     m²
    </span>
   </span>
   <span class="im-mainFeatures__label">
    Variabile_2
   </span>
  </li>
  <li class="nd-list__item">
   <span class="im-mainFeatures__value">
    2
    <svg class="nd-icon im-mainFeatures__symbol" height="16" viewbox="0 0 16 16" width="16">
     <use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#bathroom">
     </use>
    </svg>
   </span>
   <span class="im-mainFeatures__label">
    Viariabile_3
   </span>
  </li>
  <li class="nd-list__item">
   <span class="im-mainFeatures__value" data-text="3">
    3
    <svg class="nd-icon im-mainFeatures__symbol" height="16" viewbox="0 0 16 16" width="16">
     <use class="nd-icon__use" xlink:href="/assets-au/sito-it/images/common/sprite-main-features___80243d0e.svg#stairs">
     </use>
    </svg>
   </span>
   <span class="im-mainFeatures__label">
    Variabile_4
   </span>
  </li>
 </ul>
</div>"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('div', {'class':'im-mainFeatures__title'}).text.replace(" ","").replace("\n","")
print(f'Title:{title}')
span_values = [span.text.replace(" ","").replace("\n","") for span in soup.find_all('span', {'class':'im-mainFeatures__value'})]
span_labels = [span.text.replace(" ","").replace("\n","") for span in soup.find_all('span', {'class':'im-mainFeatures__label'})]
for label, value in zip(span_labels, span_values):
  print(f'{label}: {value}')
Output:
Title:Variabile_1
locali: 3
Variabile_2: 100m²
Viariabile_3: 2
Variabile_4: 3
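If some list items on the real page lack a value or a label, pairing two separate lists with zip can shift the results. A minimal sketch of an alternative, assuming each value/label pair always sits inside its own 'li' as in the posted markup: look up both spans per item and build a dictionary keyed by label. The markup below is a shortened copy of the question's HTML (icons omitted, and the value of the last item removed on purpose), and features_by_label is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (icons omitted).
# The last item has no value span, to show how a gap is handled.
html = """
<ul class="nd-list nd-list--pipe">
  <li class="nd-list__item"><span class="im-mainFeatures__value">3</span>
      <span class="im-mainFeatures__label">locali</span></li>
  <li class="nd-list__item"><span class="im-mainFeatures__value">2</span>
      <span class="im-mainFeatures__label">Viariabile_3</span></li>
  <li class="nd-list__item"><span class="im-mainFeatures__label">Variabile_4</span></li>
</ul>
"""


def features_by_label(soup):
    """Map each label text to the value text found in the same <li>."""
    features = {}
    for item in soup.find_all('li', class_='nd-list__item'):
        label = item.find('span', class_='im-mainFeatures__label')
        value = item.find('span', class_='im-mainFeatures__value')
        if label is None:
            continue  # nothing to key on, skip this item
        # get_text(strip=True) drops the surrounding spaces and newlines
        features[label.get_text(strip=True)] = (
            value.get_text(strip=True) if value is not None else None
        )
    return features


soup = BeautifulSoup(html, 'html.parser')
features = features_by_label(soup)
print(features['Viariabile_3'])
```

With this layout, a missing value shows up as None for its own label instead of moving the following values up by one position.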
We don't know what the numbers mean or how you can distinguish between them.
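One possible distinguishing feature, going only by the posted markup, is the icon inside each value span: the 'xlink:href' of its 'use' tag ends in '#planimetry', '#bathroom' or '#stairs'. A rough sketch under that assumption follows. The sprite path is shortened to a placeholder, value_for_icon is a hypothetical helper name, and whether the icon fragment is stable on the real site is not something the thread confirms.

```python
from bs4 import BeautifulSoup

# Reduced markup: same classes as in the question, placeholder sprite path.
html = """
<ul>
  <li><span class="im-mainFeatures__value">3
    <svg class="nd-icon"><use xlink:href="/sprite.svg#planimetry"></use></svg></span>
    <span class="im-mainFeatures__label">locali</span></li>
  <li><span class="im-mainFeatures__value">2
    <svg class="nd-icon"><use xlink:href="/sprite.svg#bathroom"></use></svg></span>
    <span class="im-mainFeatures__label">Viariabile_3</span></li>
</ul>
"""


def value_for_icon(soup, icon_id):
    """Return the text of the value span whose icon href ends with '#<icon_id>'."""
    use = soup.find(
        'use',
        attrs={'xlink:href': lambda href: href is not None
               and href.endswith('#' + icon_id)},
    )
    if use is None:
        return None
    # Walk up from the icon to the enclosing value span
    span = use.find_parent('span', class_='im-mainFeatures__value')
    return span.get_text(strip=True) if span is not None else None


soup = BeautifulSoup(html, 'html.parser')
bathrooms = value_for_icon(soup, 'bathroom')
print(bathrooms)
```

Matching on the icon avoids depending on the position of the item in the list, at the cost of depending on the icon names staying the same.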
Masterpiece! Thank you so much, buran! I searched everywhere on the web for an answer to my problem without finding one. Then you solved everything in just a moment. I was really going crazy.

Sure, my HTML is a bit more complicated than what I wrote in my post, but by modifying and applying your approach I was able to create dictionaries and extract the corresponding values, if any!

You are great!

Thanks again,
Alex