Python Forum

BeautifulSoup4 plugin help
I'm new to Python and am having difficulty understanding some code.

When using the BeautifulSoup4 library, I want to search through the source code of a page, so I wrote:
for a in soup.find_all('a',{'class':'item-name'}):
where soup is the BeautifulSoup object built from the source code text.

What does ('a',{'class':'item-name'}) exactly mean?
Quote: What does ('a',{'class':'item-name'}) exactly mean?
This means: find all anchor (<a>) tags whose class attribute is item-name, and return them in a list.
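A minimal, self-contained version of the loop from the question. The HTML snippet and the item names in it are made up for illustration; only the find_all('a', {'class': 'item-name'}) call comes from the thread. Note the method is spelled find_all (the older alias is findAll); soup.findall is not a BeautifulSoup method.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: two links carry the item-name class, the rest do not.
html = '''
<ul>
  <li><a class="item-name" href="/a">Apple</a></li>
  <li><a class="item-name featured" href="/b">Banana</a></li>
  <li><a class="price" href="/c">1.99</a></li>
  <li><span class="item-name">Not a link</span></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

# 'a' restricts the search to <a> tags; the dict filters on attributes.
# class is multi-valued in HTML, so class="item-name featured" also matches.
names = []
for a in soup.find_all('a', {'class': 'item-name'}):
    names.append(a.text)

print(names)  # ['Apple', 'Banana']
```

The <span> is skipped because of the 'a' tag filter, and the "price" link is skipped because of the class filter.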
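Look at this CodePen.
The class and id attributes are what the CSS file uses to refer to elements.
When parsing, we use the same references to find the tags we need.
It's also easier not to use a dictionary: just copy the CSS class and pass it with the class_ keyword (the trailing _ is needed because class is a reserved word in Python).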
soup.find_all('a', {'class': 'item-name'})

# Better
soup.find_all('a', class_='item-name')
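A quick check, on a made-up snippet, that the dictionary form and the class_ keyword return the same tags. The attrs= dictionary is still handy for attributes whose names aren't valid Python identifiers, such as the hypothetical data-id used here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '''
<div>
  <a class="item-name" data-id="1" href="/one">One</a>
  <a class="item-name" data-id="2" href="/two">Two</a>
  <a class="other" data-id="3" href="/three">Three</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Dictionary of attributes passed as the second positional argument.
with_dict = soup.find_all('a', {'class': 'item-name'})
# Keyword form: class_ avoids clashing with Python's reserved word "class".
with_kwarg = soup.find_all('a', class_='item-name')
print(with_dict == with_kwarg)  # True

# A hyphenated attribute name can't be a keyword argument, so use attrs=.
second = soup.find_all('a', attrs={'data-id': '2'})
print([a['href'] for a in second])  # ['/two']
```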
Using the code from the CodePen:
from bs4 import BeautifulSoup

# Simulate a web page
html = '''\
<body>
  <div id='images'>
    <a href='image1.html'>My image 1 <br/><img src='https://i.picsum.photos/id/237/200/300.jpg'/></a>
  </div>
  <div>
    <p class="car">
      <a class="color_black" href="Link to bmw">BMV black model</a>
      <a class="color_red" href="Link to opel">Opel red model</a>
    </p>
  </div>
</body>
'''
soup = BeautifulSoup(html, 'html.parser')
Testing it two ways: with find/find_all, or with select/select_one, which take a CSS selector.
>>> soup.find('a', class_="color_red")
<a class="color_red" href="Link to opel">Opel red model</a>
>>> soup.find('a', class_="color_red").text
'Opel red model'
>>> 
>>> # Using CSS selector
>>> soup.select('.color_red')
[<a class="color_red" href="Link to opel">Opel red model</a>]
>>> soup.select_one('.color_red').text
'Opel red model'
>>> 
>>> # id 
>>> soup.select_one('#images')
<div id="images">
<a href="image1.html">My image 1 <br/><img src="https://i.picsum.photos/id/237/200/300.jpg"/></a>
</div>
>>> soup.select_one('#images').img.get('src')
'https://i.picsum.photos/id/237/200/300.jpg'
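To tie this back to the loop in the question, here is a sketch that collects the text and href of every link inside the car paragraph, once with find/find_all and once with a CSS selector. It reuses the car markup from the simulated page above; the "p.car a" selector is my own choice, not something from the thread.

```python
from bs4 import BeautifulSoup

# The car part of the simulated page above.
html = '''
<div>
  <p class="car">
    <a class="color_black" href="Link to bmw">BMV black model</a>
    <a class="color_red" href="Link to opel">Opel red model</a>
  </p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# find/find_all: locate <p class="car"> first, then every <a> inside it.
car = soup.find('p', class_='car')
links = [(a.text, a.get('href')) for a in car.find_all('a')]

# select: the CSS selector "p.car a" does the same in one step.
links_css = [(a.text, a.get('href')) for a in soup.select('p.car a')]

print(links)
# [('BMV black model', 'Link to bmw'), ('Opel red model', 'Link to opel')]
print(links == links_css)  # True
```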
Tutorial part-1, part-2.