 BeautifulSoup4, How to get an HTML tag with specific class.
#1
I have HTML code like the following from a URL:
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">

I'm trying to get the alt value of just the img tags with only class="this"

import requests
from bs4 import BeautifulSoup
url = "https://someurl.com"
resp = requests.get(url)
txt = resp.text
soup = BeautifulSoup(txt, 'lxml')
imgThis = soup.find_all('img', class_='this')
# find_all returns Tag objects, so iterate over them directly
for img in imgThis:
    print(img['alt'])
The find_all method returns alts for both class_="this" and class_="this and that"

How do I specify only to return class_="this"?
#2
I have HTML code like the following from a URL:
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">

I'm trying to get the alt strings of img tags with specifically class="this"

import requests
from bs4 import BeautifulSoup
url = 'https://someurl.com'
resp = requests.get(url)
txt = resp.text
soup = BeautifulSoup(txt, 'lxml')
imgThis = soup.find_all('img', class_='this')
# find_all returns Tag objects, so iterate over them directly
for img in imgThis:
    print(img['alt'])
The find_all method returns matches for both class_="this" and class_="this and that"

Output:
this
this
this
not this
not this
not this
How do I specify only to return class_="this"?
#3
for example,
<img class="this" alt="this" src="this_source1.gif">
use:
    source1 = soup.find('img', {'class': 'this'})
#4
Thank you Larz.

I did try:

test = soup.find('img', {'class': 'this'})
But that returned just the first img tag whose class contains "this",
which happened to be an <img class="this and that"> tag.

and

test = soup.find_all('img', {'class': 'this'})
returns all img tags with class="this" and class="this and that"
#5
If you really must use bs4, I would use its CSS selector support and stay away from the weird find/find_all API.
This is one way to achieve what you want:
soup.select('img[class="this"]')
In general, I'd recommend using lxml instead of bs4 for pretty much anything.
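
For anyone wondering why find_all matched both sets of tags: as far as I understand it, bs4 treats class as a multi-valued attribute and stores it as a list of tokens, so class_='this' matches any tag whose list contains 'this'. Below is a minimal sketch of that behaviour, plus one possible way to get an exact match while staying with find_all, by passing a function. The helper name has_only_this and the trimmed two-tag HTML sample are my own, for illustration only, and html.parser is used just to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Trimmed sample (assumption: two tags are enough to show the behaviour)
html = '''\
<img class="this" alt="this" src="this_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">'''

soup = BeautifulSoup(html, 'html.parser')

# class is multi-valued, so each tag stores it as a list of tokens
class_lists = [img['class'] for img in soup.find_all('img')]
print(class_lists)

# class_='this' matches any tag whose token list contains 'this'
loose = soup.find_all('img', class_='this')
print(len(loose))


def has_only_this(tag):
    """Hypothetical helper: True for img tags whose class list is exactly ['this']."""
    return tag.name == 'img' and tag.get('class') == ['this']


# Passing a function to find_all lets us compare the whole class list
strict = soup.find_all(has_only_this)
strict_alts = [img['alt'] for img in strict]
print(strict_alts)
```

The function filter compares the full token list, so it only accepts tags whose class attribute is exactly "this". The CSS selector above gets the same result with less code.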
#6
Thanks stranac!

That seems to have done the trick.

It's a shame the BeautifulSoup documentation is less than optimal!
#7
Edit: this thread is a merge of two threads, so my answer is the same as @stranac's.
-----
You can use a CSS selector to match the exact value of the class attribute.
from bs4 import BeautifulSoup

html = '''\
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">'''

soup = BeautifulSoup(html, 'lxml')
only_this = soup.select('img[class="this"]')
Test:
>>> only_this
[<img alt="this" class="this" src="this_source1.gif"/>,
 <img alt="this" class="this" src="this_source2.gif"/>,
 <img alt="this" class="this" src="this_source3.gif"/>]

>>> [i.get('src') for i in only_this]
['this_source1.gif', 'this_source2.gif', 'this_source3.gif']
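
Since the original question asked for the alt values, here is a small sketch of the same idea applied to alt, with the class selector img.this alongside for contrast. The variable names exact_alts and loose_alts are mine, and html.parser is used only so the example does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

html = '''\
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">'''

soup = BeautifulSoup(html, 'html.parser')

# Attribute selector: the whole class attribute must equal "this"
exact_alts = [img['alt'] for img in soup.select('img[class="this"]')]
print(exact_alts)

# Class selector: matches any img that has "this" among its classes
loose_alts = [img['alt'] for img in soup.select('img.this')]
print(loose_alts)
```

The attribute selector compares the full attribute string, while img.this behaves like find_all with class_='this' and picks up the "this and that" tags as well.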