BeautifulSoup4, How to get an HTML tag with specific class.
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-14280.html

BeautifulSoup4, How to get an HTML tag with specific class. - Broadsworde - Nov-22-2018

I have HTML code like the following from a URL:

    <img class="this" alt="this" src="this_source1.gif">
    <img class="this" alt="this" src="this_source2.gif">
    <img class="this" alt="this" src="this_source3.gif">
    <img class="this and that" alt="not this" src="this__and_that_source1.gif">
    <img class="this and that" alt="not this" src="this__and_that_source2.gif">
    <img class="this and that" alt="not this" src="this__and_that_source3.gif">

I'm trying to get the alt value of just the img tags with only class="this":

    import requests
    from bs4 import BeautifulSoup

    url = 'https://someurl.com'
    resp = requests.get(url)
    txt = resp.text
    soup = BeautifulSoup(txt, 'lxml')
    imgThis = soup.find_all('img', class_='this')
    # find_all returns Tag objects, so iterate over them directly
    for img in imgThis:
        print(img['alt'])

The find_all method returns matches for both class_="this" and class_="this and that".

Output:
    this
    this
    this
    this and that
    this and that
    this and that

How do I specify that only class_="this" should be returned?

RE: BeautifulSoup4, How to get an HTML tag with specific class. - Larz60+ - Nov-22-2018

For example, for <img class="this" alt="this" src="this_source1.gif"> use:

    source1 = soup.find('img', {'class': 'this'})

RE: BeautifulSoup4, How to get an HTML tag with specific class. - Broadsworde - Nov-22-2018

Thank you Larz. I did try:

    test = soup.find('img', {'class': 'this'})

But that returned just the first img tag with "this" in its class, which happened to be an <img class="this and that">, and

    test = soup.find_all('img', {'class': 'this'})

returns all img tags with class="this" and class="this and that".

RE: BeautifulSoup4, How to get an HTML tag with specific class. - stranac - Nov-22-2018

If you really must use bs4, I would use its CSS selector support and stay away from the weird find/find_all API. This is one way to achieve what you want:

    soup.select('img[class="this"]')

In general, I'd recommend using lxml instead of bs4 for pretty much anything.

RE: BeautifulSoup4, How to get an HTML tag with specific class. - Broadsworde - Nov-22-2018

Thanks stranac! That seems to have done the trick. It's a shame the BeautifulSoup documentation is less than optimal!

RE: BeautifulSoup4, How to get an HTML tag with specific class. - snippsat - Nov-22-2018

Edit: this is a merge of threads, so my answer is the same as @stranac's.

You can use CSS selectors to match the exact class value.

    from bs4 import BeautifulSoup

    html = '''\
    <img class="this" alt="this" src="this_source1.gif">
    <img class="this" alt="this" src="this_source2.gif">
    <img class="this" alt="this" src="this_source3.gif">
    <img class="this and that" alt="not this" src="this__and_that_source1.gif">
    <img class="this and that" alt="not this" src="this__and_that_source2.gif">
    <img class="this and that" alt="not this" src="this__and_that_source3.gif">'''

    soup = BeautifulSoup(html, 'lxml')
    only_this = soup.select('img[class="this"]')

Test:

    >>> only_this
    [<img alt="this" class="this" src="this_source1.gif"/>,
     <img alt="this" class="this" src="this_source2.gif"/>,
     <img alt="this" class="this" src="this_source3.gif"/>]
    >>> [i.get('src') for i in only_this]
    ['this_source1.gif', 'this_source2.gif', 'this_source3.gif']
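
For anyone wondering why find_all behaves this way in the first place: bs4 treats class as a multi-valued attribute and stores it as a list of tokens, so class_='this' matches any tag that has "this" among its classes. The sketch below is only an illustration on a shortened copy of the sample HTML. The names class_lists, loose, strict, css and the helper only_this are made up for this example, and html.parser is used just to avoid the lxml dependency. It compares the find_all call from the question with two stricter alternatives, the CSS selector from this thread and a function filter.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question.
html = """\
<img class="this" alt="this" src="this_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
"""

# html.parser is an assumption made here to avoid an extra dependency;
# the thread uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so bs4 exposes it as a list of tokens per tag.
class_lists = [img["class"] for img in soup.find_all("img")]

# class_="this" matches every tag that has "this" among its classes.
loose = [img["alt"] for img in soup.find_all("img", class_="this")]


def only_this(tag):
    """Hypothetical filter: keep img tags whose class list is exactly ['this']."""
    return tag.name == "img" and tag.get("class") == ["this"]


# find_all also accepts a function that receives each tag.
strict = [img["alt"] for img in soup.find_all(only_this)]

# The CSS attribute selector from the thread compares the whole attribute value.
css = [img["alt"] for img in soup.select('img[class="this"]')]

print(class_lists)
print(loose)
print(strict)
print(css)
```

With this sample, loose contains both alt values, while strict and css contain only "this". The function filter is handy if you want to stay with find_all; the selector is shorter if you are already using select.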