Thread: .findAll() (/thread-14103.html)
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)

.findAll() - Truman - Nov-14-2018
code:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html.content, 'html.parser')
allText = bsObj.findAll(id="title", class="text")
print(allText)
error:
Not sure why I get this error.

RE: .findAll() - Gribouillis - Nov-14-2018
class is a reserved keyword in Python, so it cannot be used as a keyword argument name.

RE: .findAll() - Truman - Nov-14-2018
When I use the little class_ trick, I get [] every time, whatever word I use instead of "text".

RE: .findAll() - Gribouillis - Nov-14-2018
You can perhaps try .findAll(**{'id': 'foo', 'class': 'bar'})
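
To illustrate why class="text" is rejected while the dictionary form is accepted, here is a minimal sketch that uses only the standard library. The function show_filters is hypothetical; it merely stands in for any method that accepts arbitrary keyword arguments, as findAll() does.

```python
import keyword

# `class` is reserved, so it can never appear as a bare keyword-argument name.
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("class_"))  # False


def show_filters(**kwargs):
    """Hypothetical stand-in for a method that takes arbitrary keyword filters."""
    return kwargs


def compiles(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Rejected by the parser before anything runs.
print(compiles('show_filters(id="title", class="text")'))   # False
# A trailing underscore makes it an ordinary identifier.
print(compiles('show_filters(id="title", class_="text")'))  # True

# Unpacking a dict sidesteps the parser, because the key is just a string.
filters = show_filters(**{"id": "foo", "class": "bar"})
print(filters)  # {'id': 'foo', 'class': 'bar'}
```

The syntax error therefore comes from Python itself, before Beautiful Soup is ever called.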

RE: .findAll() - snippsat - Nov-14-2018
You can filter on a CSS class in bs4 by adding a trailing underscore: class_="text".
Example:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find('span', class_="red")
print(red_text.text)
Note that I don't use camelCase at all. findAll() works, but it is only there for backward compatibility with bs3; the correct way is find_all().
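
For an offline comparison of the spellings mentioned so far, the sketch below parses a small hand-written HTML string instead of the live page. The markup is an assumption made up for illustration; it only mimics a div with id="text" containing span tags with class="red".

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; not the real page source.
SAMPLE_HTML = """
<div id="text">
  <span class="red">Well, Prince</span>
  <span class="green">Anna Pavlovna</span>
  <span class="red">First of all, dear friend</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1. Trailing underscore: bs4 maps class_ to the HTML class attribute.
by_underscore = soup.find_all("span", class_="red")

# 2. attrs dictionary: the attribute name is a plain string key.
by_attrs = soup.find_all("span", attrs={"class": "red"})

# 3. Dictionary unpacking, as suggested earlier in the thread.
by_unpacking = soup.find_all(**{"class": "red"})

for result in (by_underscore, by_attrs, by_unpacking):
    print([tag.get_text() for tag in result])

# Filtering on an id that does not exist in the markup gives an empty list.
print(soup.find_all(id="title", class_="text"))  # []
```

All three spellings should select the same two span tags here, while the last call returns [] because every filter must match and nothing in this markup has id="title".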

RE: .findAll() - Truman - Nov-16-2018
(Nov-14-2018, 11:05 PM)Gribouillis Wrote: You can perhaps try
Again, I get []

snippsat, I get this when I try to use findAll.
Also, your code does a different thing. It doesn't pick up the word from the id attribute.

RE: .findAll() - snippsat - Nov-16-2018
(Nov-16-2018, 12:21 AM)Truman Wrote: snippsat, I get this when I try to use findAll.
find_all() (this is what you should use, not camelCase) returns a list, so you cannot use .text on the list.
Now all the red text is in a list.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find_all('span', class_="red")
print(red_text)
So, to take out one part:
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's mind at rest,</span>
>>> # Print out the text of part 3
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's mind at rest,
To print all the red text:
>>> for all_red in red_text:
...     print(all_red.text)
...
Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you, if you don't tell me that this means war,
... etc.
Quote: Also, your code does a different thing. It doesn't pick up word from id attribute.
Your first code is wrong: there is no id="title" in the source code of that web page. The only id is <div id="text">, which holds all the text on the page. You have to read the source code of the web site you parse.

RE: .findAll() - Truman - Nov-17-2018
Thank you. One more thing: why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list? And are you saying that .findAll should not be used at all?

RE: .findAll() - snippsat - Nov-17-2018
(Nov-17-2018, 12:08 AM)Truman Wrote: Thank you. One more thing - why in the second and third block of code you use .text for printing while you don't do that in the first example as it is a list.
Because the list only holds the content, and the list container itself has no text attribute. Only the items inside the list are bs4.element.Tag objects, and those have a text attribute.
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's mind at rest,</span>
>>> # Look at the type inside the list
>>> type(red_text[3])
<class 'bs4.element.Tag'>
>>> # So it's a bs4.element.Tag, which has the text attribute
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's mind at rest,
So if you try the following, you will see that it does not make sense.
>>> lst = []
>>> lst.text
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'text'
Quote: And are you saying that .findAll should not be used at all?
findAll() and find_all() both work and do the same thing in bs4. findAll() is kept so that older code keeps working (backward compatibility). camelCase method names are bad style in Python, so don't use findAll().
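
The difference between a single tag and a list of tags can be checked directly. The following sketch again uses a small made-up HTML string (an assumption, not the real page source) so that it runs without a network connection. It also lists the ids present in the document, which is one way to check a filter such as id="title" before relying on it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup for illustration only; not the real page source.
SAMPLE_HTML = """
<div id="text">
  <span class="red">Well, Prince</span>
  <span class="red">First of all, dear friend</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_red = soup.find("span", class_="red")    # a single Tag (or None if no match)
all_red = soup.find_all("span", class_="red")  # a list-like collection of Tags

print(isinstance(first_red, Tag))  # True
print(isinstance(all_red, list))   # True
print(first_red.text)              # Well, Prince

# .text belongs to each Tag, so index or loop over the collection first.
print(all_red[1].text)             # First of all, dear friend
texts = [tag.text for tag in all_red]
print(texts)

# Collect every id present in the document to see which ids can be filtered on.
ids = [tag["id"] for tag in soup.find_all(id=True)]
print(ids)                         # ['text']
```

In short, find() hands back one element to read .text from, while find_all() hands back a collection that has to be indexed or iterated first.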