Python Forum
.findAll() - Printable Version




.findAll() - Truman - Nov-14-2018

code:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html.content, 'html.parser')

allText = bsObj.findAll(id="title", class="text")
print(allText)
error:
Error:
  File "C:\Python36\kodovi\wbp.py", line 19
    allText = bsObj.findAll(id="title", class="green")
                                            ^
SyntaxError: invalid syntax
Not sure why I get this error.


RE: .findAll() - Gribouillis - Nov-14-2018

class is a reserved keyword in Python, so it cannot be used as a keyword argument name.
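A quick way to check this, as a minimal sketch that uses only the standard library (the variable names are mine, not from the thread):

```python
import keyword

# "class" is reserved by the language, so it can never name a keyword argument.
print(keyword.iskeyword("class"))
# A trailing underscore makes it an ordinary identifier.
print(keyword.iskeyword("class_"))

# Compiling the call from the first post reproduces the error without running it.
source = 'bsObj.findAll(id="title", class="text")'
try:
    compile(source, "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)
```

The call fails while Python is still parsing the file, before BeautifulSoup is involved at all.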


RE: .findAll() - Truman - Nov-14-2018

When I use the little trick class_, I get [] every time, whatever word I use instead of "text".


RE: .findAll() - Gribouillis - Nov-14-2018

You can perhaps try .findAll(**{'id': 'foo', 'class': 'bar'})
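Here is a rough sketch of that idea next to two equivalent spellings. The HTML string is made up for illustration and is not the page from the thread:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for demonstration.
html = '<p id="foo" class="bar">both</p><p class="bar">class only</p>'
soup = BeautifulSoup(html, "html.parser")

# Unpacking a dict sidesteps the reserved word "class".
via_kwargs = soup.find_all(**{"id": "foo", "class": "bar"})
# attrs= accepts the same dict directly.
via_attrs = soup.find_all(attrs={"id": "foo", "class": "bar"})
# class_ is the keyword-argument spelling that bs4 provides.
via_class_ = soup.find_all(id="foo", class_="bar")

print([tag.text for tag in via_kwargs])
```

All the filters must match the same tag, so only the first paragraph is returned in each case.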


RE: .findAll() - snippsat - Nov-14-2018

You can match a CSS class in bs4 by using a trailing underscore: class_="text".
Example:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find('span', class_="red")
print(red_text.text)
Output:
Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you, if you don't tell me that this means war, if you still try to defend the infamies and horrors perpetrated by that Antichrist- I really believe he is Antichrist- I will have nothing more to do with you and you are no longer my friend, no longer my 'faithful slave,' as you call yourself! But how do you do? I see I have frightened you- sit down and tell me all the news.
Note that I don't use camelCase at all. findAll() works, but it is only there for backward compatibility with bs3; the correct way is find_all().


RE: .findAll() - Truman - Nov-16-2018

(Nov-14-2018, 11:05 PM)Gribouillis Wrote: You can perhaps try .findAll(**{'id': 'foo', 'class': 'bar'})

Again, I get []

Error:
Traceback (most recent call last):
  File "C:\Python36\kodovi\wbp.py", line 33, in <module>
    print(red_text.text)
  File "C:\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
snippsat, I get this when I try to use findAll.

Also, your code does a different thing. It doesn't pick up the word from the id attribute.


RE: .findAll() - snippsat - Nov-16-2018

(Nov-16-2018, 12:21 AM)Truman Wrote: snippsat, I get this when I try to use findAll.
find_all() (this is what you should use, not camelCase) returns a list, so you cannot use .text on it.
Now all red text is in a list.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find_all('span', class_="red")
print(red_text)
So, to take out one part:
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

# Print out text of part 3
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,
To print all red text.
>>> for all_red in red_text:
...     print(all_red.text)
...     
Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
... etc.
Quote:Also, your code does a different thing. It doesn't pick up word from id attribute.
Your first code is wrong: there is no id="title" in the source code of that web page.
The only id is <div id="text">, which wraps all the text on the page.
You have to read the source code of the web site you parse.
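To illustrate, here is a small sketch on a simplified stand-in for the page. The HTML below is an assumption, reduced to one div with an id and two spans; it is not the real source:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the page structure.
html = """
<div id="text">
  <span class="red">Well, Prince</span>
  <span class="green">a name in green</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both filters must match the same tag; nothing has id="title", so this is empty.
no_match = soup.find_all(id="title", class_="green")
print(no_match)

# Find the container by the id that does exist, then search inside it.
container = soup.find(id="text")
green = container.find_all("span", class_="green")
print([tag.text for tag in green])
```

An empty list from find_all() usually means the filter does not match the markup, not that the call is broken.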


RE: .findAll() - Truman - Nov-17-2018

Thank you. One more thing - why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?

And are you saying that .findAll should not be used at all?


RE: .findAll() - snippsat - Nov-17-2018

(Nov-17-2018, 12:08 AM)Truman Wrote: Thank you. One more thing - why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?
Because it's the list that holds the content, and the list container itself has no text attribute.
Only the items inside the list are bs4.element.Tag objects, and those have a text attribute.
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

>>> # Look at type inside list
>>> type(red_text[3])
<class 'bs4.element.Tag'>

>>> # So it's a bs4.element.Tag, which has a text attribute
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,
So if you try to do this, you see that it does not make sense.
>>> lst = []
>>> lst.text
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'text'
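The same difference can be seen side by side in a short sketch. The HTML string here is invented for the example:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, only for demonstration.
html = '<span class="red">one</span><span class="red">two</span>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("span", class_="red")      # a single Tag (None if nothing matches)
every = soup.find_all("span", class_="red")  # a ResultSet, which is a list subclass

print(isinstance(first, Tag), first.text)
print(isinstance(every, list), len(every))

# .text belongs to each Tag, so collect it item by item.
texts = [tag.text for tag in every]
print(texts)
```

So find() gives one item you can call .text on directly, while find_all() gives a list you have to loop over or index.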
Quote:And are you saying that .findAll should not be used at all?
findAll() and find_all() both work and do the same thing in bs4.
findAll() is kept so that older code still works (backward compatibility).
camelCase method names go against Python's usual style (PEP 8), so don't use findAll() in new code.
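A small sketch to confirm that both spellings return the same result. The HTML is made up, and newer bs4 releases may print a deprecation warning for findAll():

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for demonstration.
html = '<span class="red">one</span><span class="green">two</span>'
soup = BeautifulSoup(html, "html.parser")

old_style = soup.findAll("span", class_="red")   # legacy bs3 spelling
new_style = soup.find_all("span", class_="red")  # preferred spelling

old_texts = [tag.text for tag in old_style]
new_texts = [tag.text for tag in new_style]
print(old_texts == new_texts, new_texts)
```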