.findAll()

Truman · Nov-14-2018, 10:42 PM

code:

import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html.content, 'html.parser')

allText = bsObj.findAll(id="title", class="text")
print(allText)

error:

Error:  File "C:\Python36\kodovi\wbp.py", line 19
    allText = bsObj.findAll(id="title", class="green")
                                            ^
SyntaxError: invalid syntax

Not sure why this error.

**Gribouillis** · Nov-14-2018, 10:49 PM

class is a keyword.
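
A quick way to check this yourself, as a minimal sketch using only the standard library. The helper name `parses` and the call `f(...)` are made up for illustration; nothing is actually called, the source strings are only parsed.

```python
import ast
import keyword

# 'class' is a reserved word, so it cannot be used as a keyword-argument name.
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("class_"))  # False


def parses(source):
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# The same call with class= and with class_= as the argument name.
bad_call_ok = parses('f(id="title", class="text")')
good_call_ok = parses('f(id="title", class_="text")')
print(bad_call_ok, good_call_ok)  # False True
```

That is why bs4 accepts `class_` with a trailing underscore as the argument name.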

Truman · Nov-14-2018, 10:52 PM

When I use the little trick class_ I get [] every time, whatever word I use instead of "text".

**Gribouillis** · Nov-14-2018, 11:05 PM

You can perhaps try .findAll(**{'id': 'foo', 'class': 'bar'})
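
For what it's worth, here is a small sketch comparing the dict-unpacking form with two other spellings that should be equivalent in bs4. The HTML string is invented for the illustration and is not the page from the first post.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this illustration (not the real page).
html = '<p id="foo" class="bar">match</p><p id="foo2" class="bar">other</p>'
soup = BeautifulSoup(html, "html.parser")

# Three ways to filter on both id and class without writing class= directly.
by_unpacking = soup.find_all(**{"id": "foo", "class": "bar"})
by_underscore = soup.find_all(id="foo", class_="bar")
by_attrs = soup.find_all(attrs={"id": "foo", "class": "bar"})

print(by_unpacking)
print(by_unpacking == by_underscore == by_attrs)
```

All three only return something when a single tag carries both attributes, so an empty list usually means the page has no such tag.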

***snippsat*** · (This post was last modified: Nov-14-2018, 11:37 PM by snippsat.)

You can match a CSS class in bs4 by using class_ (with a trailing underscore), e.g. class_="text".
Example:

import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find('span', class_="red")
print(red_text.text)

Output:
Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
if you still try to defend the infamies and horrors perpetrated by
that Antichrist- I really believe he is Antichrist- I will have
nothing more to do with you and you are no longer my friend, no longer
my 'faithful slave,' as you call yourself! But how do you do? I see
I have frightened you- sit down and tell me all the news.

Note that I don't use camelCase at all. findAll() works, but it is only there for backward compatibility (bs3); the correct way is find_all().

Truman · (This post was last modified: Nov-16-2018, 12:38 AM by Truman.)

(Nov-14-2018, 11:05 PM)Gribouillis Wrote: You can perhaps try .findAll(**{'id': 'foo', 'class': 'bar'})

Again, I get []

Error:
Traceback (most recent call last):
  File "C:\Python36\kodovi\wbp.py", line 33, in <module>
    print(red_text.text)
  File "C:\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

snippsat, I get this when I try to use findAll.

Also, your code does a different thing. It doesn't pick up word from id attribute.

***snippsat*** · (This post was last modified: Nov-16-2018, 12:51 PM by snippsat.)

(Nov-16-2018, 12:21 AM)Truman Wrote: snippsat, I get this when I try to use findAll.

find_all() (this is what you should use, not camelCase) returns a list, so you cannot use .text on it.
Now all the red text is in a list.

import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find_all('span', class_="red")
print(red_text)

To take out one part:

>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

# Print out text of part 3
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,

To print all red text.

>>> for all_red in red_text:
...     print(all_red.text)
...     
Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
... etc.

Quote: Also, your code does a different thing. It doesn't pick up word from id attribute.

Your first code is wrong: there is no id="title" in the source code of that web page.
The only id is <div id="text">, which wraps all the text on the page.
You have to read the source code of the web site you parse.
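
Instead of guessing, you can also let bs4 list the ids and classes that are actually in the markup. A rough sketch; the HTML below is a hypothetical stand-in that only mimics the structure described above, and the real page source may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; the real markup may differ.
html = """
<h1>War and Peace</h1>
<div id="text">
  <span class="red">Well, Prince...</span>
  <span class="green">Anna Pavlovna</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# id=True matches every tag that has an id attribute at all.
ids = [tag["id"] for tag in soup.find_all(id=True)]

# A tag's class attribute is a list, because class can hold several values.
classes = sorted({c for tag in soup.find_all(class_=True) for c in tag["class"]})

print(ids)      # ['text']
print(classes)  # ['green', 'red']

# Filtering on an id that is not in the source gives an empty result.
print(soup.find_all(id="title", class_="text"))  # []
```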

Truman · (This post was last modified: Nov-17-2018, 12:19 AM by Truman.)

Thank you. One more thing: why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?

And are you saying that .findAll should not be used at all?

***snippsat*** · (This post was last modified: Nov-17-2018, 01:27 AM by snippsat.)

(Nov-17-2018, 12:08 AM)Truman Wrote: Thank you. One more thing: why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?

Because it's the list that holds the content, and the list container has no text attribute.
Only the items inside the list are bs4.element.Tag objects, which do have a text attribute.

>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

>>> # Look at type inside list
>>> type(red_text[3])
<class 'bs4.element.Tag'>

>>> # So it's a bs4.element.Tag, which has a text attribute
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,

So if you try this, you see that it does not make sense.

>>> lst = []
>>> lst.text
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'text'
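
If you want the text of every item at once, one option is a list comprehension that applies .text to each Tag rather than to the container. A short sketch; the two-span HTML string is made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page's red spans.
html = '<span class="red">one</span><span class="red">two</span>'
soup = BeautifulSoup(html, "html.parser")

red_text = soup.find_all("span", class_="red")

# The result of find_all() behaves like a list (ResultSet subclasses list).
print(isinstance(red_text, list))  # True

# Apply .text to each Tag, not to the container.
texts = [tag.text for tag in red_text]
print(texts)  # ['one', 'two']
```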

Quote: And are you saying that .findAll should not be used at all?

findAll() and find_all() both work and do the same thing in bs4.
findAll() is kept so that older code keeps working (backward compatibility).
camelCase is bad style for method names in Python, so don't use findAll().
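
A small check that both spellings return the same result, sketched on a made-up HTML string. The warning filter is there on the assumption that some newer bs4 releases emit a DeprecationWarning for findAll(); whether yours does depends on the installed version.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this comparison.
html = '<span class="red">one</span><span class="green">two</span>'
soup = BeautifulSoup(html, "html.parser")

with warnings.catch_warnings():
    # Assumption: newer bs4 versions may warn about the camelCase name.
    warnings.simplefilter("ignore", DeprecationWarning)
    old_style = soup.findAll("span", class_="red")

new_style = soup.find_all("span", class_="red")

print(old_style == new_style)
```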
