 .findAll()
#1
code:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html.content, 'html.parser')

allText = bsObj.findAll(id="title", class="text")
print(allText)
error:
  File "C:\Python36\kodovi\wbp.py", line 19
    allText = bsObj.findAll(id="title", class="green")
                                            ^
SyntaxError: invalid syntax
I'm not sure why I get this error.
#2
class is a reserved keyword in Python, so it can't be used as a keyword-argument name.
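A minimal sketch of why the call fails before it even runs, using only the standard library. The helper name compiles() and the call strings are just illustrations, not part of the original code.

```python
import keyword

# "class" is reserved, so it cannot appear as a keyword-argument name.
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("class_"))  # False

def compiles(source):
    """Return True if the source string parses as valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# compile() only parses the text; f does not need to exist.
bad = 'f(id="title", class="text")'
good = 'f(id="title", class_="text")'
print(compiles(bad))   # False
print(compiles(good))  # True
```

This is why bs4 accepts class_ with a trailing underscore as the keyword argument.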
#3
When I use the little class_ trick, I get [] every time, whatever word I use instead of "text".
#4
You can perhaps try .find_all(**{'id': 'foo', 'class': 'bar'})
#5
You can select by CSS class in bs4 by adding a trailing underscore: class_="text".
Example:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find('span', class_="red")
print(red_text.text)
Output:
Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you, if you don't tell me that this means war, if you still try to defend the infamies and horrors perpetrated by that Antichrist- I really believe he is Antichrist- I will have nothing more to do with you and you are no longer my friend, no longer my 'faithful slave,' as you call yourself! But how do you do? I see I have frightened you- sit down and tell me all the news.
Note that I don't use camelCase at all. findAll() works, but it's only there for backward compatibility (bs3); the correct way is find_all().
#6
(Nov-14-2018, 11:05 PM)Gribouillis Wrote: You can perhaps try .find_all(**{'id': 'foo', 'class': 'bar'})

Again, I get []

Error:
Traceback (most recent call last):
  File "C:\Python36\kodovi\wbp.py", line 33, in <module>
    print(red_text.text)
  File "C:\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()? " % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
snippsat, I get this when I try to use findAll.

Also, your code does a different thing. It doesn't pick up the word from the id attribute.
#7
(Nov-16-2018, 12:21 AM)Truman Wrote: snippsat, I get this when I try to use findAll.
find_all() (this is what you should use, not camelCase) returns a list, so you cannot use .text on the list itself.
Now all red text is in a list.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/warandpeace.html")
soup = BeautifulSoup(html.content, 'html.parser')
red_text = soup.find_all('span', class_="red")
print(red_text)
So, to take out one element:
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

# Print the text of the element at index 3
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,
To print all the red text:
>>> for all_red in red_text:
...     print(all_red.text)
...     
Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
... etc.
Quote:Also, your code does a different thing. It doesn't pick up word from id attribute.
Your first code is wrong: there is no id="title" in the source code of that web page.
The only id is <div id="text">, which holds all the text on the page.
You have to read the source code of the website you parse.
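A small sketch of why the combined filter returns []. The inline HTML below is a hypothetical stand-in for the page (one div with an id, spans with classes), so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page structure, not the real source.
html = """
<div id="text">
  <span class="red">Well, Prince</span>
  <span class="green">Anna Pavlovna</span>
  <span class="red">First of all</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both filters must match the SAME tag, and no tag here has id="title".
no_match = soup.find_all(id="title", class_="green")
print(no_match)  # []

# The id that does exist selects the container div.
container = soup.find(id="text")

# Search inside the container for the class you want.
green = container.find_all("span", class_="green")
print([tag.text for tag in green])  # ['Anna Pavlovna']

# Same filter written with an attrs dict, which sidesteps the keyword problem.
green_attrs = container.find_all("span", attrs={"class": "green"})
print(len(green_attrs))  # 1
```

So id and class given together only match a tag that carries both attributes; to combine "inside this id" with "has this class", search in two steps.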
#8
Thank you. One more thing: why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?

And are you saying that .findAll should not be used at all?
#9
(Nov-17-2018, 12:08 AM)Truman Wrote: Thank you. One more thing: why do you use .text for printing in the second and third blocks of code, but not in the first example, where it is a list?
Because find_all() returns a list that holds the content, and the list container itself has no text attribute.
Only the items inside the list are bs4.element.Tag objects, and those have a text attribute.
>>> red_text[3]
<span class="red">First of all, dear friend, tell me how you are. Set your friend's
mind at rest,</span>

>>> # Look at type inside list
>>> type(red_text[3])
<class 'bs4.element.Tag'>

>>> # So it's a bs4.element.Tag, which has a text attribute
>>> print(red_text[3].text)
First of all, dear friend, tell me how you are. Set your friend's
mind at rest,
So if you try this, you see that it does not make sense.
>>> lst = []
>>> lst.text
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'text'
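The same point as a self-contained sketch: find() gives back one Tag, find_all() gives back a list-like ResultSet. The inline HTML is made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up HTML so the example runs without a network request.
html = '<p><span class="red">one</span> <span class="red">two</span></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("span", class_="red")      # a single Tag (None if nothing matches)
every = soup.find_all("span", class_="red")  # a list-like ResultSet of Tags

print(isinstance(first, Tag))       # True
print(isinstance(every, list))      # True
print(first.text)                   # one
print([tag.text for tag in every])  # ['one', 'two']
print(hasattr(every, "text"))       # False
```

So with find_all() you either index into the result or loop over it before asking for .text.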
Quote:And are you saying that .findAll should not be used at all?
findAll() and find_all() both work and do the same thing in bs4.
findAll() is kept so that older code still works (backward compatibility).
camelCase names are not idiomatic Python (PEP 8 recommends lowercase with underscores), so don't use findAll().
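A quick check that the two spellings return the same matches, again on made-up inline HTML. Depending on the installed bs4 release, the old spelling may also emit a deprecation warning.

```python
from bs4 import BeautifulSoup

# Made-up HTML so the example runs without a network request.
html = '<p><span class="red">one</span> <span class="red">two</span></p>'
soup = BeautifulSoup(html, "html.parser")

new_style = soup.find_all("span", class_="red")
old_style = soup.findAll("span", class_="red")  # legacy bs3-era alias

new_texts = [tag.text for tag in new_style]
old_texts = [tag.text for tag in old_style]
print(new_texts == old_texts)  # True
```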