(Mar-11-2019, 11:53 PM)scidam Wrote: Since v.4.4.0 text renamed to string
Yes, I agree that the docs say that, but I don't think they finished the rename.
The docstring for 4.7.1 still says text, but both text and string work as the argument name.
>>> bs4.__version__
'4.7.1'
>>> help(soup.find)
Help on method find in module bs4.element:
find(name=None, attrs={}, recursive=True, text=None, **kwargs) method of bs4.BeautifulSoup instance
Return only the first child of this Tag matching the given
criteria.
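A quick way to check that both names behave the same is to run the same search twice. This is only a sketch: the sample HTML is made up for illustration, and it uses the stdlib html.parser so lxml is not required. Newer bs4 releases may emit a deprecation warning for text=, so the warning is silenced here.

```python
import warnings
from bs4 import BeautifulSoup

# Made-up sample markup, only for this check
html = "<p>Hello python</p><div>Other</div>"
soup = BeautifulSoup(html, "html.parser")

with warnings.catch_warnings():
    # Newer bs4 releases may warn that text= is deprecated
    warnings.simplefilter("ignore")
    by_text = soup.find_all("p", text="Hello python")

by_string = soup.find_all("p", string="Hello python")

# Both calls should find the same single <p> tag
print(by_text == by_string)
```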
Good explanation of the None stuff.
I can show an example of both, and I would also throw in a regex that matches every tag name, so that it becomes a search of all the text.
This is not the most typical use of a parser; usually you want more specific content rather than a full-text search of a web page.
from bs4 import BeautifulSoup
import re
html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''
soup = BeautifulSoup(html, 'lxml')
the_word = 'python'
tags_found = soup.find_all(re.compile(".*"), text=lambda text: text and the_word in text)
print(tags_found)
print('-' * 15)
print([s.text for s in tags_found])
Output:
[<p>Hello world and and python</p>, <td>python is a good language</td>, <div>Hello from python</div>]
---------------
['Hello world and and python', 'python is a good language', 'Hello from python']
The same without lambda (an anonymous function with no name), now using a normal named function.
from bs4 import BeautifulSoup
import re
html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''
def contains_word(text):
    # text is None for tags without a single string, so check it first
    return text and the_word in text
soup = BeautifulSoup(html, 'lxml')
the_word = 'python'
tags_found = soup.find_all(re.compile(".*"), text=contains_word)
print(tags_found)
print('-' * 15)
print([s.text for s in tags_found])
Output:
[<p>Hello world and and python</p>, <td>python is a good language</td>, <div>Hello from python</div>]
---------------
['Hello world and and python', 'python is a good language', 'Hello from python']
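For the more usual case of wanting specific content, a compiled regex can be passed as the string filter and the tag name can be narrowed. The sketch below is one possible variation, not the only way to do it. It assumes html.parser instead of lxml, uses True as the tag name (which matches every tag, like the regex above), and changes the last line of the sample HTML to a capitalized "Python" to show the case-insensitive match. The name word_pattern is just for illustration.

```python
import re
from bs4 import BeautifulSoup

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from Python</div>'''
soup = BeautifulSoup(html, 'html.parser')

# Whole word only, ignoring case
word_pattern = re.compile(r'\bpython\b', re.IGNORECASE)

# True matches every tag name, so this searches all tags
all_tags = soup.find_all(True, string=word_pattern)
# More specific: only <td> tags whose string matches
td_tags = soup.find_all('td', string=word_pattern)

print([t.name for t in all_tags])
print([t.text for t in td_tags])
```

Output:
['p', 'td', 'div']
['python is a good language']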