Python Forum
[split] How to find a specific word in a webpage and How to count it.
#1
Hey, I'm new here

Can someone please explain this line:
words = soup.find(text=lambda text: text and the_word in text)
I don't understand what is happening in the lambda (I know what a lambda is).

Thanks in advance
#2
What version of BS (BeautifulSoup) did you use?
The string argument is the new name for the text argument used in earlier versions of BS.
Since v4.4.0, text has been renamed to string.

soup.find(text = func)
If the string (or text) argument is a function, it should return True or False (per the docs).
The function is applied to each text fragment inside the tags; if it returns True, that fragment is returned. find_all collects all such matches, while find stops at the first one.
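To illustrate the find/find_all difference, here is a small sketch. It assumes bs4 4.4.0 or later (so the string= keyword exists) and uses the built-in 'html.parser' instead of lxml; the markup and the has_word helper are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two fragments contain the word, one does not
html = '<p>Hello world and python</p><div>Hello from python</div><td>no match</td>'
soup = BeautifulSoup(html, 'html.parser')
the_word = 'python'

def has_word(s):
    # Called with each text fragment; a true result keeps the fragment
    return bool(s) and the_word in s

# find() stops at the first matching fragment, find_all() collects every one
first = soup.find(string=has_word)
every = soup.find_all(string=has_word)

print(str(first))               # Hello world and python
print([str(s) for s in every])  # ['Hello world and python', 'Hello from python']
print(first.parent.name)        # p
```

The matches are text fragments, not tags; .parent gets you back to the enclosing tag.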
The condition in text=lambda text: text and the_word in text is simple: it looks for a non-empty string that contains the_word. It could be rewritten, e.g., as text=lambda x: x and the_word in x. You could try omitting the first condition, i.e. removing "x and", but that could cause an error if x is, e.g., None, because the_word in None raises a TypeError. The extra condition (x and) guards against this.
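The short-circuit behavior of x and ... can be checked in plain Python, without BeautifulSoup. In this sketch, guarded is the lambda from the thread and unguarded is a hypothetical variant without the x and check:

```python
the_word = 'python'

# Guarded filter from the thread: `x and ...` short-circuits when x is falsy
guarded = lambda x: x and the_word in x
# Hypothetical unguarded variant: always evaluates `the_word in x`
unguarded = lambda x: the_word in x

for value in ['python is a good language', 'no match here', '', None]:
    # bool() shows whether each result would count as a match
    print(repr(value), '->', bool(guarded(value)))

try:
    unguarded(None)
except TypeError as exc:
    # `'python' in None` fails because None is not a container
    print('unguarded(None) raised TypeError:', exc)
```

For '' and None the guarded version returns the falsy value itself instead of evaluating the in test, so no error is raised.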
#3
(Mar-11-2019, 11:53 PM) scidam Wrote: Since v.4.4.0 text renamed to string
Yes, I agree that the docs say that, but I don't think they have actually done it.
The docstring in 4.7.1 still says text, but both text and string work.
>>> bs4.__version__
'4.7.1'

>>> help(soup.find)
Help on method find in module bs4.element:

find(name=None, attrs={}, recursive=True, text=None, **kwargs) method of bs4.BeautifulSoup instance
    Return only the first child of this Tag matching the given
    criteria.
Good explanation of the None stuff.

I can show an example of both, and I would also throw in a regex so that it searches every tag.
This is not the most typical use of a parser; usually you want more specific content than a full-text search of a web page.
from bs4 import BeautifulSoup
import re

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''

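soup = BeautifulSoup(html, 'lxml')
the_word = 'python'
# re.compile(".*") matches every tag name; the lambda then keeps
# only the tags whose text (tag.string) contains the_word
tags_found = soup.find_all(re.compile(".*"), text=lambda text: text and the_word in text)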
print(tags_found)
print('-' * 15)
print([s.text for s in tags_found])
Output:
[<p>Hello world and and python</p>, <td>python is a good language</td>, <div>Hello from python</div>]
---------------
['Hello world and and python', 'python is a good language', 'Hello from python']
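The re.compile(".*") filter simply matches every tag name. A sketch of two alternatives, assuming bs4 4.4.0 or later and the built-in 'html.parser' instead of lxml: passing True also matches every tag, and passing a tag name gives the more specific search mentioned above.

```python
from bs4 import BeautifulSoup

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''

soup = BeautifulSoup(html, 'html.parser')
the_word = 'python'
# Same guarded filter as above; it receives each candidate tag's .string
has_word = lambda text: text and the_word in text

# True matches every tag name, like re.compile(".*")
any_tag = soup.find_all(True, string=has_word)
# Passing a tag name narrows the search to <td> cells only
td_only = soup.find_all('td', string=has_word)

print([t.name for t in any_tag])  # ['p', 'td', 'div']
print([t.text for t in td_only])  # ['python is a good language']
```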

The same search as in the first example, but without a lambda (an anonymous function with no name), using a normal named function instead:
from bs4 import BeautifulSoup
import re

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''

def contains_word(text):
    # text can be None, so check it before using "in"
    return text and the_word in text

soup = BeautifulSoup(html, 'lxml')
the_word = 'python'
tags_found = soup.find_all(re.compile(".*"), text=contains_word)
print(tags_found)
print('-' * 15)
print([s.text for s in tags_found])
Output:
[<p>Hello world and and python</p>, <td>python is a good language</td>, <div>Hello from python</div>]
---------------
['Hello world and and python', 'python is a good language', 'Hello from python']
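The thread title also asks how to count the word. len(tags_found) gives the number of matching tags, not the number of occurrences. A minimal sketch for counting occurrences, assuming whole-word, case-sensitive matching is wanted; count_word is a hypothetical helper, and 'html.parser' is used instead of lxml so no extra parser is needed.

```python
import re
from bs4 import BeautifulSoup

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''

def count_word(markup, word):
    """Count case-sensitive, whole-word occurrences of word in the document text."""
    soup = BeautifulSoup(markup, 'html.parser')
    # The ' ' separator stops text from neighboring tags running together
    text = soup.get_text(' ')
    # \b word boundaries avoid counting e.g. 'python' inside 'pythonic'
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return len(pattern.findall(text))

print(count_word(html, 'python'))  # 3
print(count_word(html, 'Hello'))   # 2
```

For a case-insensitive count, pass re.IGNORECASE to re.compile.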