[split] How to find a specific word in a webpage and How to count it.
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: /thread-16722.html
[split] How to find a specific word in a webpage and How to count it. - marpop - Mar-11-2019

Hey, I'm new here. Can someone please explain this line:

    words = soup.find(text=lambda text: text and the_word in text)

I don't understand what is happening in the lambda (I know what a lambda is). Thanks in advance.

RE: [split] How to find a specific word in a webpage and How to count it. - scidam - Mar-11-2019

What version of bs did you use? The string argument is the new name for the text argument used in previous versions of BS; since v.4.4.0, text has been renamed to string.

    soup.find(text=func)

If the string (or text) argument is a function, it should return True or False (from the docs). The function is applied to each text fragment within tags; if it returns True, that fragment is returned. find_all searches for all such occurrences, while find stops at the first one.

    text=lambda text: text and the_word in text

The condition is simple: it matches a non-empty string that contains the_word. It could be rewritten, e.g., as

    text=lambda x: x and the_word in x

You could probably try omitting the first condition, i.e. removing "x and", but this could cause an error if x turns out to be, e.g., None. The extra condition ("x and") guards against exactly that case: the_word in None would raise a TypeError.
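To make the role of "x and" concrete, here is a minimal sketch in plain Python, with no BeautifulSoup involved. The list fragments is a hypothetical stand-in for the values a filter function might be handed; the next(...) and list-comprehension lines only imitate the "first match" versus "all matches" behaviour described above and are not how the library is implemented.

```python
the_word = "python"

def contains_word(x):
    # Same logic as: lambda x: x and the_word in x
    # "x and ..." short-circuits: for None or "" the "in" test never runs,
    # and the falsy value itself (None or "") is returned.
    return x and the_word in x

# Hypothetical values a filter might receive, including None and ""
fragments = ["Hello world", None, "", "python is a good language", "Hello from python"]

# find-like behaviour: stop at the first fragment that passes the test
first_match = next((f for f in fragments if contains_word(f)), None)

# find_all-like behaviour: collect every fragment that passes the test
all_matches = [f for f in fragments if contains_word(f)]

def unguarded(x):
    # The same test without the guard
    return the_word in x

try:
    unguarded(None)
    guard_needed = False
except TypeError:
    # "python" in None is not a valid operation
    guard_needed = True

print(first_match)
print(all_matches)
print(guard_needed)
```

Note that the guarded function returns a falsy value (None or an empty string) rather than a strict False when x is empty, which is enough for it to be treated as "no match".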
RE: [split] How to find a specific word in a webpage and How to count it. - snippsat - Mar-12-2019

(Mar-11-2019, 11:53 PM)scidam Wrote: Since v.4.4.0 text renamed to string

Yes, I agree that the docs say that, but I don't think they have actually done it. In the docstring for 4.7.1 it still says text, but both text and string work.

    >>> bs4.__version__
    '4.7.1'
    >>> help(soup.find)
    Help on method find in module bs4.element:

    find(name=None, attrs={}, recursive=True, text=None, **kwargs) method of bs4.BeautifulSoup instance
        Return only the first child of this Tag matching the given
        criteria.

Good explanation of the None stuff. I can show an example of both, and I would also throw in a regex so that it becomes an all-text search. This is not the most typical use of a parser; usually you want more specific content rather than a full-text search of a web page.

    from bs4 import BeautifulSoup
    import re

    html = '''\
    <p>Hello world and and python</p>
    <td>python is a good language</td>
    <td>not present in this text</td>
    <div>Hello from python</div>'''

    soup = BeautifulSoup(html, 'lxml')
    the_word = 'python'
    # re.compile(".*") matches any tag name, so every tag is a candidate;
    # the lambda then keeps only tags whose text contains the_word
    tags_found = soup.find_all(re.compile(".*"), text=lambda text: text and the_word in text)
    print(tags_found)
    print('-' * 15)
    print([s.text for s in tags_found])

Without the lambda (an anonymous function, i.e. one with no name), now using a normal named function:

    from bs4 import BeautifulSoup
    import re

    html = '''\
    <p>Hello world and and python</p>
    <td>python is a good language</td>
    <td>not present in this text</td>
    <div>Hello from python</div>'''

    def contains_word(text):
        # Same test as the lambda: text must be non-empty and contain the_word
        return text and the_word in text

    soup = BeautifulSoup(html, 'lxml')
    the_word = 'python'
    # Pass the function itself (no parentheses) as the text filter
    tags_found = soup.find_all(re.compile(".*"), text=contains_word)
    print(tags_found)
    print('-' * 15)
    print([s.text for s in tags_found])
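For the counting part of the thread title, here is a rough sketch of one way to count whole-word occurrences in the page text. It deliberately swaps in the standard library's html.parser instead of BeautifulSoup so it runs without third-party packages; TextCollector and count_word are hypothetical names made up for this example, and the case-insensitive, whole-word matching is an assumption about what "count" should mean here.

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects every text chunk found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text outside of tags
        self.chunks.append(data)

def count_word(html, word):
    """Count whole-word, case-insensitive occurrences of word in the text of html."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    # re.escape keeps any special characters in word literal;
    # \b restricts matches to whole words
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

html = '''\
<p>Hello world and and python</p>
<td>python is a good language</td>
<td>not present in this text</td>
<div>Hello from python</div>'''

print(count_word(html, "python"))
print(count_word(html, "and"))
```

With the sample HTML from the posts above this counts 3 for "python" and 2 for "and". On a real page the collected text would also include the contents of script and style elements, so those would need to be skipped if only visible text should be counted.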