Python Forum
Web page Extractor
#1
Hi All,

I am trying to read a web page from a given URL and search for a particular piece of text on that page.
I am using BeautifulSoup to do this.

Code snippet:

import requests
from bs4 import BeautifulSoup


def count_words(url, the_word):
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    # find() returns only the first text node containing the_word (or None)
    words = soup.find(text=lambda text: text and the_word in text)
    print(words)
    # len() here is the character length of that node, not an occurrence count
    return len(words)


def main():
    url = input("Enter URL Link: ")
    #url = 'https://en.wikipedia.org/wiki/Page'
    word = input("Word to Count: ")
    #word = 'Page'
    count = count_words(url, word)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, word))


if __name__ == '__main__':
    main()

But it does not return the exact count of the given word.
It looks like it is also reading text inside the HTML source (markup and scripts), not just the page content.
I need to count only the content that is visible on the web page.


Can anyone help me with this?

Thanks,
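
One possible way to count only the visible text, as a rough sketch rather than a definitive fix: collect the text nodes of the page, skip everything inside script and style tags, and count whole-word matches in what remains. The sketch below swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and it works on an inline HTML string instead of a downloaded page. The names VisibleTextParser, count_word_in_html and sample_html are made up for this example.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attribute values (e.g. class="Page") are never added to chunks.
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_word_in_html(html, the_word):
    """Count case-sensitive, whole-word occurrences in the page text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    pattern = r"\b" + re.escape(the_word) + r"\b"
    return len(re.findall(pattern, text))


# Hypothetical sample page, used instead of a real download.
sample_html = """
<html><head>
<script>var Page = "Page";</script>
<style>.Page { color: red; }</style></head>
<body><p class="Page">A Page is one side of a sheet. This Page has Pages.</p></body></html>
"""

print(count_word_in_html(sample_html, "Page"))  # prints 2
```

The word inside the script, the style rule and the class attribute is ignored, and "Pages" does not count as "Page" because of the word boundaries. With BeautifulSoup the same idea should carry over: remove the script and style tags (for example with decompose()) and count in the string returned by soup.get_text(), instead of taking len() of a single find() result.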
Messages In This Thread
Web page Extractor - by sathiyarajmca - Oct-26-2018, 11:08 AM
RE: Web page Extractor - by wavic - Oct-26-2018, 12:47 PM
