Python Forum
How to fix looking specific word in a webpage
#1
I am using the code below to scrape a website and detect WordPress by searching for "/wp-". It partially works, but it also partially doesn't.

The problem is that when it searches for and counts /wp-, it reports far too many results on every site I try. If I manually inspect https://arstechnica.com/ and search for /wp- using Ctrl+F, I get around 46 results.
If I use the code, it reports 922 results.

Is there a way to stop it from reporting so many results?
Also, is there a way to return only the first occurrence of /wp-?
I would like to use both approaches in future code.

Thank you very much for your help and any advice you might have on how to fix this!

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup
 
def count_words(url, the_word):
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    words = soup.find(text=lambda text: text and the_word in text)
    print(words)
    return len(words)
 
 
def main():
    url = 'https://arstechnica.com/'
    word = '/wp-'
    count = count_words(url, word)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, word))
 
if __name__ == '__main__':
    main()
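
A likely cause of the inflated number, offered as a guess rather than a confirmed diagnosis: soup.find(...) returns only the first matching text node, so len(words) is the number of characters in that one string, not the number of matches. Below is a minimal sketch of counting the occurrences and of getting just the first one, assuming that searching the raw HTML source is what is wanted. The names count_occurrences and first_occurrence and the sample HTML are hypothetical, made up for illustration.

```python
def count_occurrences(html, needle):
    """Count non-overlapping occurrences of needle in the HTML source."""
    return html.count(needle)


def first_occurrence(html, needle, context=30):
    """Return a snippet around the first occurrence of needle, or None."""
    idx = html.find(needle)
    if idx == -1:
        return None
    start = max(0, idx - context)
    end = idx + len(needle) + context
    return html[start:end]


# Hypothetical sample standing in for a downloaded page.
sample_html = (
    '<link href="/wp-content/themes/a.css">'
    '<script src="/wp-includes/js/b.js"></script>'
    '<p>hello</p>'
)

count = count_occurrences(sample_html, '/wp-')
snippet = first_occurrence(sample_html, '/wp-', context=10)
print('occurrences:', count)
print('first match:', snippet)
```

With a real page, the string could come from requests.get(url).text. The total may still differ from a browser Ctrl+F, since the browser may be searching the page after scripts have modified it.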