Python Forum

How to fix looking for a specific word in a webpage
I am using the script below to scrape a website and check whether it runs WordPress by looking for "/wp-" in the page. It partially works, but it also partially doesn't.

The problem is that when it counts occurrences of /wp-, it reports far too many results on every site I try. If I manually inspect https://arstechnica.com/ and search for /wp- with Ctrl+F, I get around 46 results.
If I use the code, it brings up 922 results.

Is there a way to stop it from bringing up so many results?
Also, is there a way to return only the first occurrence of /wp-?
I would like to use both approaches in future code.

Thank you very much for your help and any advice you might have on how to fix this!

#!/usr/bin/env python3

# Only these two imports are used below; "import urlopen" is not a real
# module and raises ImportError, so it and the other unused imports are dropped.
import requests
from bs4 import BeautifulSoup
 
def count_words(url, the_word):
    r = requests.get(url, allow_redirects=False)
    soup = BeautifulSoup(r.content, 'lxml')
    words = soup.find(text=lambda text: text and the_word in text)
    print(words)
    return len(words)
 
 
def main():
    url = 'https://arstechnica.com/'
    word = '/wp-'
    count = count_words(url, word)
    print('\nUrl: {}\ncontains {} occurrences of word: {}'.format(url, count, word))
 
if __name__ == '__main__':
    main()
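
A likely explanation for the 922, offered as a reading of the code rather than something I have run against the site: soup.find(...) returns only the first text node that contains the word, and len() on that node is its length in characters, not a count of matches. Below is a minimal sketch of one way to get both a count and the first hit by searching the raw HTML string instead. The helper names fetch_html, count_occurrences and first_occurrence are hypothetical, and the standard library's urllib is swapped in for requests only to keep the sketch dependency-free.

```python
from urllib.request import Request, urlopen


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject the default urllib user agent, so send a generic one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def count_occurrences(html, needle):
    """Count non-overlapping occurrences of needle in the raw HTML."""
    return html.count(needle)


def first_occurrence(html, needle, context=30):
    """Return (index, snippet) for the first match, or None if there is none."""
    idx = html.find(needle)
    if idx == -1:
        return None
    start = max(0, idx - context)
    end = idx + len(needle) + context
    return idx, html[start:end]


# Small offline demonstration on a made-up HTML fragment.
sample = (
    '<link href="/wp-content/a.css">'
    '<script src="/wp-includes/b.js"></script>'
    '<p>no match here</p>'
)
print(count_occurrences(sample, "/wp-"))
print(first_occurrence(sample, "/wp-"))
```

For the real page, something like html = fetch_html('https://arstechnica.com/') followed by count_occurrences(html, '/wp-') should give a number comparable to searching the page source. It may still differ from Ctrl+F on the rendered page, since the browser searches visible text and the source also contains attributes such as href and src.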