Python Forum

Finding exact phrase in list
Hi guys,

I'm not sure of the best way to solve this. I have some basic code:

import re
import requests

# search for any of these strings
search_for = ['about me', 'home page']

# function - send a GET request to the url and search the response text
def send_get_request(link, search_for):
    try:
        html = requests.get(link)
    except requests.exceptions.RequestException as e:
        print("Error: {}".format(e))
        return False  # html is undefined if the request failed
    if re.findall('|'.join(search_for), html.text.lower()):
        return link
    else:
        return False
If any of the strings in search_for is found in the HTML, it counts as a success. But I have noticed that it seems to match only the first word. For example, if it finds "about" it flags a success, instead of matching the whole phrase, which would be "about me". Is there a way to solve this? I assumed it would match the whole phrase, but it seems not to.

Thank you for any help, guys!

Graham
There is a misconception in your function. On success it returns the link, but on failure it returns False. Two days ago I had this same discussion in the German forum. There, someone was using a community-made API for an HX711. The API has a read function that is used internally to get the bits/bytes, and sometimes it fails. The problem is that this internal read function returns the value on success and False on failure, and bool is a subtype of int. When an outside caller then calls, for example, the function that calculates the mean, the mean is sometimes computed with False, which counts as 0, so the result is wrong. If you have more code working together, you'll run into similar problems.

Your function should return only True or False. If you also want to return the link, return (True, link) or (False, link).
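To make the pitfall concrete, here is a minimal sketch. The sample readings and the helper name contains_all are made up for illustration; they are not part of the HX711 API or of the original code.

```python
# bool is a subclass of int, so False silently counts as 0 in arithmetic.
assert isinstance(False, int)

# Hypothetical sensor readings where one failed read was reported as False.
readings = [10, 12, False, 14]
wrong_mean = sum(readings) / len(readings)  # False is averaged in as 0

# Reporting a failed read as None lets the caller filter it out explicitly.
valid = [r for r in [10, 12, None, 14] if r is not None]
correct_mean = sum(valid) / len(valid)


def contains_all(link, text, search_for):
    """Return a (found, link) tuple so the return type is always the same."""
    text = text.lower()
    found = all(keyword.lower() in text for keyword in search_for)
    return found, link


found, link = contains_all("http://example.com", "About me | Home page",
                           ["about me", "home page"])
print(wrong_mean, correct_mean, found, link)
```

The caller can always unpack two values and test the first one, instead of checking whether it received a link or False.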

In addition you can work without regex, which is often easier:


import operator
import requests


def send_get_request(link, search_for):
    """
    Send a GET request to the url, return True if all strings
    are contained in html.text.
    """
    try:
        html = requests.get(link)
    except requests.exceptions.RequestException as e:
        print("Error: {}".format(e))
        # maybe you want to handle the error on
        # caller side and not here.
        # depends on the logic and flow in your program
        return False  # don't continue here
    text = html.text.lower()
    # True only if every keyword occurs somewhere in the page text
    return all(operator.contains(text, keyword.lower()) for keyword in search_for)
Instead of using the in operator (the containment test), I use the function operator.contains inside a generator expression that is passed to all().
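Note that the code above requires all of the strings to be present, while the original regex matched if any one of them was found. If "any" is what you want, a sketch could look like the following. The helper names contains_any and contains_any_phrase are made up for illustration. The second helper is an assumption about what "exact phrase" should mean: it uses word boundaries so that "about me" does not match inside "about menu".

```python
import re


def contains_any(text, search_for):
    """True if at least one keyword occurs as a substring (case-insensitive)."""
    text = text.lower()
    return any(keyword.lower() in text for keyword in search_for)


def contains_any_phrase(text, search_for):
    """True if at least one keyword occurs as a whole phrase.

    re.escape keeps special characters in the keywords literal, and \\b
    requires a word boundary on both sides of the phrase.
    """
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in search_for) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


search_for = ['about me', 'home page']
print(contains_any("<a>About Me</a>", search_for))
print(contains_any("see the about menu", search_for))
print(contains_any_phrase("see the about menu", search_for))
```

The substring version is the simplest, but it also accepts "about menu"; the regex version is stricter at the cost of building a pattern.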
Thank you DeaD_EyE :)

That was explained really well. I wondered why I could not find a solution while Googling: my logic was off. This makes sense now. I always tried to avoid regex if I could, too, lol. That is much better :)

Thank you again!

regards

Graham