Sep-17-2021, 07:31 PM
(This post was last modified: Sep-17-2021, 09:02 PM by deanhystad.)
Why write this?
```python
def get_html(BASE_URL, current_page, base_session):
    # Get request
    response = base_session.get(BASE_URL.format(current_page), headers=HEADERS)
    return response
```
What benefit is derived from writing a function that just calls another function? Does it make the code easier to understand? Does it introduce an important abstraction? I don't think it does either, and adding a function just adds code. How about doing this instead:
```python
from bs4 import BeautifulSoup

# BASE_URL and HEADERS are the globals from your original program.
def get_soup(session, page):
    '''Return steamy bowl of soup for BASE_URL page.  Return None if request fails'''
    response = session.get(BASE_URL.format(page), headers=HEADERS)
    if response.status_code == 200:
        return BeautifulSoup(response.text, 'lxml')
    return None
```
This barely meets my minimum requirements for making a function, but at least it does something interesting. It is a soup factory. Give it a session and a page and it returns soup. Notice I removed the URL argument. I assume the URL and HEADERS are closely tied. If so, either pass both as arguments or pass neither. If you pass the URL as an argument, use lower case. Save all upper case for global variables.
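If you go the "pass both" route, it might look something like the sketch below. This is only an illustration under a few assumptions: get_page_text is a made-up name, url is a format string with one {} placeholder for the page number, and session is anything with a requests-style get(url, headers=...) method. I left the BeautifulSoup step out so the sketch has no third-party imports; wrapping the returned text in BeautifulSoup(text, 'lxml') turns it back into a soup factory.

```python
def get_page_text(session, url, headers, page):
    '''Return the body text for one page of url.  Return None if request fails.

    Hypothetical helper: url and headers travel together as lower case
    arguments instead of being read from the BASE_URL and HEADERS globals.
    '''
    response = session.get(url.format(page), headers=headers)
    if response.status_code == 200:
        return response.text
    return None
```

The rest of this post sticks with the simpler get_soup(session, page) that uses the globals.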
Now you can write your code like this:
```python
import requests

def get_authors():  # Use meaningful function names, not Parse
    '''Loop through BASE_URL pages collecting author tags.  Return list of authors'''
    authors = []
    page = 1
    session = requests.Session()
    while True:
        soup = get_soup(session, page)
        if soup is None:
            print("I'm hungry")  # Should probably raise an exception
            break
        for name in soup.select('.author'):
            authors.append(name.text)
        if not soup.select_one('li.next'):
            break
        page += 1
    return authors

if __name__ == '__main__':
    save_csv(get_authors(), 'example.csv')
```
You still get to have functions, but now each does something useful and has a purpose you can describe in one sentence. get_soup() returns soup for a page, save_csv() will eventually save a list to a CSV file, and get_authors() returns a list of authors from a URL. If you parameterized the URL, HEADERS and tag, these might all be reusable.
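To give an idea of what "parameterized" could look like, here is a rough sketch, not a finished design. The names collect_text and PageError are made up for this example, and the save_csv body is only my guess at what yours will do (one value per row under a single header). collect_text takes a get_soup callable that accepts a page number, so the session, URL and headers stay with whoever builds that callable. It also raises an exception instead of printing when a page is missing.

```python
import csv


class PageError(Exception):
    '''Raised when a page cannot be fetched (hypothetical name).'''


def collect_text(get_soup, selector, next_selector='li.next'):
    '''Return the text of every tag matching selector, page by page.

    get_soup(page) must return a soup-like object or None.  Paging stops
    when next_selector no longer matches anything on the current page.
    '''
    items = []
    page = 1
    while True:
        soup = get_soup(page)
        if soup is None:
            raise PageError(f'could not fetch page {page}')
        items.extend(tag.text for tag in soup.select(selector))
        if not soup.select_one(next_selector):
            return items
        page += 1


def save_csv(rows, filename, header='author'):
    '''Write one value per row under a single column header (assumed layout).'''
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow([header])
        writer.writerows([row] for row in rows)
```

With the get_soup(session, page) from earlier in this post, get_authors() shrinks to something like collect_text(lambda page: get_soup(session, page), '.author'), and the same loop would work for any other tag you want to collect.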