Python Forum
Break out of nested loops
#4
Actually, I refactored the code; it works better and is more readable now. Beautiful Soup doesn't really give you a way to create an empty object and then set its parameters through the instance, so I did it this way instead. It's similar to your code.

Ignore the timing code. I was just comparing the run time with and without a session.
Yes, I could have imported the module with an alias ('as bs') :D. The entire point of this was to split everything up into separate functions.

I am pretty comfortable with comprehensions, but I just want to know what the
author for author
part does exactly. Is the first occurrence holding a value?
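
For reference, here is a minimal sketch of how `author for author` reads inside a comprehension. The sample list `authors` is made up for illustration and is not part of the scraper. The first `author` is the expression whose value goes into the result, and the second `author` is the loop variable that `for` binds on each pass, so the first occurrence does hold a value: the current item.

```python
# Hypothetical sample data, standing in for the scraped author names.
authors = ["Albert Einstein", "Jane Austen", "Albert Einstein"]

# Comprehension form: the first `author` is the output expression,
# the second `author` is the loop variable bound by `for`.
unique_authors = {author for author in authors}

# Equivalent explicit loop.
unique_loop = set()
for author in authors:
    unique_loop.add(author)

# The output expression does not have to be the bare variable;
# here each name is upper-cased before it is stored.
upper_authors = sorted({author.upper() for author in authors})
```

Because the output expression can be any expression, `author for author` is just the simplest case: keep each item unchanged.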


import requests
import bs4
import datetime


HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://google.com',
    'DNT': '1',
}

BASE_URL = 'https://quotes.toscrape.com/page/{}/'

def get_html(base_url, current_page, ses):
    # Request one page of results through the shared session.
    res = ses.get(base_url.format(current_page), headers=HEADERS)
    return res

def get_soup(soup, set_authors):
    # Collect the text of every element with the "author" class.
    for name in soup.select('.author'):
        # Adding to a set discards duplicate names automatically.
        set_authors.add(name.text)
    # The set is updated in place; it is returned for the caller's convenience.
    return set_authors

def save_csv(set_authors):
    # Sort the authors alphabetically.
    list_sort = sorted(set_authors)

    # Placeholder for the CSV-writing code; for now, just print each author.
    for author in list_sort:
        print(author)
      
# Timing results:
# no session   0:00:02.776860
# with session 0:00:00.997129
 
    
def parse():
    
    # One session shared by every request, so the connection is reused.
    ses = requests.Session()
    set_authors = set()

    current_page = 1
    start = datetime.datetime.now()
    while True:

        res = get_html(BASE_URL, current_page, ses)
       
        if res.status_code == 200:
            
            soup = bs4.BeautifulSoup(res.text, 'lxml')
            set_authors = get_soup(soup, set_authors)
           
            # Stop when the page has no "next" link; otherwise go to the next page.
            if not soup.select_one('li.next'):
                break
            current_page += 1

        else:
            print(f'error: status code {res.status_code}')
            break
    finish = datetime.datetime.now() - start
    print(finish)
    save_csv(set_authors)


if __name__ == '__main__':
    parse()
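
The session versus no-session comparison could also be measured with a small reusable helper rather than timing code inside `parse()`. This is only a sketch: `time_call` is a hypothetical name, and it uses `time.perf_counter` from the standard library instead of `datetime`, because it is intended for measuring short intervals.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Offline demonstration with a built-in function.
total, seconds = time_call(sum, [1, 2, 3])
print(total, f'{seconds:.6f}s')
```

With the scraper above, something like `time_call(parse)` would time one full run, and the same call could be repeated for a variant that does not use a session.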


Messages In This Thread
Break out of nested loops - by muzikman - Sep-15-2021, 06:02 PM
RE: Break out of nested loops - by muzikman - Sep-16-2021, 02:34 PM
RE: Break out of nested loops - by deanhystad - Sep-16-2021, 03:37 PM
RE: Break out of nested loops - by muzikman - Sep-16-2021, 05:53 PM
RE: Break out of nested loops - by muzikman - Sep-16-2021, 06:00 PM
RE: Break out of nested loops - by deanhystad - Sep-16-2021, 07:31 PM
RE: Break out of nested loops - by muzikman - Sep-17-2021, 03:06 PM
RE: Break out of nested loops - by muzikman - Sep-17-2021, 04:18 PM
RE: Break out of nested loops - by deanhystad - Sep-17-2021, 05:34 PM
RE: Break out of nested loops - by muzikman - Sep-17-2021, 05:48 PM
RE: Break out of nested loops - by deanhystad - Sep-17-2021, 07:31 PM
RE: Break out of nested loops - by muzikman - Sep-18-2021, 12:59 PM
