Sep-16-2021, 05:53 PM
Actually, I refactored the code and it works better and is more readable. Beautiful Soup doesn't really give you a way to create an empty object then set the parameters via the object instance. So, I just did it this way instead. It's similar to your code.
Forget about the timing. Just testing the time between using a session and not.
Yes, I could have made an object alias 'as bs'. The entire point of this was to split everything up into different functions.
I am pretty good with comprehension but I just want to know what the
Forget about the timing. Just testing the time between using a session and not.
Yes, I could have made an object alias 'as bs'

I am pretty good with comprehensions, but I just want to know what the `author for author` part does exactly. Is the first occurrence holding a value?
"""Scrape unique author names from quotes.toscrape.com and print them sorted.

Timing note from the original experiment (kept for reference):
    no session:   0:00:02.776860
    with session: 0:00:00.997129
"""
import requests
import bs4
import datetime

# Browser-like headers so the request is less likely to be rejected as a bot.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://google.com',
    'DNT': '1',
}

# URL template; '{}' is filled with the 1-based page number.
BASE_URL = 'https://quotes.toscrape.com/page/{}/'


def get_html(page_url, current_page, ses):
    """Fetch one listing page.

    page_url: URL template containing a '{}' placeholder for the page number.
              (Renamed from BASE_URL to stop shadowing the module constant.)
    current_page: 1-based page number to request.
    ses: a requests.Session, reused across pages for connection pooling.

    Returns the requests.Response; the caller checks status_code.
    """
    return ses.get(page_url.format(current_page), headers=HEADERS)


def get_soup(soup, set_authors):
    """Add every author name found on the page to set_authors.

    soup: a bs4.BeautifulSoup of one listing page.
    set_authors: the accumulating set (a set, so duplicates collapse).

    Returns set_authors (mutated in place as well).
    """
    for name in soup.select('.author'):
        set_authors.add(name.text)
    return set_authors


def save_csv(set_authors):
    """Print the collected authors in alphabetical order.

    NOTE(review): despite the name, no CSV file is written yet — the
    original had only a '#save to CSV Code' placeholder here.
    """
    for author in sorted(set_authors):
        print(author)


def parse():
    """Crawl every page, collect unique authors, time the run, print results."""
    # One Session for all requests: reusing the TCP connection was ~3x
    # faster than per-request connections in the timing experiment above.
    ses = requests.Session()
    set_authors = set()
    current_page = 1
    start = datetime.datetime.now()
    while True:
        res = get_html(BASE_URL, current_page, ses)
        if res.status_code != 200:
            print('error')
            break
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        set_authors = get_soup(soup, set_authors)
        # The 'Next' button (li.next) disappears on the last page.
        if not soup.select_one('li.next'):
            break
        current_page += 1
    # Bug fix: original wrote `finish = start = now() - start`, which
    # clobbered `start` with the timedelta for no reason.
    finish = datetime.datetime.now() - start
    print(finish)
    save_csv(set_authors)


# Bug fix: the original called parse() here AND inside the guard below,
# running the whole scrape twice. Keep only the guarded entry point.
if __name__ == '__main__':
    parse()