Break out of nested loops

muzikman · Sep-17-2021, 05:48 PM

I cleaned it up a little based on some of your recommendations and my own:

This was not necessary, because the if branch already ends with a break:

if not(soup.select_one('li.next')):
    break
else:
    current_page += 1
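Dropping the else is safe because break leaves the loop immediately, so code placed after the if only runs when the condition was false. A minimal sketch comparing both forms, using a hypothetical list of "has a next page" flags in place of the real soup.select_one('li.next') check:

```python
def last_page_with_else(has_next):
    """Return the last page number, using the redundant else branch."""
    current_page = 1
    while True:
        if not has_next[current_page - 1]:
            break
        else:
            current_page += 1
    return current_page


def last_page_without_else(has_next):
    """Same loop without else: the increment only runs when we did not break."""
    current_page = 1
    while True:
        if not has_next[current_page - 1]:
            break
        current_page += 1
    return current_page


# has_next[i] is True when page i + 1 links to a following page (made-up data)
flags = [True, True, False]
print(last_page_with_else(flags), last_page_without_else(flags))  # 3 3
```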

I also changed the get_content function to return a list instead of a set. After collecting the list, I de-duplicate and sort it with sorted(set(...)), which gives back a sorted list. I was not aware of the built-in sorted() function.
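For reference, a small sketch of what sorted(set(...)) does on made-up sample names: set() drops the duplicates, and sorted() returns a new list (not a set). The key=str.casefold variant is an optional extra, not something the script above uses.

```python
# Made-up sample data with a duplicate entry
authors = ['Mark Twain', 'Albert Einstein', 'Jane Austen', 'Albert Einstein']

unique_sorted = sorted(set(authors))  # set() drops duplicates, sorted() returns a new list
print(unique_sorted)        # ['Albert Einstein', 'Jane Austen', 'Mark Twain']
print(type(unique_sorted))  # <class 'list'>

# sorted() compares strings by code point, so 'Z' sorts before 'b';
# key=str.casefold gives a case-insensitive order instead
print(sorted(['bob', 'Alice', 'Zoe'], key=str.casefold))  # ['Alice', 'bob', 'Zoe']
```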

import requests
import bs4 as bs
import csv


HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://google.com',
    'DNT': '1',
}

BASE_URL = 'https://quotes.toscrape.com/page/{}/'


def get_html(url_template, current_page, base_session):
    # GET one page, reusing the shared session's connection
    response = base_session.get(url_template.format(current_page), headers=HEADERS)
    return response

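def get_soup(soup, list_authors, selector):
    # Append the text of every element matching the selector (e.g. '.author')
    for name in soup.select(selector):
        list_authors.append(name.text)

    # This is the same list object that was passed in, now extended
    return list_authors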

def save_csv(list_authors, filename):
    # set() removes duplicate names, sorted() returns them as an alphabetical list
    list_sorted = sorted(set(list_authors))

    # Save to CSV (commented out while testing; the loop must stay inside the with block)
    # with open(filename, 'w', encoding='utf-8', newline='') as csvfile:
    #     writer = csv.writer(csvfile, delimiter=',')
    #     writer.writerow(['Author'])
    #     for author in list_sorted:
    #         writer.writerow([author])
    for author in list_sorted:
        print(author)

def parse():
    # One session reused for every page request
    base_session = requests.Session()
    list_authors = []

    current_page = 1

    while True:
        response = get_html(BASE_URL, current_page, base_session)

        if response.status_code != 200:
            print('error')
            break

        soup = bs.BeautifulSoup(response.text, 'lxml')
        get_soup(soup, list_authors, '.author')

        # No "Next" link means this is the last page. The old else branch
        # was redundant: we only get past this if when we did not break.
        if soup.select_one('li.next') is None:
            break

        current_page += 1

    # Use list_authors (created before the loop) so this still works
    # even if the very first request fails
    save_csv(list_authors, 'example.csv')


if __name__ == '__main__':
    parse()
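A stdlib-only sketch of the same pagination loop with the network part pulled out into a hypothetical fetch_page(n) callable, so the loop can be tried on made-up pages without hitting the site. It also writes the CSV output that is commented out in save_csv, here to an in-memory buffer. The names collect_authors, write_authors_csv, fetch_page and the max_pages cap are illustrative assumptions, not part of the original script.

```python
import csv
import io


def collect_authors(fetch_page, max_pages=50):
    """Walk numbered pages until there is no next page or a request fails.

    fetch_page(n) is a hypothetical callable returning
    (status_code, list_of_author_names, has_next_page).
    """
    authors = []  # created before the loop, so it exists even if page 1 fails
    current_page = 1
    while current_page <= max_pages:  # hypothetical safety cap
        status, names, has_next = fetch_page(current_page)
        if status != 200:
            print(f'error: page {current_page} returned status {status}')
            break
        authors.extend(names)
        if not has_next:
            break
        current_page += 1
    return sorted(set(authors))


def write_authors_csv(authors, csvfile):
    """Write a header row and one author per row to an open text file."""
    writer = csv.writer(csvfile)
    writer.writerow(['Author'])
    for author in authors:
        writer.writerow([author])


# Made-up pages standing in for real HTTP responses
fake_pages = {
    1: (200, ['Albert Einstein', 'Jane Austen'], True),
    2: (200, ['Jane Austen', 'Mark Twain'], False),
}
result = collect_authors(lambda n: fake_pages.get(n, (404, [], False)))

# For a real file, pass open('example.csv', 'w', encoding='utf-8', newline='') instead
buffer = io.StringIO()
write_authors_csv(result, buffer)
print(buffer.getvalue().splitlines())
# ['Author', 'Albert Einstein', 'Jane Austen', 'Mark Twain']
```

With the real site, fetch_page could wrap base_session.get(BASE_URL.format(n), headers=HEADERS), parse the response with BeautifulSoup, and return the status code, the '.author' texts, and whether soup.select_one('li.next') is not None.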

Your way works great, but I wanted to break it up into functions even if the script didn't really warrant it. Thanks for all of your help; you made me think of a more abstract approach.
