Python Forum (https://python-forum.io), General Coding Help: Advancing Through Variables In A List
Advancing Through Variables In A List - knight2000 - May-13-2023

Hi guys,

I'm practising some simple web scraping on a website that has different categories, and each category has a number of pages. What I'm trying to do is keep a list of categories so that, once the code gets to the end of one category's set of pages, it starts the next category and scrapes from that category's page 1.

The code I have written gets to the end of a category correctly, but if the category has, say, 3 pages, then when it moves to the next category it starts at page 3, whereas I obviously need it to start back at page 1.

My code is:

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
from time import sleep
from random2 import randint

cats = ['bessey', 'kreg']
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

product_url = []
product_name = []
start_page_number = 1
data = {'Product': product_name, 'Product URL': product_url}

for cat in cats:
    while True:
        url = f'https://www.example.com/brands/{cat}?PageProduct={start_page_number}&PageSizeProduct=48'
        print(url)
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        section = soup.find('section', class_='layout-maincontent product-list-grid-template')
        product_grid = section.find('div', id='product-grid')
        # Stop when the page has no product tiles (past the last page)
        more_pages = product_grid.find('div', class_='product')
        if more_pages is None:
            break
        products = product_grid.find_all('div', class_='product')
        # print(url)
        for product in products:
            # Find Product Direct URL Link
            productlisttitle = product.find('div', class_='cv-zone-product-4')
            urlref = productlisttitle.find('a')['href']
            urllink = 'https://www.example.com' + urlref
            product_url.append(urllink)

start_page_number = 1
```

Could someone please help me by showing how to iterate through the list of categories, but start at page 1 for each category?

Thanking you.
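A minimal sketch of the usual fix, under a couple of assumptions: reset the counter as the first statement inside `for cat in cats:` (rather than once before the loop), and advance it at the end of every `while` pass. The posted snippet doesn't show where the page number is incremented, so `start_page_number += 1` here is an assumption. To keep the example runnable without network access, `fetch_product_links` and `FAKE_SITE` are hypothetical stand-ins for the `requests`/BeautifulSoup part of the question.

```python
# Hypothetical stand-in for the requests/BeautifulSoup code in the question:
# maps (category, page) to the product hrefs on that page. Any page not listed
# has no products, which is how the loop detects the end of a category.
FAKE_SITE = {
    ('bessey', 1): ['/p/b1', '/p/b2'],
    ('bessey', 2): ['/p/b3'],
    ('bessey', 3): ['/p/b4'],
    ('kreg', 1): ['/p/k1'],
    ('kreg', 2): ['/p/k2'],
}
requested_pages = []  # records every (category, page) that gets requested


def fetch_product_links(cat, page):
    """Hypothetical helper: return product hrefs for one page, or [] if none."""
    requested_pages.append((cat, page))
    return FAKE_SITE.get((cat, page), [])


def scrape(cats):
    product_url = []
    for cat in cats:
        start_page_number = 1  # reset here, so every category begins at page 1
        while True:
            links = fetch_product_links(cat, start_page_number)
            if not links:  # no products: we ran past this category's last page
                break
            for href in links:
                product_url.append('https://www.example.com' + href)
            start_page_number += 1  # move on to the next page of this category
    return product_url


product_url = scrape(['bessey', 'kreg'])
print(requested_pages)
```

In the real scraper, `fetch_product_links` would build the `PageProduct=...` URL, fetch it with `requests.get`, and return the hrefs from the `product` divs. An equivalent option is `for start_page_number in itertools.count(1):` placed inside the `for cat` loop, which creates a fresh counter for each category.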