Advancing Through Variables In A List - knight2000 - May-13-2023

Hi guys,

I'm practising some simple web scraping on a website that has different categories, each with a number of pages. I want to keep a list of categories and, once the code reaches the end of one category's pages, move on to the next category and start scraping from that category's page 1.

The code I have written gets to the end of a category, but if that category has, say, 3 pages, then when it moves to the next category it starts at page 3, whereas I need it to start back at page 1.

My code is:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
from time import sleep
from random2 import randint



cats = ['bessey', 'kreg']

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

product_url = []
product_name = []


start_page_number = 1

data = {'Product': product_name, 'Product URL': product_url}

for cat in cats:
    

    while True:
        url = f'https://www.example.com/brands/{cat}?PageProduct={start_page_number}&PageSizeProduct=48'
        print(url)
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')

        section = soup.find('section', class_='layout-maincontent product-list-grid-template')
        product_grid = section.find('div', id='product-grid')
        more_pages = product_grid.find('div', class_ = 'product')
        if more_pages is None:
            break
            
        products = product_grid.find_all('div', class_='product')
        # print(url)

        for product in products:
            # Find Product Direct URL Link
            productlisttitle = product.find('div', class_='cv-zone-product-4')
            urlref = productlisttitle.find('a')['href']
            urllink = 'https://www.example.com' + urlref
            product_url.append(urllink)
		
        start_page_number = 1

Could someone please show me how to iterate through the list of categories, starting at page 1 for each category?

Thanking you.
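
One possible way to get the behaviour asked for above is to initialise the page counter inside the for loop over categories, and to increment it only inside the while loop. The following is a minimal sketch rather than a drop-in fix: `scrape_categories`, `fetch_page`, `fake_fetch`, `FAKE_SITE` and `visited` are hypothetical names that do not appear in the post, and the fake in-memory site stands in for the real `requests.get` and BeautifulSoup steps so the example runs offline.

```python
from typing import Callable, Dict, List


def scrape_categories(
    cats: List[str],
    fetch_page: Callable[[str, int], List[str]],
) -> Dict[str, List[str]]:
    """Collect product links per category, restarting at page 1 each time."""
    results: Dict[str, List[str]] = {}
    for cat in cats:
        page = 1  # reset inside the for loop, so every category starts at page 1
        results[cat] = []
        while True:
            links = fetch_page(cat, page)
            if not links:  # an empty page means this category is exhausted
                break
            results[cat].extend(links)
            page += 1  # advance within the current category only
    return results


# Hypothetical stand-in for the request/parse step (assumption, not the real site).
FAKE_SITE = {
    "bessey": [["/b1", "/b2"], ["/b3"], ["/b4"]],  # 3 pages
    "kreg": [["/k1"], ["/k2"]],  # 2 pages
}
visited = []  # (category, page) pairs in the order they were requested


def fake_fetch(cat: str, page: int) -> List[str]:
    visited.append((cat, page))
    pages = FAKE_SITE[cat]
    return pages[page - 1] if page <= len(pages) else []


found = scrape_categories(["bessey", "kreg"], fake_fetch)
print(visited)
```

In the actual script, `fetch_page` would presumably wrap the existing request and parsing code and return the product links found on that page, with an empty list playing the role of the `more_pages is None` check.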