Python Forum
Advancing Through Variables In A List
#1
Hi guys,

I'm practising some simple web scraping on a website that has different categories, each with a number of pages. What I'm trying to do is keep a list of categories and, once the code gets to the end of one category's set of pages, move on to the next category and start scraping from that category's page 1.

The code I have written works in getting to the end of a category, but if the category has, say, 3 pages, then when it moves to the next category it starts at page 3, where I obviously need it to start back at page 1.

My code is:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
from time import sleep
from random2 import randint



cats = ['bessey', 'kreg']

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

product_url = []
product_name = []


start_page_number = 1

data = {'Product': product_name, 'Product URL': product_url}

for cat in cats:
    

    while True:
        url = f'https://www.example.com/brands/{cat}?PageProduct={start_page_number}&PageSizeProduct=48'
        print(url)
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')

        section = soup.find('section', class_='layout-maincontent product-list-grid-template')
        product_grid = section.find('div', id='product-grid')
        more_pages = product_grid.find('div', class_ = 'product')
        if more_pages is None:
            break
            
        products = product_grid.find_all('div', class_='product')
        # print(url)

        for product in products:
            # Find Product Direct URL Link
            productlisttitle = product.find('div', class_='cv-zone-product-4')
            urlref = productlisttitle.find('a')['href']
            urllink = 'https://www.example.com' + urlref
            product_url.append(urllink)
		
		start_page_number = 1
Could someone please show me how to iterate through the list of categories so that each category starts at page 1?

Thanking you.
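
One way to approach this, offered as a minimal sketch rather than a definitive fix: the page counter has to be created (or reset) inside `for cat in cats:`, before the inner page loop starts, and advanced once per page. In the posted code that would mean moving `start_page_number = 1` to just above `while True:` and adding `start_page_number += 1` at the end of the `while` body. The sketch below shows the same idea with `itertools.count(1)`, which gives every category a fresh counter. The names `scrape_all`, `fetch_products`, `fake_fetch` and `FAKE_SITE` are hypothetical and the data is made up so the example runs without network access; in real use `fetch_products` would wrap the `requests` and `BeautifulSoup` steps from the post.

```python
from itertools import count


def scrape_all(cats, fetch_products):
    """Collect product URLs for every category, starting each one at page 1.

    fetch_products(cat, page) is a hypothetical helper that returns the list
    of product URLs found on that page, or an empty list when the page has
    no products.
    """
    product_urls = []
    for cat in cats:
        # count(1) builds a new page counter for each category,
        # so nothing carries over from the previous category.
        for page in count(1):
            found = fetch_products(cat, page)
            if not found:
                break  # ran past the last page of this category
            product_urls.extend(found)
    return product_urls


# Stand-in data (invented for illustration) in place of the real website.
FAKE_SITE = {
    'bessey': {1: ['/b1', '/b2'], 2: ['/b3'], 3: ['/b4']},
    'kreg': {1: ['/k1'], 2: ['/k2']},
}
requested = []  # records every (category, page) pair that was asked for


def fake_fetch(cat, page):
    requested.append((cat, page))
    return FAKE_SITE[cat].get(page, [])


urls = scrape_all(['bessey', 'kreg'], fake_fetch)
print(requested)
print(urls)
```

Printing `requested` shows which pages were asked for in each category, which is a quick way to confirm that the second category begins at page 1 again.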