May-22-2023, 07:09 AM
Hi all,
I want to be able to scrape different pages and categories within the same website. I have this code so far:
from bs4 import BeautifulSoup
import requests

cats = ['romance_8', 'childrens_11']
page_number = 1
for cat in cats:
    while True:
        url = f'https://books.toscrape.com/catalogue/category/books/{cat}/page-{page_number}.html'
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        active_page = soup.find('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
        pages = soup.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
        if active_page is None:
            break
        for page in pages:
            price_color = page.find('p', class_='price_color').text.strip()
        print(url)
        page_number = page_number + 1
The output I get from this code is:
Output:
https://books.toscrape.com/catalogue/category/books/romance_8/page-1.html
https://books.toscrape.com/catalogue/category/books/romance_8/page-2.html
This is partly correct, but I wanted the code to advance to the next item in the list, "childrens_11", and print the URLs for that category as well. That category coincidentally also has 2 pages, so if the code were working correctly, it would show:
Output:
https://books.toscrape.com/catalogue/category/books/romance_8/page-1.html
https://books.toscrape.com/catalogue/category/books/romance_8/page-2.html
https://books.toscrape.com/catalogue/category/books/childrens_11/page-1.html
https://books.toscrape.com/catalogue/category/books/childrens_11/page-2.html
Could someone please enlighten me on how to fix the code to enable this?
Thank you.
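One likely cause, sketched here rather than stated as the definitive fix: page_number is set to 1 only once, before the for loop. When the romance_8 loop ends, the counter is already at 3, so the first request for childrens_11 asks for page-3.html, finds no books, and breaks immediately. Resetting the counter at the start of each category should produce the expected four URLs. In the sketch below, the names category_page_urls and page_has_books are hypothetical and not part of the original code, the page check is passed in as a function so the loop can be tried without network access, and the headers argument is left out because it is not defined in the snippet shown.

```python
from typing import Callable, Iterable, Iterator

BASE = 'https://books.toscrape.com/catalogue/category/books'


def category_page_urls(
    cats: Iterable[str],
    page_has_books: Callable[[str], bool],
) -> Iterator[str]:
    """Yield every listing-page URL for each category, in order."""
    for cat in cats:
        # Reset the counter for every category. The original code set it
        # once, before the for loop, so it was never reset.
        page_number = 1
        while True:
            url = f'{BASE}/{cat}/page-{page_number}.html'
            if not page_has_books(url):
                break  # went past the last page of this category
            yield url
            page_number += 1


def page_has_books(url: str) -> bool:
    """Hypothetical helper: True if the page lists at least one book."""
    # Imported here so the generator above can be tested without
    # requests or bs4 being installed.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    item = soup.find('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
    return item is not None
```

To print the URLs as in the original script, loop over category_page_urls(['romance_8', 'childrens_11'], page_has_books) and print each value. Any per-book scraping, such as reading price_color, would go inside that loop.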