May-22-2023, 07:09 AM
Hi all,
I want to be able to scrape different pages and categories within the same website. I have this code so far:
from bs4 import BeautifulSoup
import requests

# Note: the definitions of `url` and `headers` do not appear in the code as posted.
# Judging by the output below, `url` is built from `cat` and `page_number` inside the loop.

cats = ['romance_8', 'childrens_11']
page_number = 1

for cat in cats:
    while True:
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        # Each book on a category page is an <li> with these classes
        active_page = soup.find('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
        pages = soup.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3')
        # No books found means we have gone past the last page
        if active_page is None:
            break
        for page in pages:
            price_color = page.find('p', class_='price_color').text.strip()
        print(url)
        page_number = page_number + 1
The output I get from this code is:
Output:
https://books.toscrape.com/catalogue/category/books/romance_8/page-1.html
https://books.toscrape.com/catalogue/category/books/romance_8/page-2.html
This is partly correct, but I wanted the code to advance to the next item in the list, "childrens_11", and print the URLs for that category as well. Coincidentally, this category also has 2 pages, so if the code were working correctly, it would show:
Output:
https://books.toscrape.com/catalogue/category/books/romance_8/page-1.html
https://books.toscrape.com/catalogue/category/books/romance_8/page-2.html
https://books.toscrape.com/catalogue/category/books/childrens_11/page-1.html
https://books.toscrape.com/catalogue/category/books/childrens_11/page-2.html
Could someone please enlighten me on how to fix the code to enable this?
Thank you.
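
A possible explanation, sketched under assumptions: `page_number` is set to 1 only once, before the `for` loop, so when the first category runs out of pages it is left at 3. If `url` is built from `cat` and `page_number` inside the loop, the first request for "childrens_11" asks for page-3.html, finds no books, and breaks immediately. Resetting the counter at the start of each category may be all that is needed. In the sketch below, `BASE`, `category_page_urls` and `page_has_books` are hypothetical names, the URL pattern is inferred from the output above, and the `headers` argument from the original code is left out because its contents were not posted.

```python
# URL pattern inferred from the posted output (an assumption, not from the original code)
BASE = "https://books.toscrape.com/catalogue/category/books/{cat}/page-{page}.html"


def page_has_books(url):
    """Return True if the page at `url` lists at least one book."""
    # Imported here so the URL-generating logic below can be tried without these packages
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    # Same selector as in the original code
    return soup.find("li", class_="col-xs-6 col-sm-4 col-md-3 col-lg-3") is not None


def category_page_urls(cats, page_exists=page_has_books):
    """Yield every existing page URL for each category, in order."""
    for cat in cats:
        page_number = 1  # reset for every category; this is the key change
        while True:
            url = BASE.format(cat=cat, page=page_number)
            if not page_exists(url):
                break  # past the last page; move on to the next category
            yield url
            page_number += 1


if __name__ == "__main__":
    for url in category_page_urls(["romance_8", "childrens_11"]):
        print(url)
```

Passing the page check in as `page_exists` is only a convenience so the loop can be exercised without network access; moving `page_number = 1` inside the `for cat in cats:` loop of the original code should have the same effect.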