 Cannot extract data from the next pages
Dear Members,

I am writing Python code to extract eyeglass listings from ''. The code extracts data from the first page perfectly but fails to extract data from the following pages. There are 5 pages in total. I have included both my VS Code script and the terminal report below. I would greatly appreciate your help.

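# -*- coding: utf-8 -*-
import scrapy


class GlassSpider(scrapy.Spider):
    name = 'glass'
    allowed_domains = ['']
    start_urls = ['']

    def parse(self, response):
        # Each product on the listing page is an <a> inside <p class="pname col-sm-12">
        names = response.xpath("//p[@class='pname col-sm-12']/a")
        for name in names:
            # Product name and link taken from the <a> element
            name_var = name.xpath("./text()").get()
            link = name.xpath("./@href").get()
            # Follow the product link and pass the name along in meta
            yield response.follow(url=link, callback=self.parse_glass, meta={'glass_name': name_var})

    def parse_glass(self, response):
        name_var = response.meta['glass_name']
        sku = response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[1]/text()").get()
        # Hypothetical: the XPaths for price and frame were not shown in the post
        price = response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[2]/text()").get()
        frame = response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[3]/text()").get()

        yield {
            'glass_name': name_var,
            'price': price,
            'sku': sku,
            'frame': frame
        }

        # Pagination lookup as posted: this runs in parse_glass, i.e. on each product page
        next_page = response.xpath("(//div[@class='custom-pagination']/ul/li)[7]/a/@href").get()
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse)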
Terminal Report:
Change start_urls to include all 5 pages:

start_urls = [f'{page}' for page in range(1, 6)]
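In the one-liner above the listing URL itself was elided, so only f'{page}' remains. A minimal sketch, assuming a hypothetical URL pattern with the page number as a query parameter (the real site's pattern is unknown, and the listing_urls helper is made up for illustration), shows how range(1, 6) expands to pages 1 through 5:

```python
# Hypothetical listing URL pattern: the actual site URL was elided from the post.
LISTING_URL = 'https://example.com/eyeglasses?page={page}'


def listing_urls(template, num_pages):
    """Return one URL per listing page; range(1, num_pages + 1) yields 1..num_pages."""
    return [template.format(page=page) for page in range(1, num_pages + 1)]


start_urls = listing_urls(LISTING_URL, 5)
print(start_urls[0])   # https://example.com/eyeglasses?page=1
print(start_urls[-1])  # https://example.com/eyeglasses?page=5
```

The upper bound of range is exclusive, which is why 5 pages need range(1, 6).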
Thank you, buran, for your response. It works perfectly now. If you don't mind, could you briefly explain the problem in the code? I believe I will learn from your explanation and be able to solve this sort of problem in the future.
Well, I'm not sure there's much to explain. You have two levels of pages: the five listing pages are the first level. When you parse these five pages, you get the URLs of every individual product. The second level is each individual product page.
Your start_urls contained only one of the five top-level URLs.
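To make the two levels concrete, here is a minimal stdlib-only sketch (not Scrapy) that uses a hypothetical in-memory site instead of real HTTP requests. The listing markup only mimics the p class="pname col-sm-12" pattern from the question; the URLs, product names, and the crawl helper are made up for illustration. Starting from a single listing page misses the products that are linked only from page 2:

```python
from html.parser import HTMLParser

# Hypothetical level-1 pages: listing HTML that links to product pages.
LISTING_PAGES = {
    '/list?page=1': '<p class="pname col-sm-12"><a href="/p/a">Frame A</a></p>'
                    '<p class="pname col-sm-12"><a href="/p/b">Frame B</a></p>',
    '/list?page=2': '<p class="pname col-sm-12"><a href="/p/c">Frame C</a></p>',
}

# Hypothetical level-2 pages: stands in for what parse_glass would extract.
PRODUCT_PAGES = {
    '/p/a': {'glass_name': 'Frame A'},
    '/p/b': {'glass_name': 'Frame B'},
    '/p/c': {'glass_name': 'Frame C'},
}


class ProductLinkParser(HTMLParser):
    """Collect href values of <a> tags inside <p class="pname col-sm-12">."""

    def __init__(self):
        super().__init__()
        self.in_pname = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'p' and attrs.get('class') == 'pname col-sm-12':
            self.in_pname = True
        elif tag == 'a' and self.in_pname and 'href' in attrs:
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_pname = False


def crawl(listing_urls):
    """Level 1: collect product links from each listing page.
    Level 2: look up each product page and collect its item."""
    items = []
    for listing_url in listing_urls:
        parser = ProductLinkParser()
        parser.feed(LISTING_PAGES[listing_url])
        parser.close()
        for product_url in parser.links:
            items.append(PRODUCT_PAGES[product_url])
    return items


only_first = crawl(['/list?page=1'])
both = crawl(['/list?page=1', '/list?page=2'])
print([item['glass_name'] for item in only_first])  # ['Frame A', 'Frame B']
print([item['glass_name'] for item in both])        # ['Frame A', 'Frame B', 'Frame C']
```

Only the first-level URLs you start from (or follow) determine which product pages are ever reached.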

As explained in the docs, the start_urls list is a shortcut for the start_requests method:

def start_requests(self):
    for page in range(1, 6):
        yield scrapy.Request(url=f'{page}', callback=self.parse)
The explanation completely makes sense. Thank you, buran.


Possibly Related Threads...
Thread Author Replies Views Last Post
  Extract data from a webpage cycloneseb 4 173 Nov-12-2019, 05:25 PM
Last Post: snippsat
  Extracting Headers from Many Pages Quickly OstermanA 2 237 Oct-01-2019, 08:01 AM
Last Post: OstermanA
  pagination for non standarded pages zarize 12 456 Sep-02-2019, 12:35 PM
Last Post: zarize
  How to use Python to extract data from Zoho Creator software on the web dan7055 2 531 Jul-05-2019, 05:11 PM
Last Post: DeaD_EyE
  Python/BeautiifulSoup. list of urls ->parse->extract data to csv. getting ERROR IanTheLMT 2 420 Jul-04-2019, 02:31 AM
Last Post: IanTheLMT
  Help to extract data from web prasadmathe 4 477 May-20-2019, 10:59 PM
Last Post: michalmonday
  Protected Pages with Django xxp2 2 389 Feb-12-2019, 07:28 PM
Last Post: xxp2
  [Python 3] - Extract specific data from a web page using lxml module Takeshio 9 1,604 Aug-25-2018, 08:46 AM
Last Post: leotrubach
  Scraping external URLs from pages Apook 5 1,272 Jul-18-2018, 06:42 PM
Last Post: nilamo
  scraping multiple pages of a website. Blue Dog 14 11,169 Jun-21-2018, 09:03 PM
Last Post: Blue Dog