Python Forum
Crawl an online store
#1
Hello,

I am a beginner with the Scrapy framework. I have to scrape an online pharmacy store and crawl only 3 products from each category. Some of the categories have subcategories. Is there any material I could read on this?

Thank you in advance!
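The task described above (categories that may contain subcategories, and 3 products per category) can be sketched without any crawling at all. The snippet below is only an illustration under stated assumptions: the `store` tree, its category names, and the `/p/...` links are made up, and `iter_leaf_categories` and `pick_products` are hypothetical helper names, not part of Scrapy.

```python
import random


def iter_leaf_categories(category, path=()):
    """Yield (path, product_links) for every category without subcategories."""
    path = path + (category["name"],)
    children = category.get("children", [])
    if not children:
        yield path, category.get("products", [])
        return
    for child in children:
        yield from iter_leaf_categories(child, path)


def pick_products(product_links, k=3, rng=random):
    """Return up to k distinct product links chosen at random."""
    unique = list(dict.fromkeys(product_links))  # drop repeats, keep order
    return rng.sample(unique, min(k, len(unique)))


# Hypothetical category tree; names and links are made up for illustration.
store = {
    "name": "store",
    "children": [
        {"name": "medical", "children": [
            {"name": "masks", "products": ["/p/1", "/p/2", "/p/3", "/p/4"]},
            {"name": "gloves", "products": ["/p/5"]},
        ]},
        {"name": "sports", "products": ["/p/6", "/p/7", "/p/7"]},
    ],
}

for path, products in iter_leaf_categories(store):
    print(" > ".join(path), pick_products(products))
```

In a real spider the tree is not known up front; each level would be discovered by following links, but the same two ideas (descend until there are no subcategories, then sample at most 3 links) would still apply.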
#2
If the page relies on JavaScript, it would be easier to use Selenium instead. Otherwise, the Scrapy docs should get you started.
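One rough way to judge whether JavaScript rendering matters is to check whether the links you need are already present in the raw HTML the server returns. The sketch below uses only the standard library; `LinkCollector` and `static_links` are hypothetical names, and the sample markup is invented. If the links show up in the static HTML, Scrapy alone is probably enough.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def static_links(html_text):
    """Return all link targets that exist without running any JavaScript."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


# Invented sample markup: one server-rendered page, one script-rendered page.
server_rendered = '<li class="nav-item"><a href="categories.asp?catid=433">x</a></li>'
script_rendered = '<div id="app"></div><script>render()</script>'

print(static_links(server_rendered))
print(static_links(script_rendered))
```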
#3
I tried something and here is my code:
import scrapy


class QuoteSpider(scrapy.Spider):
    name = 'quotes'

    start_urls = [
        'https://www.pharmastore.gr/'
    ]

    def parse(self, response):
        # Top-level category links from the navigation bar.
        urls = response.css('li.nav-item > a ::attr(href)').extract()
        yield {'xxxxxxxxxxxxxxxxxxxxxxxxxxxx': urls}  # debug item
        for url in urls:
            url = response.urljoin(url)
            yield {'firstttttttttttt': url}  # debug item
            yield scrapy.Request(url=url, callback=self.parse_details)
            # Subcategory links, selected from the same response object.
            urls1 = response.css('li.column-span1 > h2 > a ::attr(href)').extract()
            yield {'urllllllllllllllllllllllllllllll': urls1}  # debug item
            for url1 in urls1:
                url1 = response.urljoin(url1)
                yield {'hellooooooooooooooooooooooooooooooo': url1}  # debug item
                yield scrapy.Request(url=url1, callback=self.parse_details1)

    def parse_details(self, response):
        pass

    def parse_details1(self, response):
        pass
And here is my output:
Output:
(base) C:\Users\tsoum\PycharmProjects\scrapy\tutorial>scrapy crawl quotes
2020-01-16 20:20:50 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: tutorial)
2020-01-16 20:20:50 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1d 10 Sep 2019), cryptography 2.7, Platform Windows-10-10.0.18362-SP0
2020-01-16 20:20:50 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2020-01-16 20:20:50 [scrapy.extensions.telnet] INFO: Telnet Password: 33c6641b9607929e
2020-01-16 20:20:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-01-16 20:20:51 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-01-16 20:20:51 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-01-16 20:20:51 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-01-16 20:20:51 [scrapy.core.engine] INFO: Spider opened
2020-01-16 20:20:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-01-16 20:20:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-01-16 20:20:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/robots.txt> (referer: None)
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 20 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 35 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 45 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 2219 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 2220 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 2221 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 2226 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4446 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4447 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4450 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4451 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4454 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4455 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4458 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4459 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4462 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4463 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4522 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4555 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4588 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4621 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4654 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4681 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4714 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4741 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4742 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4774 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4801 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4834 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4867 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4905 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4938 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4965 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 4992 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5019 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5046 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5074 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5084 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5094 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5104 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5114 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5124 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5134 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5144 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5154 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5164 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5174 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5184 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5194 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5204 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5214 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5224 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5234 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5244 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5254 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5264 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5274 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5352 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5353 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5386 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5387 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5388 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5389 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5390 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5429 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5431 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5432 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5445 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5449 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5452 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5455 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5458 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5461 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5464 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5470 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5472 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5489 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5503 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5520 without any user agent to enforce it on.
2020-01-16 20:20:52 [protego] DEBUG: Rule at line 5532 without any user agent to enforce it on.
2020-01-16 20:20:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/> (referer: None)
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'xxxxxxxxxxxxxxxxxxxxxxxxxxxx': ['categories.asp?catid=433&title=iatrika-',
 'categories.asp?catid=4019&title=orthopedika-',
 'categories.asp?catid=4477&title=sympliromata-',
 'categories.asp?catid=8390&title=epoxiaka-',
 'categories.asp?catid=8250&title=andras-',
 'categories.asp?catid=8087&title=gynaika-',
 'categories.asp?catid=7963&title=mamapaidi-',
 'categories.asp?catid=8249&title=kathimerini-frontida-',
 'categories.asp?catid=3293&title=athlitika-',
 'categories.asp?catid=9892&title=farmakeio-',
 'categories.asp?catid=433&title=iatrika-',
 'categories.asp?catid=4019&title=orthopedika-',
 'categories.asp?catid=4477&title=sympliromata-',
 'categories.asp?catid=8390&title=epoxiaka-',
 'categories.asp?catid=8250&title=andras-',
 'categories.asp?catid=8087&title=gynaika-',
 'categories.asp?catid=7963&title=mamapaidi-',
 'categories.asp?catid=8249&title=kathimerini-frontida-',
 'categories.asp?catid=3293&title=athlitika-',
 'categories.asp?catid=9892&title=farmakeio-']}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=433&title=iatrika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=4019&title=orthopedika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=4477&title=sympliromata-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8390&title=epoxiaka-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8250&title=andras-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8087&title=gynaika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=7963&title=mamapaidi-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8249&title=kathimerini-frontida-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=3293&title=athlitika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=9892&title=farmakeio-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=433&title=iatrika-'}
2020-01-16 20:20:55 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://www.pharmastore.gr/categories.asp?catid=433&title=iatrika-> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=4019&title=orthopedika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=4477&title=sympliromata-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8390&title=epoxiaka-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8250&title=andras-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8087&title=gynaika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=7963&title=mamapaidi-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=8249&title=kathimerini-frontida-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=3293&title=athlitika-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'firstttttttttttt': 'https://www.pharmastore.gr/categories.asp?catid=9892&title=farmakeio-'}
2020-01-16 20:20:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pharmastore.gr/>
{'urllllllllllllllllllllllllllllll': []}
2020-01-16 20:20:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/8087/gynaika-> from <GET https://www.pharmastore.gr/categories.asp?catid=8087&title=gynaika->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/8249/kathimerini-frontida-> from <GET https://www.pharmastore.gr/categories.asp?catid=8249&title=kathimerini-frontida->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/8250/andras-> from <GET https://www.pharmastore.gr/categories.asp?catid=8250&title=andras->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/7963/mamapaidi-> from <GET https://www.pharmastore.gr/categories.asp?catid=7963&title=mamapaidi->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/8390/epoxiaka-> from <GET https://www.pharmastore.gr/categories.asp?catid=8390&title=epoxiaka->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/4477/sympliromata-> from <GET https://www.pharmastore.gr/categories.asp?catid=4477&title=sympliromata->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/4019/orthopedika-> from <GET https://www.pharmastore.gr/categories.asp?catid=4019&title=orthopedika->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/433/iatrika-> from <GET https://www.pharmastore.gr/categories.asp?catid=433&title=iatrika->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/9892/farmakeio-> from <GET https://www.pharmastore.gr/categories.asp?catid=9892&title=farmakeio->
2020-01-16 20:20:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.pharmastore.gr/categories/3293/athlitika-> from <GET https://www.pharmastore.gr/categories.asp?catid=3293&title=athlitika->
2020-01-16 20:20:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/8087/gynaika-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:20:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/8249/kathimerini-frontida-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:20:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/7963/mamapaidi-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/8250/andras-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/4477/sympliromata-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/8390/epoxiaka-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/4019/orthopedika-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/433/iatrika-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/9892/farmakeio-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pharmastore.gr/categories/3293/athlitika-> (referer: https://www.pharmastore.gr/)
2020-01-16 20:21:03 [scrapy.core.engine] INFO: Closing spider (finished)
2020-01-16 20:21:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 8002,
 'downloader/request_count': 22,
 'downloader/request_method_count/GET': 22,
 'downloader/response_bytes': 2295414,
 'downloader/response_count': 22,
 'downloader/response_status_count/200': 12,
 'downloader/response_status_count/301': 10,
 'dupefilter/filtered': 10,
 'elapsed_time_seconds': 12.199686,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 1, 16, 18, 21, 3, 501108),
 'item_scraped_count': 41,
 'log_count/DEBUG': 144,
 'log_count/INFO': 10,
 'request_depth_max': 1,
 'response_received_count': 12,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 21,
 'scheduler/dequeued/memory': 21,
 'scheduler/enqueued': 21,
 'scheduler/enqueued/memory': 21,
 'start_time': datetime.datetime(2020, 1, 16, 18, 20, 51, 301422)}
2020-01-16 20:21:03 [scrapy.core.engine] INFO: Spider closed (finished)
As I mentioned previously, I have to crawl 3 random products from each last-level category, and I am having trouble following the links. In the output, it seems that I find the first-level categories, but the second selector then returns an empty list ('urllllllllllllllllllllllllllllll': []).