 Python scrapy scraped_items
#1
I was testing the following code to see the results, and while debugging I saw that scraped_items was something like 4.777, ... which wasn't the result I wanted to get. Second, I want to write each def function's output to a different file, and finally I want the spider to scrape with all of the functions, not only the first and second ones. :(

Thank you very much!!! :D

Here is my actual code:
# -*- coding: utf-8 -*-
import scrapy


class SccbotSpider(scrapy.Spider):
    name = 'SccBot'
    start_urls = ['https://spurverbreiterung.de/index.php?cat=c182_Radbefestigungsteile.html']

    def parse(self, response):
        # Main category page: collect link texts and hrefs from the #tab1 table
        tab1 = response.css('#tab1')
        for container in tab1.css('tr > td[align="center"]'):
            scraped_info = {
                'TextBox': container.css('a::text').extract(),
                'LinkBox': container.css('a::attr(href)').extract(),
                'CurrentUrl': response.url,
            }
            yield scraped_info

        # Follow each sub-category link
        urls = tab1.css('tr > td[align="center"] > a::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_details)

    def parse_details(self, response):
        # Sub-category page: same table structure as the main page
        for containerx in response.css('tr > td[align="center"]'):
            scraped_items = {
                'TextBox': containerx.css('a::text').extract(),
                'LinkBox': containerx.css('a::attr(href)').extract(),
                'CurrentUrl': response.url,
            }
            yield scraped_items

        # Follow the links to the product listing pages
        urls = response.css('tr > td[align="center"] > a::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_items)

    def parse_items(self, response):
        # Product listing page: one item per product box
        for products in response.css('div.inhalt > a.product_link'):
            scraped_products = {
                'Category': response.css('#main_content > h1::text').extract(),
                'CategoryType': response.css('div.content_boxes > div.rad_header::text').extract(),
                'ProductName': products.css('div.prod-name::text').extract(),
                'ProductNumber': products.css('div.art-nr > span::text').extract(),
                'Price': products.css('div.preis').extract(),
                'AvaibilityIcon': products.css('div.ampel > img::attr(src)').extract(),
                'ProductLink': products.css('a.product_link::attr(href)').extract(),
                'CurrentURL': response.url,
            }
            yield scraped_products

        # Follow each product link to its detail page
        urls = response.css('div.inhalt > a.product_link::attr(href)').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_ims)

    def parse_ims(self, response):
        # Product detail page
        for productss in response.css('div.wrapper'):
            scraped_rads = {
                'Title': productss.css('#product_info > h1::text').extract(),
                'Price': productss.css('div.productsinfo_price > span::text').extract(),
                'ProductDetails': productss.css('div.product_details.clear > table').extract(),
                'ProductInfo': productss.css('div.productsinfo_right').extract(),
                'ProductImg': productss.css('div.productsinfo_img > ul > img::attr(src)').extract(),
                'MoreDetails': productss.css('div.textf_rechts').extract(),
                'CurrentURL': response.url,
            }
            yield scraped_rads
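
For reference, I export everything to one csv file with the standard feed export, something like:

scrapy crawl SccBot -o SccProducts.csv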
#2
(Nov-12-2018, 12:39 PM)Baggelhsk95 Wrote: I saw that scraped_items was something like 4.777, ... which wasn't the result I wanted to get
What was the result you wanted to get?
(Nov-12-2018, 12:39 PM)Baggelhsk95 Wrote: second, I want to write each def function's output to a different file
That's slightly complicated, and will require a custom exporter/pipeline.
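A rough sketch of the idea, assuming each callback adds a field (here called 'source') saying where the item came from; the field name, the file names and the 'myproject' module path are only examples:

# pipelines.py -- route items to different csv files based on a 'source' field
from scrapy.exporters import CsvItemExporter


class SplitFilesPipeline:

    def open_spider(self, spider):
        self.files = {}
        self.exporters = {}

    def close_spider(self, spider):
        for exporter in self.exporters.values():
            exporter.finish_exporting()
        for f in self.files.values():
            f.close()

    def process_item(self, item, spider):
        # e.g. parse() would set item['source'] = 'categories',
        # parse_items() would set item['source'] = 'products', and so on
        source = item.get('source', 'other')
        if source not in self.exporters:
            f = open('%s.csv' % source, 'wb')
            exporter = CsvItemExporter(f)
            exporter.start_exporting()
            self.files[source] = f
            self.exporters[source] = exporter
        self.exporters[source].export_item(item)
        return item

and enable it in settings.py:

ITEM_PIPELINES = {'myproject.pipelines.SplitFilesPipeline': 300}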
(Nov-12-2018, 12:39 PM)Baggelhsk95 Wrote: finally I want the spider to scrape with all of the functions, not only the first and second ones
If the third and fourth callbacks are not being called, something is going wrong in the second one.
One possibility is that no urls are being found; another is that you're creating duplicate requests.
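Two quick ways to check, as a sketch: log how many urls each callback finds, and turn on the dupefilter debug logging so dropped requests show up in the log. The logging line below is just an example added inside parse_details; the same idea works for the other callbacks.

# settings.py -- log every request dropped by the duplicate filter
DUPEFILTER_DEBUG = True

# inside parse_details, for example:
urls = response.css('tr > td[align="center"] > a::attr(href)').extract()
self.logger.info('parse_details found %d urls on %s', len(urls), response.url)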
#3
Quote:If the third and fourth callbacks are not being called
To be honest, I don't see any mistake in the third and fourth functions; with my knowledge I can't tell. I might be doing something wrong. :(

Here are the callbacks:
Quote:callback=self.parse_items
Quote:callback=self.parse_ims
And I did try single elements in the scrapy shell to check whether the selectors work.
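For example (the url here is just the spider's start url):

scrapy shell 'https://spurverbreiterung.de/index.php?cat=c182_Radbefestigungsteile.html'
>>> response.css('#tab1 tr > td[align="center"] > a::attr(href)').extract()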

Here is the output in the csv file: SccProducts.csv
