Python Forum

Full Version: Python - Scrapy - CSS selector
Hello everyone! I was experimenting with Scrapy and worked through some examples, but my CSS selectors for Car_Manufacturer, Manufacturer_Model, and Model_Edition return empty lists for some reason.

Here is a quick test:
# -*- coding: utf-8 -*-
import scrapy

class Mybot4Spider(scrapy.Spider):
    name = 'MyBot4'
    start_urls = ['https://www.mytoutou.gr/manufacturers/ford/344/1480/']

    def parse(self, response):
        for content in response.css('div.mtt-uil-clbc'):
            form = response.css('div.FormContainer')
            yield {
                'title': content.css('a::text').extract(),
                'Link': content.css('a::attr(href)').extract(),
                'H1': response.css('div.mtt-uil-category-products > h1::text').extract(),
                'Car_Manufacturer': form.css('span.ui-selectmenu-text').extract(),
                'Manufacturer_Model': form.css('span.ui-selectmenu-text').extract(),
                'Model_Edition': form.css('span.ui-selectmenu-text').extract(),
                'CurrentURL': response.url,
            }
P.S. I saw that the form relies on JavaScript to show the current model, so I'm thinking of splitting the URL and getting the value from each URL.

Here is the quick CSS:
'Manufacturer_Model' : response.css('option[value="3444"]::text').extract()
I have over 20k links to crawl, and this is not the only one to crawl, so I was wondering whether I can split the URLs to get the values.
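
A minimal sketch of the URL-splitting idea, using only the standard library. The helper name split_listing_url is hypothetical, and the thread does not say which numeric segment corresponds to which dropdown, so that mapping would need to be checked against the page before building selectors such as option[value="..."] from it.

```python
from urllib.parse import urlsplit


def split_listing_url(url):
    """Return the non-empty path segments of a listing URL, in order."""
    path = urlsplit(url).path
    return [segment for segment in path.split('/') if segment]


segments = split_listing_url('https://www.mytoutou.gr/manufacturers/ford/344/1480/')
print(segments)  # ['manufacturers', 'ford', '344', '1480']

# Assumption: the last two segments are numeric IDs used by the form.
# Which ID belongs to which dropdown is not stated in the thread.
first_id, second_id = segments[-2:]
print(first_id, second_id)  # 344 1480
```

Inside the spider, the same helper could be called with response.url, so nothing extra has to be downloaded for each of the 20k links.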

Or if you have a smarter way to read the JavaScript, that would be great! :D
It looks like the classes you're trying to use are created by JavaScript, but the data itself is available in the page source.
One possibility is to find the selected option, e.g.:
'Car_Manufacturer': form.css('#car-manuf option[selected]::text').get(),