Python Forum

Pass multiple items from one parse to another using Scrapy
Dear Members,

I want to pass both name and link_only from parse to parse_country. I know the meta argument is used to carry values from one callback to the next. Could you please tell me how I can change the meta dictionary below to pass both items? With the following code, I get blank cells in the CSV file for the link_only column.

In my actual project, I have 7 or 8 items to pass from one parse method to another. Please suggest how to do this.

import scrapy
import logging
 
 
class CountriesSpider(scrapy.Spider):
    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['https://www.worldometers.info/world-population/population-by-country/']
 
    def parse(self, response):
        countries=response.xpath("//td/a")
        for country in countries:
            name=country.xpath(".//text()").get()
            link_only=country.xpath(".//@href/text()").get()
            link=country.xpath(".//@href").get()
 
            yield response.follow(url=link, callback=self.parse_country, meta={'country_name': name, 'link_only':link_only})
            
 
    def parse_country(self, response):
        name=response.request.meta['country_name']
        link_only=response.request.meta['link_only']
        rows = response.xpath("(//table[@class='table table-striped table-bordered table-hover table-condensed table-list'])[1]/tbody/tr")
        for row in rows:
            year=row.xpath(".//td[1]/text()").get()
            population=row.xpath(".//td[2]/strong/text()").get()
 
            yield{
                'country_name': name,
                'link_only': link_only,
                'year': year,
                'population': population
            }
Hi,
The XPath you used for link_only returns None, which is why that column is blank in the CSV.
Just change this:
link_only = country.xpath(".//@href/text()").get()
into this:
link_only = country.xpath(".//@href").get()
The @href step selects an attribute node, and an attribute has no text() child node. Appending /text() therefore matches nothing, and get() returns None.
Thank you for your help. It worked.