Python Forum

No data when using Scrapy to get data
Hi all, I have managed to write my first Scrapy spider, but when I run it I get no data from the site. There are no errors, but I think I know the issue: I need to load the page before I run the code, and I am not sure how to do this.

Here is my code:

# -*- coding: utf-8 -*-
import scrapy
from ..items import SydneycheckItem


class SydneyflightcheckSpider(scrapy.Spider):
    name = 'sydneyfc'

    start_urls = [
        'https://www.sydneyairport.com.au/flights/?query=&flightType=departure&terminalType=domestic&date=2019-11-10&sortColumn=scheduled_time&ascending=true&showAll=true'
    ]

    def parse(self, response):
        items = SydneycheckItem()
        destinationname = response.css('.destination-name::text').extract()
        airlinename = response.css('.with-image').css('::text').extract()
        # airlinelogo = response.css('img::attr(src)').extract()
        flightnumber = response.css('.flight-numbers').css('::text').extract()
        scheduled = response.css('.large-scheduled-time').css('::text').extract()
        estimated = response.css('.estimated-time').css('::text').extract()
        status = response.css('.status-container').css('::text').extract()

        items['destination_name'] = destinationname
        items['airlinename'] = airlinename
        # items['airlinelogo'] = airlinelogo
        items['flightnumber'] = flightnumber
        items['scheduled'] = scheduled
        items['estimated'] = estimated
        items['status'] = status

        yield items


It looks like the page loads its data from an API using JavaScript, so the HTML that Scrapy downloads does not contain the flight rows your selectors are looking for.
The easiest way to get the data yourself is to request it from the API directly, which returns it as JSON.

The API URL is https://www.sydneyairport.com.au/_a/flights<query_parameters>
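A minimal sketch of building that API URL from the page URL in the question, using only the standard library. The helper name to_api_url is hypothetical, and keeping the original path and query string unchanged after the /_a prefix is an assumption based on the URL pattern given above.

```python
from urllib.parse import urlsplit, urlunsplit

# Page URL taken from the spider in the question.
PAGE_URL = (
    "https://www.sydneyairport.com.au/flights/"
    "?query=&flightType=departure&terminalType=domestic"
    "&date=2019-11-10&sortColumn=scheduled_time&ascending=true&showAll=true"
)


def to_api_url(page_url, prefix="/_a"):
    """Insert `prefix` in front of the URL path (hypothetical helper).

    Assumption: the API accepts the same path and query string as the page.
    """
    parts = urlsplit(page_url)
    return urlunsplit(parts._replace(path=prefix + parts.path))


API_URL = to_api_url(PAGE_URL)
print(API_URL)
```

Opening the resulting URL in a browser is a quick way to check that it really returns JSON before pointing the spider at it.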
(Nov-11-2019, 06:57 AM) stranac wrote: Looks like the data is loaded from an API using javascript. The easiest way to get it yourself would be requesting it from the API directly, which will give you the data as json. The API url is https://www.sydneyairport.com.au/_a/flights


OK, great. How would I change my code to do this? I'm still really new at this.
First, change your start URL to fetch the data from the API (just add the _a part to your URL).
Then use the json module to load the data from response.text.

At this point you'll have a dict and you can just choose what parts you want to keep.
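Here is a rough sketch of the "load the JSON, then keep what you want" step. The sample payload and its keys (flights, destination, status, gate) are made up for illustration, since the real API's field names are not shown in this thread; print the decoded data once to see what is actually there. The helper names load_flights and keep_fields are hypothetical as well.

```python
import json


def load_flights(response_text):
    """Decode a JSON response body into Python objects (dicts and lists)."""
    return json.loads(response_text)


def keep_fields(record, wanted):
    """Return a copy of `record` containing only the keys listed in `wanted`."""
    return {key: record[key] for key in wanted if key in record}


# HYPOTHETICAL payload: the real API response will have different keys.
sample_text = (
    '{"flights": [{"destination": "Melbourne", '
    '"status": "Departed", "gate": "12"}]}'
)

data = load_flights(sample_text)
kept = [keep_fields(flight, ["destination", "status"]) for flight in data["flights"]]
print(kept)
```

Inside the spider, the same idea would go in parse: call json.loads(response.text), loop over whichever list holds the flights, and yield one item or dict per flight instead of one item holding every column.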