Python Forum

Full Version: undefined function error
from scrapy import Spider
from scrapy.http import Request


class TesterSpider(Spider):
    name = 'tester'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']


    def parse(self, response):
        books = response.xpath("//h3/a/@href").extract()
        for book in books:
            absolute_url = response.urljoin(book)
            yield Request(absolute_url, callback=self.parse_book)

        # process next page
        next_page_url = response.xpath("//a[text()='next']/@href").extract_first()
        absolute_next_page_url = response.urljoin(next_page_url)
        yield Request(absolute_next_page_url)


    def parse_book(self, response):
        title = response.xpath("//h1/text()").extract_first()
        price = response.xpath("//*[@class='price_color']/text()").extract_first()
        img_url = response.xpath("//img/@src").extract_first()
        img_url = img_url.replace('../..', 'https://books.toscrape.com')
        rating = response.xpath("//p[starts-with(@class,'star-rating')]/@class").extract_first()
        rating = rating.replace('star-rating ', '')
        desc = response.xpath("//div[(@id='product_description')]/following-sibling::p/text()").extract_first()
        
        # Product Description
        upc = product_desc(response, 'UPC')
        product_type = product_desc(response, 'Product Type')
        availability = product_desc(response, 'Availability')
        number_of_reviews = product_desc(response, 'Number of reviews')

        yield {
            'Title': title,
            'Price': price,
            'Location': img_url,
            'Rating': rating,
            'Description': desc,
            'UPC': upc,
            'Product Type': product_type,
            'Availability': availability,
            'Reviews': number_of_reviews
        }


    def product_desc(response, lookup):
        return response.xpath("//th[text()='" + lookup + "']/following-sibling::td/text()").extract_first()
As you can see, the function 'product_desc' is defined at the very bottom, but where I call it just above the yield block, my IDE, VS Code, reports that it is undefined. Can anyone spot what I am missing?

Thank you
Dedent product_desc so it becomes a function, not a method of the class.
(Sep-06-2022, 07:00 AM)Yoriz Wrote: Dedent product_desc so it becomes a function, not a method of the class.


Right on the money. Thank you.
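For anyone hitting the same error: inside a method body, a bare name is looked up in the local, enclosing, module, and builtin scopes, never in the class namespace, so a sibling method can't be called without the self. prefix, while a module-level function resolves fine. A toy sketch (names here are made up for illustration, not from the spider):

```python
class Demo:
    def lookup(self, key):          # a method, like product_desc was
        return f"method saw {key}"

    def caller(self):
        # lookup("UPC")  # NameError: bare names skip the class namespace
        return helper("UPC")        # module-level function: resolves fine

def helper(key):                    # the "dedented" version
    return f"function saw {key}"

print(Demo().caller())  # → function saw UPC
```

The alternative fix would be to keep product_desc as a method and call it as self.product_desc(response, 'UPC'), adding self as its first parameter.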
(Sep-06-2022, 07:00 AM)Yoriz Wrote: Dedent product_desc so it becomes a function, not a method of the class.

Is there a switch that I can pass to 'scrapy crawl SpiderName -o ExportFilePath' that will force an overwrite if the file previously existed?
(Sep-06-2022, 12:25 PM)JonWayn Wrote: Is there a switch that I can pass to 'scrapy crawl SpiderName -o ExportFilePath' that will force an overwrite if the file previously existed?
G:\div_code\scrapy_stuff
λ scrapy commands --help
Usage
=====
  scrapy commands

Options
=======
  -h, --help            show this help message and exit
  -a NAME=VALUE         set spider argument (may be repeated)
  -o FILE, --output FILE
                        append scraped items to the end of FILE (use - for stdout)
  -O FILE, --overwrite-output FILE
                        dump scraped items into FILE, overwriting any existing file
  -t FORMAT, --output-format FORMAT
                        format to use for dumping items

Global Options
--------------
  --logfile FILE        log file. if omitted stderr will be used
  -L LEVEL, --loglevel LEVEL
                        log level (default: DEBUG)
  --nolog               disable logging completely
  --profile FILE        write python cProfile stats to FILE
  --pidfile FILE        write process ID to FILE
  -s NAME=VALUE, --set NAME=VALUE
                        set/override setting (may be repeated)
  --pdb                 enable pdb on failure
So instead of -o, pass in -O.
scrapy crawl SpiderName -O ExportFilePath
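Worth noting: assuming Scrapy 2.4 or newer, the same overwrite behaviour can also be baked into the project via the FEEDS setting, so no command-line flag is needed at all. The file name below is just an example:

```python
# Equivalent of `scrapy crawl tester -O books.json` expressed as a
# setting: put this in settings.py, or on the spider as custom_settings.
FEEDS = {
    "books.json": {          # output path (example name)
        "format": "json",    # same as -t json
        "overwrite": True,   # -O behaviour; False appends like -o
    },
}

# e.g. inside the spider class:
# class TesterSpider(Spider):
#     custom_settings = {"FEEDS": FEEDS}
```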
(Sep-06-2022, 01:59 PM)snippsat Wrote: So instead of -o, pass in -O.

Thank you