[Scrapy] web scrape help
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-21446.html
[Scrapy] web scrape help - joe_momma - Sep-30-2019

Hello,
I downloaded Scrapy and went through the tutorials, but I'm still trying to understand the selector aspect of scraping. So I thought I'd scrape a different quotes web page: website

I created a new project and spider:

    # -*- coding: utf-8 -*-
    import scrapy


    class InspiderSpider(scrapy.Spider):
        name = 'inspider'
        allowed_domains = ['https://www.keepinspiring.me/famous-quotes/']
        start_urls = ['https://www.keepinspiring.me/famous-quotes//']

        def parse(self, response):
            for quotes in response.css('div.author-quotes'):
                yield {
                    'text': quotes.css('span.text::text').extract_first(),
                    'author': quotes.css('span.quote-author-name::text').extract_first()
                }

I can extract the authors, but no luck on the quote.
output:
When I examine the quote element and copy the xpath I get:
Any help appreciated,
Joe

RE: [Scrapy] web scrape help - stranac - Sep-30-2019

span.text::text tries to select the text of a span with class text. Such an element doesn't exist; the text is placed directly in the div.

A css selector that would work here is simply ::text. This technically selects all the text nodes inside the div (including the author), but .extract_first() will give you only the thing you are after.

An alternative is using an xpath such as ./text().

A couple of non-selector-related notes:

RE: [Scrapy] web scrape help - joe_momma - Oct-01-2019

Thanks, I got quotes and authors.
Joe