Python Forum
[Scrapy]
#1
This Scrapy tutorial explains why and how you should use a framework like Scrapy for scraping websites.

Websites can contain a lot of information. If you need to extract information from a website, you need a scraper that collects the data you are interested in. Be aware that you are not allowed to scrape every website: some sites disallow web scrapers, for example in their terms of service or in a robots.txt file, so check before you start.
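As a rough illustration of checking whether a site disallows scrapers, the sketch below uses Python's standard `urllib.robotparser` on a made-up robots.txt. The rules, the `my-scraper` user agent and the `may_scrape` helper are assumptions for the example, not taken from any real site, and robots.txt is only one signal: it does not settle the legal question on its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_scrape(url, user_agent="my-scraper"):
    """Return True if the parsed robots.txt rules allow fetching this URL."""
    return parser.can_fetch(user_agent, url)


print(may_scrape("https://www.data-blogger.com/blog"))
print(may_scrape("https://www.data-blogger.com/private/page"))
```

Against a live site you would point the parser at the site's own robots.txt with `set_url()` and `read()` instead of inlining the rules. Scrapy has a `ROBOTSTXT_OBEY` setting that does this kind of check for you.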

Why not write a scraper yourself? There are many things you will struggle with. One simple example is link following. If you write a naive scraper that automatically follows every link it encounters, you will almost certainly run into the following two issues:
- You scrape in cycles. Webpage A can link to webpage B, and webpage B can link back to webpage A. Therefore, you need some kind of history to keep track of the webpages you have already visited.
- You "escape" from your scraping domain. Suppose you are scraping a website like https://www.data-blogger.com/. It will probably contain a link to a webpage on a different domain (for example, a link to Google). When your scraper follows that link, it will start scraping Google! Therefore, you need to keep track of the domains you are allowed to scrape.
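To make the two issues concrete, here is a minimal sketch of a hand-written crawler that handles both: a `visited` set serves as the history, and a domain check keeps it from escaping. The `crawl` function, the `get_links` callable and the `FAKE_SITE` dictionary are all hypothetical names invented for this example. The fake site stands in for real HTTP requests so the code runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, get_links, allowed_domain, max_pages=100):
    """Breadth-first crawl that avoids cycles and stays on one domain.

    get_links is a hypothetical callable: given a URL, it returns the
    links (absolute or relative) found on that page.
    """
    visited = set()              # history of pages already scraped
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue             # issue 1: already seen, do not loop
        if urlparse(url).netloc != allowed_domain:
            continue             # issue 2: link leaves the domain
        visited.add(url)
        for link in get_links(url):
            queue.append(urljoin(url, link))  # resolve relative links
    return visited


# Fake in-memory "website" (an assumption for this example).
FAKE_SITE = {
    "https://www.data-blogger.com/": ["/a", "https://www.google.com/"],
    "https://www.data-blogger.com/a": ["/b"],
    "https://www.data-blogger.com/b": ["/a"],  # cycle: b links back to a
}

pages = crawl(
    "https://www.data-blogger.com/",
    lambda url: FAKE_SITE.get(url, []),
    allowed_domain="www.data-blogger.com",
)
print(sorted(pages))
```

The crawl ends after three pages: the cycle between `/a` and `/b` is cut by the `visited` set, and the Google link is dropped by the domain check. In Scrapy you would not write this yourself: the framework filters duplicate requests by default, and a spider's `allowed_domains` attribute restricts which domains are followed.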

These are only two examples of the many problems you will face, so it is well worth using a framework like Scrapy. If you would like to implement your own scraper, follow this tutorial on implementing a scraper in Python and Scrapy.