Python Forum

Good book on Web scraping and crawling
Hi All,

Could you suggest a good, standard, up-to-date book on Python web scraping and crawling?

Thanks,
Surya
You don't really need a book; free online sources will do.

We have a tutorial:
https://python-forum.io/Thread-Web-Scraping-part-1

I would say the standard practice nowadays is to use the requests module with BeautifulSoup, along with some basic knowledge of HTML so you know what you are parsing/crawling. Instead of BeautifulSoup you could use lxml. A little JavaScript knowledge wouldn't hurt either, and Selenium can handle sites that load their content with JavaScript.
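
A minimal sketch of the requests + BeautifulSoup combination, assuming both packages are installed (`pip install requests beautifulsoup4`). `fetch_html` and `extract_links` are hypothetical helper names and the User-Agent string is a placeholder; adapt them to your own project.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its HTML text."""
    # Placeholder User-Agent; some sites reject the library's default one.
    headers = {"User-Agent": "my-scraper/0.1 (example)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    # "html.parser" ships with Python; pass "lxml" instead if lxml is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]
```

Usage would look like `extract_links(fetch_html("https://python-forum.io/Thread-Web-Scraping-part-1"))`. Keeping the download and the parsing in separate functions lets you test the parser on saved HTML without hitting the network every time.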

Most of your googling will probably be about how to grab tag X with BeautifulSoup.
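
A short sketch of three common ways to locate a tag with BeautifulSoup (`find`, `find_all`, and CSS `select`), run against a small hard-coded page. The HTML, the `id`, and the class names are made up for illustration; with a real site you would inspect its HTML first to see which tags and attributes to target.

```python
from bs4 import BeautifulSoup

# A small hard-coded page stands in for a downloaded one.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="title">Web Scraping part 1</h1>
    <div class="post"><p>First reply</p></div>
    <div class="post"><p>Second reply</p></div>
    <table>
      <tr><td class="price">10</td></tr>
      <tr><td class="price">25</td></tr>
    </table>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
title = soup.find("h1", id="title").get_text()

# find_all() returns every match; class_ has a trailing underscore
# because "class" is a reserved word in Python.
posts = [div.get_text(strip=True) for div in soup.find_all("div", class_="post")]

# select() takes a CSS selector, handy for "tag X inside tag Y".
prices = [int(td.get_text()) for td in soup.select("table td.price")]

print(title)   # Web Scraping part 1
print(posts)   # ['First reply', 'Second reply']
print(prices)  # [10, 25]
```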

NOTE: I haven't read this book; I only skimmed through it quickly, but it covers BeautifulSoup, Scrapy, APIs, Selenium, XPath, image processing, text recognition, bot traps, etc. The only drawback is that it uses urllib instead of the requests library. Then again, you can find the same information online if you search for these topics.
http://shop.oreilly.com/product/0636920034391.do

EDIT:
https://nocodewebscraping.com/top-10-web...ing-books/
There is also a second part to snippsat's tutorial: https://python-forum.io/Thread-Web-scraping-part-2