Python Forum
urlib - to use or not to use ( for web scraping )?



urlib - to use or not to use ( for web scraping )? - Truman - Sep-26-2018

Al Sweigart, in "Automate the Boring Stuff with Python", strongly suggests avoiding urllib2. I'm asking because I came into possession of a book called "Web Scraping with Python" by Ryan Mitchell. Her book uses it from the first page. Should I just skip it?

Also, if you have any recommendations for books or video tutorials about web scraping, I'll be glad to hear them.


RE: urlib - to use or not to use ( for web scraping )? - metulburr - Sep-27-2018

Most people now use the requests module, as it handles the boilerplate code for you.
https://pypi.org/project/requests/
You can still learn urllib regardless, of course. Check out our web scraping tutorials, which use the requests module:
https://python-forum.io/Forum-Web-Scraping


RE: urlib - to use or not to use ( for web scraping )? - Larz60+ - Sep-27-2018

I have used it, but rarely, and can't remember exactly why.
I find that I can do pretty much everything I need with selenium and Beautiful Soup (I usually use lxml with soup), and always requests.


RE: urlib - to use or not to use ( for web scraping )? - Axel_Erfurt - Sep-27-2018

I used urllib (urlopen) earlier for downloading files. The request submodule is part of urllib only in Python 3.

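import platform

vers = platform.python_version()
print("Python " + vers)
# urlopen lives in a different module in Python 2 and Python 3,
# so pick the import based on the major version.
if vers[0] == "2":
    from urllib import urlopen
else:
    from urllib.request import urlopen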
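
A common alternative to inspecting the version string is to try the Python 3 import first and fall back if it fails. This is only a sketch of that idiom, not something taken from the post above:

```python
# Try the Python 3 location first; fall back to the Python 2 one.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# Show which module the name was actually imported from.
print(urlopen.__module__)
```

Either way, the rest of the script can call urlopen without caring which interpreter it runs on.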



RE: urlib - to use or not to use ( for web scraping )? - Larz60+ - Sep-27-2018

Not request... it's requests, a separate and wonderful package.


RE: urlib - to use or not to use ( for web scraping )? - metulburr - Sep-27-2018

urllib.request is different from requests. urllib.request is in the standard library, whereas requests is a third-party library that has to be installed, normally through pip: pip install requests. There are a lot of third-party libraries that most people install alongside Python. bs4 (BeautifulSoup) and selenium usually go hand in hand with requests, for scraping and for dealing with JavaScript-driven pages.
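
To make the difference concrete, here is a minimal sketch that fetches a page both ways. The function names, the timeout default and the User-Agent value are made up for illustration, and the second function assumes requests has already been installed with pip.

```python
from urllib.request import Request, urlopen


def fetch_with_urllib(url, timeout=10):
    """Fetch a page using only the standard library."""
    # Some sites reject the default Python user agent, so set one explicitly.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # urllib hands back bytes; decoding is left to the caller.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def fetch_with_requests(url, timeout=10):
    """Fetch the same page with the third-party requests package."""
    # Imported here so this module still loads when requests is not installed.
    import requests

    response = requests.get(
        url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout
    )
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text  # requests decodes the body for you
```

Calling either function with a page URL should give back the HTML as a string, which can then be handed to BeautifulSoup. The requests version is shorter mainly because decoding and status checking are built in.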


RE: urlib - to use or not to use ( for web scraping )? - Truman - Sep-27-2018

In other words, maybe reading that book is not the best idea. I'm familiar with the forum tutorials; I'll look for video tutorials and books myself. Currently I'm studying the documentation on web scraping (requests, BeautifulSoup, CSS selectors; I'm just about to start on selenium). Maybe I should also look for a finished project on GitHub to see how it is done...

P.S. Do you more often use .find() or CSS selectors? Is there any important difference?


RE: urlib - to use or not to use ( for web scraping )? - wavic - Sep-30-2018

Keep reading the book. What you use to get the web page is insignificant in most cases. I use requests and Selenium most of the time, but I used urllib in my earlier Python scripts, before I knew about requests, and it works just fine. Another reason you may want to use the built-in module is that you may not have permission to install anything on a machine, so you must use what is already installed. You have to know at least the basics of it.
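
One way to handle the no-install case is to prefer requests when it can be imported and fall back to the standard library otherwise. This is a sketch only; the names HAVE_REQUESTS, fetch_stdlib and fetch are made up for illustration.

```python
from urllib.request import urlopen

# Detect once, at import time, whether the third-party package is available.
try:
    import requests
    HAVE_REQUESTS = True
except ImportError:
    requests = None
    HAVE_REQUESTS = False


def fetch_stdlib(url, timeout=10):
    """Fetch a URL using only the standard library."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def fetch(url, timeout=10):
    """Use requests if it is installed, otherwise fall back to urllib."""
    if HAVE_REQUESTS:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    return fetch_stdlib(url, timeout=timeout)
```

The rest of a script can then call fetch() and run unchanged on a locked-down machine where only the standard library is present.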


RE: urlib - to use or not to use ( for web scraping )? - Larz60+ - Sep-30-2018

Quote: In other words maybe reading that book is not the best idea.
I wouldn't judge the book on this. Unless it is very recent and the author is adamant about using urllib, you can simply replace that small portion of code. If the book is older, urllib was the common page fetcher prior to requests, so it is totally understandable that it was suggested.

It still works and can still be used; it's just that requests offers many more methods.


RE: urlib - to use or not to use ( for web scraping )? - Truman - Sep-30-2018

The book is from 2015.