Python Forum
urllib - to use or not to use (for web scraping)?
#11
I wouldn't throw it out. Being 3 years old is not what I would call obsolete, although it was before the release of 3.6.0, which contained some major improvements to the language.

There is a review here: https://blogs.ucl.ac.uk/digital-educatio...th-python/
and another here: https://www.reddit.com/r/learnprogrammin...tuff_with/

I haven't read the first page, so I can't review it myself. However, one thing I noticed is that the author pushes his Udemy video course. I wouldn't like to find that in a book I was buying, as I would be suspicious that important concepts might be left to those willing to pay for the course.

I buy and read books constantly, but almost always to find information on more complex algorithms that I want to study in depth. For the basics I use web videos and tutorials (free 90% of the time).

Personally, if I had paid for the book, I'd be hard pressed to throw it out, at least until I needed shelf space for something better.

If you don't think you want to keep the book, find someone to give it to.

One final note: check out the free courseware offered by schools. MIT OpenCourseWare (https://ocw.mit.edu/index.htm) is a great example.
Find subjects here: https://ocw.mit.edu/courses/find-by-topic/
Check out the resources page on this forum here: https://python-forum.io/Thread-A-List-of...-Resources
#12
I just told someone to keep and use their 10-year-old book. The content is still valid and useful. There are just more options since then.
#13
I have a statistics book that I used daily at work back in the 1980s.
I'd have some words with anyone throwing that book out!
I also have my original copy of The C Programming Language by Brian Kernighan and Dennis Ritchie, which is falling apart, but I keep it. And another: C: A Reference Manual, 1st edition (also falling apart). And C. J. Date's An Introduction to Database Systems, 1st edition, from 1970.
#14
Larz, I'm not sure if we understand each other. I was talking about this book that I have:
https://www.amazon.com/Web-Scraping-Pyth...1491910291
The whole book is based on the urllib library.
I only mentioned Automate the Boring Stuff with Python because I like that book (I use the free version) and its author says that urllib is to be avoided.
#15
I just scanned Automate the Boring Stuff and yes, that is how we would teach web scraping.
#16
Chapter 11 of "Automate..." is about web scraping and it's useful although not all solutions work well.
#17
I have that book. I was just reading it the other day.
I happened to be looking for Selenium examples, and I don't think I found any,
or it would still be on my desk!

Still, it's a good book. Keep it; you can easily replace urllib with requests.
Keep your eye on the free books offered by Packt Publishing: https://www.packtpub.com/packt/offers/free-learning
They often offer scraping books as the free book of the day.

When you visit that page, scroll down: there are 30 books offered free, and some may be useful. There are several on HTML, CSS and web development in general. Scraping is just web development in reverse!
#18
If you are interested in lxml scraping, here's a good tutorial and it uses requests!
http://stanford.edu/~mgorkove/cgi-bin/rp...h_lxml.php
#19
(Sep-30-2018, 11:33 PM)Truman Wrote: https://www.amazon.com/Web-Scraping-Pyth...1491910291
The whole book is based on the urllib library.
It's not based on urllib; urllib is only used to read the source code of the website.
Requests is of course now the recommended way to do that.

Here is the example from page 20, followed by a rewrite that uses Requests.
In bs4 you also need to specify a parser; the recommended one is lxml.
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html)
for child in bsObj.find("table",{"id":"giftList"}).children:
    print(child)
With Requests and lxml as parser.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html.content, 'lxml')
for child in soup.find("table",{"id":"giftList"}).children:
    print(child)
Now you know how from urllib.request import urlopen can be replaced by Requests;
this applies to over 90% of the examples in the book.
The parsing with BeautifulSoup in the book is fine.
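
Since the same swap repeats across the book's examples, it can be wrapped once and reused. The following is only a sketch, not something from the book: the names get_soup and print_table_children, the 10-second timeout, and the "giftList" default are assumptions made for illustration. It also checks the HTTP status and handles a missing table, which the short examples above skip.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url, parser="lxml", timeout=10):
    """Fetch url with Requests and return a parsed BeautifulSoup tree.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    response = requests.get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, parser)


def print_table_children(soup, table_id="giftList"):
    """Print each direct child of the table with the given id."""
    table = soup.find("table", {"id": table_id})
    if table is None:
        # find() returns None when nothing matches, so guard before .children
        print(f"No table with id={table_id!r} found")
        return
    for child in table.children:
        print(child)
```

With these two helpers, the page 20 example would become print_table_children(get_soup("http://www.pythonscraping.com/pages/page3.html")). Passing parser="html.parser" avoids the extra lxml install.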
#20
Finally started reading it. I think I will use 'html.parser', it looks simple although I read somewhere that it's the slowest parser there is.
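
For what it's worth, the Beautiful Soup documentation rates html.parser as decent in speed and lists html5lib as the very slow one, with lxml the fastest. Either way, the parser is just the second argument, so switching later is a one-word change. A minimal sketch, assuming a made-up HTML snippet and a hypothetical helper named gift_rows (neither comes from the book):

```python
from bs4 import BeautifulSoup

# Made-up sample markup, loosely shaped like the giftList table discussed above.
SAMPLE_HTML = """
<table id="giftList">
  <tr><th>Item</th><th>Cost</th></tr>
  <tr><td> Vegetable Basket </td><td>$15.00</td></tr>
</table>
"""


def gift_rows(html, parser="html.parser"):
    """Return the table rows as lists of cell text.

    'html.parser' ships with Python; pass parser='lxml' if lxml is installed.
    """
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", {"id": "giftList"})
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


print(gift_rows(SAMPLE_HTML))
```

On well-formed pages the parsers should give the same result; they mainly differ in speed and in how they repair broken markup.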

By the way, I'm still deciding whether to take the path of web scraping or of data analysis & design, a topic that may interest me even more.