Python Forum
urlib - to use or not to use ( for web scraping )?
#19
(Sep-30-2018, 11:33 PM)Truman Wrote: https://www.amazon.com/Web-Scraping-Pyth...1491910291
The whole book is based on urlib library.
The book is not really based on urllib; it only uses urllib to fetch the page source of a website.
Requests is of course now the recommended way to do that.

Take the example on page 20 and rewrite it to use Requests.
With bs4 you should also specify which parser to use; lxml is the recommended one.
The original code from the book:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html)
for child in bsObj.find("table",{"id":"giftList"}).children:
    print(child)
The same example with Requests and lxml as the parser:
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html.content, 'lxml')
for child in soup.find("table",{"id":"giftList"}).children:
    print(child)
Now you know how urlopen from urllib.request can be replaced by Requests;
this applies to over 90% of the examples in the book.
The BeautifulSoup parsing in the book is fine as it is.
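If you want something a bit more defensive than the short version above, here is a rough sketch (not from the book). The function names fetch_html and table_children, the 10 second timeout, and the split into two functions are my own assumptions for illustration. It checks the HTTP status before parsing and handles the case where find() does not match any table.

```python
import requests
from bs4 import BeautifulSoup

URL = "http://www.pythonscraping.com/pages/page3.html"


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper).

    The timeout value is an arbitrary choice for this sketch.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content


def table_children(html, table_id="giftList", parser="lxml"):
    """Return the direct children of the table with the given id.

    Hypothetical helper; returns an empty list if no such table exists.
    """
    soup = BeautifulSoup(html, parser)
    table = soup.find("table", {"id": table_id})
    if table is None:  # find() returns None when nothing matches
        return []
    return list(table.children)


if __name__ == "__main__":
    for child in table_children(fetch_html(URL)):
        print(child)
```

Keeping the download separate from the parsing also means you can test the parsing part on a saved HTML string without hitting the site every time.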
Messages In This Thread
RE: urlib - to use or not to use ( for web scraping )? - by snippsat - Oct-01-2018, 05:06 AM