Oct-01-2018, 05:06 AM
(Sep-30-2018, 11:33 PM)Truman Wrote: https://www.amazon.com/Web-Scraping-Pyth...1491910291
It's not based on urllib, it's used to read source code of web-site.
The whole book is based on the urllib library.
Requests is of course now the recommended way to do it.
Take the example on page 20 and rewrite it to use Requests.
In bs4 you also need to specify a parser; lxml is the recommended one.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    bsObj = BeautifulSoup(html)  # no parser specified here
    for child in bsObj.find("table",{"id":"giftList"}).children:
        print(child)

With Requests and lxml as parser:

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("http://www.pythonscraping.com/pages/page3.html")
    soup = BeautifulSoup(html.content, 'lxml')  # parser given explicitly
    for child in soup.find("table",{"id":"giftList"}).children:
        print(child)

Now you know how urlopen from urllib.request can be replaced by Requests; this applies to over 90% of the examples in the book.
The BeautifulSoup parsing in the book is fine as it is.
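
Since only the fetching step changes, one way to adapt the book's examples is to keep the fetch in a small helper and leave the parsing code alone. This is just a rough sketch, not something from the book: the names get_soup, parse_html, PARSER and SAMPLE are made up for illustration, the sample table is invented so the demo runs without network access, and the fallback to Python's built-in html.parser is my assumption for when lxml is not installed.

```python
import requests
from bs4 import BeautifulSoup

# Prefer lxml as recommended above; fall back to the built-in parser
# if lxml is not installed (assumption made for this sketch).
try:
    import lxml  # noqa: F401
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"


def parse_html(markup):
    """Parse HTML (str or bytes) with an explicitly named parser."""
    return BeautifulSoup(markup, PARSER)


def get_soup(url, timeout=10):
    """Hypothetical helper: fetch a page with Requests and return the soup."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return parse_html(response.content)


# Offline demo on an invented table, so no request is sent here.
SAMPLE = """
<table id="giftList">
  <tr><th>Item</th></tr>
  <tr><td>Vegetable Basket</td></tr>
</table>
"""

soup = parse_html(SAMPLE)
rows = soup.find("table", {"id": "giftList"}).find_all("tr")
cells = [row.get_text(strip=True) for row in rows]
print(cells)
```

With a live page you would call get_soup("http://www.pythonscraping.com/pages/page3.html") and then use the same find() line as in the book.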