(Dec-10-2018, 11:15 PM)Truman Wrote: Any idea what substitute to use with requests for read() and decode() attributes that are a part of urllib?
You do not need to decode with Requests. One of its big advantages is that it works out the correct encoding from the website's response for you.
>>> import requests
>>>
>>> r = requests.get('http://python.org')
>>> r.status_code
200
>>> r.encoding  # The encoding this website uses
'utf-8'
So print(r.text) gives back text that is already decoded with the correct encoding.
Output:
>>> print(r.text)
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">
<script src="/static/js/libs/modernizr.js"></script>
<link href="/static/stylesheets/style.css" rel="stylesheet" type="text/css" title="default" />
<link href="/static/stylesheets/mq.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty" />
<!--[if (lte IE 8)&(!IEMobile)]>
<link href="/static/stylesheets/no-mq.css" rel="stylesheet" type="text/css" media="screen" />
<![endif]-->
.........................
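For anyone coming from urllib, the mapping from read() and decode() can be sketched roughly as below. This is only an illustration: the function names fetch_with_urllib and fetch_with_requests are made up for this post, and the UTF-8 fallback in the urllib version is an assumption, not something urllib does on its own.

```python
from urllib.request import urlopen


def fetch_with_urllib(url):
    """urllib: read the raw bytes, then decode them yourself."""
    with urlopen(url) as response:
        raw = response.read()
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


def fetch_with_requests(url):
    """Requests: .text is already decoded, .content holds the raw bytes."""
    import requests  # third-party; imported here so the urllib half runs without it

    r = requests.get(url)
    r.raise_for_status()
    return r.text
```

In short, r.content plays the role of read() and r.text plays the role of read().decode(...).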
Just remember to use content and not text when you pass the response to a parser, e.g. BS (BeautifulSoup). BS does its own decoding to Unicode, so giving it the raw bytes means the data is not decoded twice.
Example:
from bs4 import BeautifulSoup
import requests

url = 'https://www.python.org/'
url_get = requests.get(url)
# Note that .content (raw bytes) is passed here, not .text
soup = BeautifulSoup(url_get.content, 'lxml')
print(soup.select('head > title')[0].text)
Output:
Welcome to Python.org
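To see why the raw bytes are the safer thing to hand to a parser, here is a small offline sketch. The HTML snippet is made up for illustration, and the ISO-8859-1 guess stands in for the case where the encoding picked for text does not match what the page really uses (as far as I know, Requests falls back to ISO-8859-1 for text/* responses whose headers name no charset).

```python
# Bytes as they would arrive over the wire (what .content holds).
raw = '<meta charset="utf-8"><title>Café</title>'.encode("utf-8")

# Decoding with a wrongly guessed encoding garbles non-ASCII characters.
wrong = raw.decode("iso-8859-1")
# Decoding with the encoding the page declares keeps them intact.
right = raw.decode("utf-8")

print("Café" in right)
print("Café" in wrong)
```

With the bytes in hand, a parser like BS can read the meta charset itself instead of inheriting a bad guess.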