Posts: 11,871
Threads: 474
Joined: Sep 2016
Quote:that's cool but the condition to be met is that we first know that utils exists... Is there any command, such as dir, that gives a list of all the methods? It looks like the answer is negative.
Well, you don't unless someone tells you or you go digging.
You can start by looking at the documentation of top level packages.
For example, if you look at the documentation for requests by itself, you'll see
Output: PACKAGE CONTENTS
__version__
_internal_utils
adapters
api
auth
certs
compat
cookies
exceptions
help
hooks
models
packages
sessions
status_codes
structures
utils
Each of these has separate documentation, and utils is listed here.
As a habit, when I'm not busy, I browse various packages to see what they contain. There's no way to know what every one of them does.
As of this minute, PyPI contains 159,959 packages.
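If you'd rather query this from code than read it out of help(), the stdlib pkgutil module can enumerate a package's submodules. A minimal sketch, using the stdlib urllib package so it runs anywhere (the same call works on requests if it is installed):

```python
import pkgutil
import urllib  # stdlib package chosen so the example needs nothing installed

# pkgutil.iter_modules lists a package's submodules - roughly the same
# information as the PACKAGE CONTENTS section that help() prints.
submodules = sorted(m.name for m in pkgutil.iter_modules(urllib.__path__))
print(submodules)  # includes 'error', 'parse', 'request', 'response', ...
```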
Posts: 2,955
Threads: 48
Joined: Sep 2016
(Nov-28-2018, 10:25 PM)Truman Wrote: that's cool but the condition to be met is that we first know that utils exists... Is there any command, such as dir, that gives a list of all the methods? It looks like the answer is negative.
There is a way, if you use IPython or bpython, for example, as a REPL. Import the requests module, type requests. and hit TAB for autocompletion. You will see it: requests.utils is there.
You can do autocompletion in Python's own REPL too: https://gableroux.com/python/2016/01/20/...ocomplete/
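For the plain python REPL, the linked approach boils down to wiring readline to rlcompleter. A minimal sketch (readline is not available on standard Windows builds, and on Python 3.4+ the interactive interpreter already enables this by default):

```python
# Put these lines in a file referenced by the PYTHONSTARTUP environment
# variable to get TAB completion in the plain interactive interpreter.
import readline
import rlcompleter

readline.parse_and_bind('tab: complete')
```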
Posts: 404
Threads: 94
Joined: Dec 2017
Larz, where do you get this PACKAGE CONTENTS? I don't see it on PyPI.
wavic, the things you mentioned are completely new to me. So far I've used only Microsoft Azure Notebooks (with Jupyter). Should I install IPython through Anaconda?
Posts: 11,871
Threads: 474
Joined: Sep 2016
Open a Python interpreter:
Book $ python
Python 3.7.1 (default, Nov 20 2018, 18:13:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> help(requests)
Scroll down (or press the spacebar for the next page) and you'll find the list near the top. There are also classes within the package.
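help() isn't limited to packages; it works on any object, which makes it a quick way to inspect a single function without leaving the interpreter. For example, with the stdlib json module:

```python
import json

# help(json.dumps) pages the full docstring in the REPL;
# the raw text is also reachable directly on the object:
print(json.dumps.__doc__.splitlines()[0])
```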
Posts: 2,955
Threads: 48
Joined: Sep 2016
(Nov-29-2018, 11:10 PM)Truman Wrote: Larz, where do you get this PACKAGE CONTENTS? I don't see it on PyPI. wavic, the things you mentioned are completely new to me. So far I've used only Microsoft Azure Notebooks (with Jupyter). Should I install IPython through Anaconda?
You can do the same with Jupyter. It is the successor of the IPython notebook.
Posts: 404
Threads: 94
Joined: Dec 2017
Dec-10-2018, 11:15 PM
(This post was last modified: Dec-10-2018, 11:15 PM by Truman.)
Any idea what substitute to use with requests for the read() and decode() methods that are part of urllib?
For example, in this code:
response = requests.get("http://freegeoip.net/json/"+ipAddress).read().decode('utf-8')
I tried to add .utils to this code in several positions, but it doesn't work.
Posts: 7,087
Threads: 122
Joined: Sep 2016
Dec-10-2018, 11:51 PM
(This post was last modified: Dec-10-2018, 11:52 PM by snippsat.)
(Dec-10-2018, 11:15 PM)Truman Wrote: Any idea what substitute to use with requests for the read() and decode() methods that are part of urllib?
You do not need to decode with Requests; one of its big advantages is that it gets the correct encoding from the web site.
>>> import requests
>>>
>>> r = requests.get('http://python.org')
>>> r.status_code
200
>>> r.encoding  # the encoding this web site uses
'utf-8'
So print(r.text) gives the text back correctly decoded.
Output: >>> print(r.text)
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">
<script src="/static/js/libs/modernizr.js"></script>
<link href="/static/stylesheets/style.css" rel="stylesheet" type="text/css" title="default" />
<link href="/static/stylesheets/mq.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty" />
<!--[if (lte IE 8)&(!IEMobile)]>
<link href="/static/stylesheets/no-mq.css" rel="stylesheet" type="text/css" media="screen" />
<![endif]-->
.........................
Just remember to use content and not text when using a parser, e.g. BeautifulSoup.
Because BS does its own decoding to Unicode, the content would otherwise be decoded twice.
Example:
from bs4 import BeautifulSoup
import requests
url = 'https://www.python.org/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')  # note that content is used here
print(soup.select('head > title')[0].text)
Output: Welcome to Python.org
Posts: 404
Threads: 94
Joined: Dec 2017
Dec-11-2018, 11:49 PM
(This post was last modified: Dec-11-2018, 11:52 PM by Truman.)
Thank you.
Now I'm trying to write some code that downloads images from a page, and as you can imagine... it doesn't go that well.
import requests
from bs4 import BeautifulSoup
import shutil
html = requests.get("http://www.pythonscraping.com", stream=True)
bsObj = BeautifulSoup(html.content, 'html.parser')
imageLocation = bsObj.find("a", {"id":"logo"}).find("img")["src"]
with open('img.jpg', 'wb') as out_file:
    shutil.copyfileobj(imageLocation, out_file)
Error: Traceback (most recent call last):
  File "C:\Python36\kodovi\crawler3.py", line 9, in <module>
    shutil.copyfileobj(imageLocation, out_file)
  File "C:\Python36\lib\shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
AttributeError: 'str' object has no attribute 'read'
this is the urllib code from the book that I'm trying to transform:
from urllib.request import urlretrieve
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com")
bsObj = BeautifulSoup(html)
imageLocation = bsObj.find("a", {"id": "logo"}).find("img")["src"]
urlretrieve (imageLocation, "logo.jpg")
Posts: 11,871
Threads: 474
Joined: Sep 2016
Dec-12-2018, 12:44 AM
(This post was last modified: Dec-12-2018, 12:44 AM by Larz60+.)
Try:
import requests
from bs4 import BeautifulSoup

url = "http://www.pythonscraping.com"
html = requests.get(url, stream=True)
if html.status_code == 200:
    bsObj = BeautifulSoup(html.content, 'html.parser')
    imageLocation = bsObj.find("a", {"id": "logo"}).find("img")["src"]
    image = requests.get(imageLocation)
    if image.status_code == 200:
        with open('img.jpg', 'wb') as out_file:
            out_file.write(image.content)
    else:
        print(f'Problem fetching image, status code: {image.status_code}')
else:
    print(f'Problem fetching {url}, status code: {html.status_code}')
-- Edit: modified 2nd request, should check status code --
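The same idea can be sketched with two requests conveniences: raise_for_status() replaces the manual status-code checks, and iter_content() streams the response in chunks, so a large image never has to fit in memory all at once. The helper name and chunk size here are my own choices, not from the post above:

```python
import requests

def download(url, filename):
    # Fetch url and write it to filename in 8 KiB chunks.
    r = requests.get(url, stream=True)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    with open(filename, 'wb') as out_file:
        for chunk in r.iter_content(chunk_size=8192):
            out_file.write(chunk)
```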
Posts: 7,087
Threads: 122
Joined: Sep 2016
Dec-12-2018, 01:37 AM
(This post was last modified: Dec-12-2018, 01:37 AM by snippsat.)
Larz60+'s code is correct.
In your first code you only need to change line 9 to this, and remove shutil:
out_file.write(requests.get(image_location).content)
Sometimes it's also nice to keep the original image name.
import requests, os
from bs4 import BeautifulSoup
html = requests.get("http://www.pythonscraping.com")
bs_obj = BeautifulSoup(html.content, 'html.parser')
image_location = bs_obj.find("a", id='logo').find("img")["src"]
image_name = os.path.basename(image_location)
with open(image_name, 'wb') as out_file:
    out_file.write(requests.get(image_location).content)
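One thing worth guarding against in both versions: find() returns None when the tag isn't on the page, and chaining .find("img") on that None raises AttributeError. A small sketch, with an inline HTML snippet standing in for the fetched page so no network request is needed:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the downloaded page.
html = '<a id="logo"><img src="/img/logo.png"></a>'
soup = BeautifulSoup(html, 'html.parser')

link = soup.find("a", id="logo")
img = link.find("img") if link is not None else None
if img is not None:
    print(img["src"])  # /img/logo.png
else:
    print("logo image not found on the page")
```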