Help with urllib.request

Brian177 · (This post was last modified: Apr-19-2021, 05:09 AM by buran.)

Help on this please; Is a second urlopen needed, in the scenario I describe?

import urllib.request

# Given you open a url
resp = urllib.request.urlopen('http://httpbin.org/xml')

# Then execute a read
stuff = resp.read()

# You can print the result
print(stuff)
# For this post pretend you see the result, it works.

# Then if you read again
stuff2 = resp.read()

# You find that nothing results
print(stuff2)
# Output: b''

# If you do the open again, then you can read again and
# do get the results
resp = urllib.request.urlopen('http://httpbin.org/xml')

stuff2 = resp.read()

# I see the resp object has a seek function(?). But if that can be used
# to reset a pointer, instead of executing a urlopen again, I have not
# figured it out.
# If there is no real world reason to do a second read this way, then it
# is just an academic question. I am a Python newbie.

buran wrote Apr-19-2021, 05:09 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See the BBCode help for more info.

snippsat · (This post was last modified: Apr-19-2021, 11:37 AM by snippsat.)

(Apr-19-2021, 01:53 AM)Brian177 Wrote: Help on this please; Is a second urlopen needed, in the scenario I describe?
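
To answer the question as asked: the object returned by urlopen for an HTTP URL is a one-shot stream, so the first read() consumes the body and later calls return empty bytes (b''). It inherits a seek method from Python's io base classes, but the stream is not seekable, so seek() should raise io.UnsupportedOperation rather than rewind. A second urlopen is not needed as long as you keep the bytes from the first read. The sketch below shows one way to do that, wrapping the bytes in io.BytesIO if you want something you can rewind. buffer_body and FakeResponse are hypothetical names of mine, and FakeResponse only stands in for the real response so the snippet runs without a network connection.

```python
import io


def buffer_body(resp):
    """Read a one-shot, file-like response once and return a seekable copy."""
    return io.BytesIO(resp.read())


class FakeResponse:
    """Hypothetical stand-in for the urlopen() result: drained after one read()."""

    def __init__(self, payload):
        self._stream = iter([payload])

    def read(self):
        # First call returns the payload, every later call returns b''
        return next(self._stream, b'')


resp = FakeResponse(b'<slideshow/>')  # with urllib: resp = urllib.request.urlopen(url)
body = buffer_body(resp)

first = body.read()
body.seek(0)            # rewinding works on the in-memory copy
second = body.read()

print(first == second)  # True
print(resp.read())      # b'' because the original stream is already consumed
```

In most scripts the plain variable (stuff in the question) is enough; the BytesIO wrapper only matters when some other code expects a file-like object.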

My advice is not to use urllib.
Requests has been the usual choice for these tasks for many years and handles them in a better way.
Example:

>>> import requests
>>> 
>>> resp = requests.get('http://httpbin.org/xml')
>>> resp.status_code
200
>>> stuff = resp.text
>>> print(stuff)

Output:
<?xml version='1.0' encoding='us-ascii'?>

<!--  A SAMPLE set of slides  -->

<slideshow 
    title="Sample Slide Show"
    date="Date of publication"
    author="Yours Truly"
    >

    <!-- TITLE SLIDE -->
    <slide type="all">
      <title>Wake up to WonderWidgets!</title>
    </slide>

    <!-- OVERVIEW -->
    <slide type="all">
        <title>Overview</title>
        <item>Why <em>WonderWidgets</em> are great</item>
        <item/>
        <item>Who <em>buys</em> WonderWidgets</item>
    </slide>

</slideshow>

Since the output is XML, it is also common to add BeautifulSoup (bs4) so you can parse the result.

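import requests
from bs4 import BeautifulSoup

url = 'http://httpbin.org/xml'
response = requests.get(url)
# 'lxml' is the HTML parser and copes with this document; 'xml' (also
# provided by lxml) is the XML-specific choice
soup = BeautifulSoup(response.content, 'lxml')
# select_one returns the first <title> element
title = soup.select_one('title')
print(title.text)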

Output:
Wake up to WonderWidgets!
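
If you would rather not install bs4 and lxml, the standard library's xml.etree.ElementTree can pull out the same title. This is a minimal sketch, assuming the body has already been downloaded as bytes. A shortened copy of the XML shown above is pasted in as xml_bytes (my own variable name) so that it runs offline.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the httpbin.org/xml body; in real use this would be
# the bytes from resp.read() or response.content
xml_bytes = b"""<?xml version='1.0' encoding='us-ascii'?>
<slideshow title="Sample Slide Show">
    <slide type="all">
      <title>Wake up to WonderWidgets!</title>
    </slide>
    <slide type="all">
        <title>Overview</title>
    </slide>
</slideshow>"""

root = ET.fromstring(xml_bytes)       # root is the <slideshow> element

title = root.find('slide/title')      # first <title> inside a <slide>
print(title.text)                     # Wake up to WonderWidgets!

# All slide titles, in document order
titles = [t.text for t in root.iter('title')]
print(titles)                         # ['Wake up to WonderWidgets!', 'Overview']
```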

Brian177 · Apr-21-2021, 01:58 PM

Thank you very much!
