Python Forum
RFC downloader not working
#1
I am using the following code to download RFC pages. However, I am only getting HTML, not the plain text shown on the webpage http://www.ietf.org/rfc/rfc2324.txt

At the command prompt, I type:

python "RFC-Downloader.py" 2324 | more

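import sys
import urllib.request

# Read the RFC number from the first command-line argument
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

# Build the URL of the plain-text RFC file
template = 'http://www.ietf.org/rfc/rfc{}.txt'
url = template.format(rfc_number)

# Download the raw bytes, then decode them as UTF-8 text
with urllib.request.urlopen(url) as response:
    rfc_raw = response.read()
rfc = rfc_raw.decode('utf-8')
print(rfc)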
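One way to see why HTML comes back is to inspect what the server actually sent. Below is a minimal diagnostic sketch using only the standard library. The helper names `rfc_url`, `is_html` and `fetch` are hypothetical, and the HTML check is a simple heuristic (the `Content-Type` header plus a look at the first bytes), not a full content sniffer.

```python
from __future__ import annotations

import sys
import urllib.request

# URL template taken from the question above
TEMPLATE = 'http://www.ietf.org/rfc/rfc{}.txt'


def rfc_url(number: int) -> str:
    """Build the plain-text URL for one RFC."""
    return TEMPLATE.format(number)


def is_html(content_type: str, body: bytes) -> bool:
    """Heuristic: treat the body as HTML if the header or the first bytes say so."""
    if 'html' in content_type.lower():
        return True
    head = body.lstrip()[:15].lower()
    return head.startswith((b'<!doctype html', b'<html'))


def fetch(url: str) -> tuple[str, str, bytes]:
    """Return (final URL after redirects, Content-Type header, raw body)."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get('Content-Type', '')
        return response.geturl(), content_type, response.read()


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python rfc_check.py RFC_NUMBER')
    else:
        final_url, content_type, body = fetch(rfc_url(int(sys.argv[1])))
        print('Final URL:   ', final_url)
        print('Content-Type:', content_type)
        if is_html(content_type, body):
            print('The server returned HTML, not plain text.')
        else:
            print(body.decode('utf-8', errors='replace'))
```

Run it like the original script, e.g. `python rfc_check.py 2324` (the file name is just an example). If the final URL differs from the requested one, the request was redirected, which is one possible reason for receiving a web page instead of the raw file.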
#2
I have a package that does this here: https://github.com/Larz60p/MakerProject
It includes a Jupyter notebook with step-by-step instructions on how to use it.
#3
It's a little strange that it gives all that HTML back.
I'm not going to try to fix it, as you should not use urllib at all.

It works fine with Requests, though you also need to do a little parsing with BeautifulSoup (BS).
Example:
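from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml
import requests

url = 'https://www.ietf.org/rfc/rfc2324.txt'
response = requests.get(url)
response.raise_for_status()  # stop early on a 4xx/5xx response

# For a plain-text response, the lxml parser wraps the whole text
# in <html><body><p>, so the first <p> holds the RFC text
soup = BeautifulSoup(response.content, 'lxml')
paragraph = soup.find('p')
print(paragraph.text.strip())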
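If the server labels the file as `text/plain` (an assumption; check the `Content-Type` header first), the BeautifulSoup step can be skipped, because Requests already decodes the body into `response.text`. A minimal sketch: `extract_rfc_text` and `fetch_rfc` are hypothetical helper names, and `requests` is imported inside `fetch_rfc` only so the text-handling helper can be used without it installed.

```python
import sys

# Same URL as in the example above; the number is filled in per call
RFC_URL = 'https://www.ietf.org/rfc/rfc{}.txt'


def extract_rfc_text(content_type: str, body: str) -> str:
    """Return the body as-is when it is plain text; refuse anything else."""
    if 'text/plain' in content_type.lower():
        return body.strip()
    raise ValueError(f'expected text/plain, got {content_type!r}')


def fetch_rfc(number: int) -> str:
    """Download one RFC with Requests and return its plain text."""
    import requests  # third-party (pip install requests), imported lazily

    response = requests.get(RFC_URL.format(number), timeout=30)
    response.raise_for_status()  # turn 4xx/5xx into an exception
    return extract_rfc_text(response.headers.get('Content-Type', ''), response.text)


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python rfc_requests.py RFC_NUMBER')
    else:
        print(fetch_rfc(int(sys.argv[1])))
```

Raising on a non-plain-text response makes an unexpected HTML page fail loudly instead of being printed as if it were the RFC.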