Python Forum

Full Version: RFC downloader not working
I am using the following code to download RFC pages. However, I am getting
only HTML and not the plain text shown on the web page http://www.ietf.org/rfc/rfc2324.txt


In the command prompt I type:

python "RFC-Downloader.py" 2324 | more

import sys, urllib.request
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)
template = 'http://www.ietf.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
rfc_raw = urllib.request.urlopen(url).read()
rfc = rfc_raw.decode()
print(rfc)
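A minimal diagnostic sketch (not part of the original post): before decoding, it may help to check what the server actually sent back, since a redirect or an HTML wrapper page would explain the output. The helper names build_rfc_url, looks_like_html, and fetch are hypothetical, and the HTML check is only a rough heuristic.

```python
import urllib.request


def build_rfc_url(rfc_number):
    """Build the RFC URL using the same template as the script above."""
    return 'http://www.ietf.org/rfc/rfc{}.txt'.format(rfc_number)


def looks_like_html(headers, body):
    """Guess whether a response is HTML rather than plain text.

    `headers` is the message object urllib exposes as response.headers;
    `body` is the raw bytes returned by response.read().
    """
    if headers.get_content_type() == 'text/html':
        return True
    # Fall back to sniffing the start of the body (heuristic only).
    start = body.lstrip()[:15].lower()
    return start.startswith((b'<!doctype html', b'<html'))


def fetch(url):
    """Return (final_url, headers, body) so redirects are visible too."""
    with urllib.request.urlopen(url) as response:
        return response.url, response.headers, response.read()
```

Calling fetch(build_rfc_url(2324)) and printing the final URL together with looks_like_html(headers, body) should show whether the request was redirected to an HTML page instead of the .txt file.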
I have a package that does this here: https://github.com/Larz60p/MakerProject
It includes a Jupyter notebook that gives step-by-step instructions on how to use it.
It's a little strange that it gives all that HTML back.
I'm not going to try to fix it, as you should not use urllib at all.

It works fine with Requests, though it also needs a little parsing with BeautifulSoup (BS).
Example:
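from bs4 import BeautifulSoup
import requests

url = 'https://www.ietf.org/rfc/rfc2324.txt'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
# The lxml parser wraps a bare text body in a <p> element,
# so find('p') returns the whole document text.
soup = BeautifulSoup(response.content, 'lxml')
paragraph = soup.find('p')
print(paragraph.text.strip())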
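A hedged variant of the example above, as a sketch only: if the server labels the response as plain text, the HTML parser can be skipped and response.text used directly, which also avoids the parser treating things like <name@example.org> in an RFC as tags. The function names rfc_text and download_rfc are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
def rfc_text(content_type, body_text):
    """Return the RFC body, stripping markup only when the server sent HTML."""
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type != 'text/html':
        # Plain text (or unknown type): use the body as it is.
        return body_text.strip()
    # Imported here so the plain-text path works without bs4 installed.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(body_text, 'html.parser')
    return soup.get_text().strip()


def download_rfc(rfc_number):
    """Fetch one RFC with Requests and return its text."""
    import requests  # third-party; imported lazily for the same reason

    url = 'https://www.ietf.org/rfc/rfc{}.txt'.format(rfc_number)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    content_type = response.headers.get('Content-Type', '')
    return rfc_text(content_type, response.text)
```

With this, print(download_rfc(2324)) would replace the last four lines of the example above.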