Python Forum
RFC downloader not working - Printable Version




RFC downloader not working - sidsr003 - Dec-19-2018

I am using the following code to download RFC pages. However, I am getting
only HTML back, not the plain-text document shown on the webpage http://www.ietf.org/rfc/rfc2324.txt


In the command prompt I type:

python "RFC-Downloader.py" 2324 | more

import sys
import urllib.request

# The RFC number is expected as the first command-line argument.
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

# Build the document URL, download the raw bytes and decode them (UTF-8 by default).
template = 'http://www.ietf.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
rfc_raw = urllib.request.urlopen(url).read()
rfc = rfc_raw.decode()
print(rfc)
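
When a download returns HTML instead of the expected text, it can help to look at what the server actually sent before changing libraries. The following is only a rough diagnostic sketch, not part of the original script: the helper names is_plain_text and inspect_rfc_response are made up for illustration, and it assumes the same URL template as above.

```python
import urllib.request

# Same URL template as the script above.
RFC_URL_TEMPLATE = 'http://www.ietf.org/rfc/rfc{}.txt'


def is_plain_text(content_type):
    """Return True if a Content-Type header value denotes text/plain."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type == 'text/plain'


def inspect_rfc_response(rfc_number):
    """Fetch an RFC and report what came back (hypothetical helper, needs network access)."""
    url = RFC_URL_TEMPLATE.format(rfc_number)
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get('Content-Type', '')
        return {
            'requested_url': url,
            'final_url': response.geturl(),  # differs from requested_url after a redirect
            'status': response.status,
            'content_type': content_type,
            'is_plain_text': is_plain_text(content_type),
        }
```

Calling inspect_rfc_response(2324) and printing the result shows whether the request was redirected and which content type was served, which should narrow down where the HTML is coming from.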



RE: RFC downloader not working - Larz60+ - Dec-19-2018

I have a package that does this here: https://github.com/Larz60p/MakerProject
It includes a jupyter notebook that gives step by step instructions on how to use.


RE: RFC downloader not working - snippsat - Dec-19-2018

It's a little strange that it gives all that HTML back.
I'm not going to try to fix it, as you should not use urllib at all.

It works fine with Requests, though it also needs a little parsing with BeautifulSoup.
Example:
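from bs4 import BeautifulSoup
import requests

url = 'https://www.ietf.org/rfc/rfc2324.txt'
response = requests.get(url)
# Parse the response with the lxml parser (requires the lxml package).
soup = BeautifulSoup(response.content, 'lxml')
# Take the first <p> element; lxml wraps bare text in <p> when it builds the tree.
paragraph = soup.find('p')
print(paragraph.text.strip())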
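
If installing BeautifulSoup and lxml is not an option, the "little parsing" step can also be approximated with the standard library. This is a swapped-in technique, not the approach from the reply above: a minimal sketch built on html.parser, where the names TextExtractor and html_to_text are hypothetical and the markup is assumed to have already been downloaded and decoded to a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of a document, skipping <script> and <style> content."""

    SKIPPED_TAGS = {'script', 'style'}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only while we are outside <script>/<style>.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return ''.join(self._chunks).strip()


def html_to_text(markup):
    """Return the visible text of an HTML string (plain text passes through unchanged)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

For example, print(html_to_text(response.text)) could replace the BeautifulSoup lines in the reply above. Unlike soup.find('p'), this keeps the text of the whole document rather than only the first paragraph.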