RFC downloader not working - Printable Version
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: RFC downloader not working (/thread-14832.html)
RFC downloader not working - sidsr003 - Dec-19-2018

I am using the following code to download RFC pages. However, I am getting only HTML and not the plain English text shown on the webpage http://www.ietf.org/rfc/rfc2324.txt

In the command prompt I type:

    python "RFC-Downloader.py" 2324 | more

The script:

    import sys, urllib.request

    # The RFC number is expected as the first command-line argument
    try:
        rfc_number = int(sys.argv[1])
    except (IndexError, ValueError):
        print('Must supply an RFC number as first argument')
        sys.exit(2)

    template = 'http://www.ietf.org/rfc/rfc{}.txt'
    url = template.format(rfc_number)

    # Download the raw bytes and decode them (UTF-8 by default)
    rfc_raw = urllib.request.urlopen(url).read()
    rfc = rfc_raw.decode()
    print(rfc)

RE: RFC downloader not working - Larz60+ - Dec-19-2018

I have a package that does this here: https://github.com/Larz60p/MakerProject
It includes a Jupyter notebook that gives step-by-step instructions on how to use it.

RE: RFC downloader not working - snippsat - Dec-19-2018

It's a little strange that it gives all that HTML back. I'm not going to try to fix it, as you should not use urllib at all. It works fine with Requests, and it also needs a little parsing with BeautifulSoup. Example:

    from bs4 import BeautifulSoup
    import requests

    url = 'https://www.ietf.org/rfc/rfc2324.txt'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    # Take the first <p> element and print its text without surrounding whitespace
    pre = soup.find('p')
    print(pre.text.strip())
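
For anyone who wants to see which case they are hitting (plain text or HTML), here is a minimal sketch that checks the Content-Type header and only strips markup when the server actually sent HTML. It is not the code from either reply: it swaps in the standard library's `html.parser` in place of BeautifulSoup so that it runs without third-party packages, and the names `body_to_text`, `fetch_rfc` and `_TextExtractor` are made up for illustration. Whether the server returns HTML for this URL at all is an assumption based on the original post.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def body_to_text(body: bytes, content_type: str) -> str:
    """Return readable text for a response body (hypothetical helper)."""
    text = body.decode("utf-8", errors="replace")
    # Plain-text responses are returned unchanged
    if "html" not in content_type.lower():
        return text
    # HTML responses: keep only the text nodes
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_rfc(rfc_number: int) -> str:
    """Download one RFC and return it as text (needs network access)."""
    url = "https://www.ietf.org/rfc/rfc{}.txt".format(rfc_number)
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        return body_to_text(response.read(), content_type)
```

Calling `print(fetch_rfc(2324))` should then print the RFC either way. Keeping the download separate from the text extraction means `body_to_text` can be tried on saved responses without touching the network.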