Aug-27-2018, 04:16 PM
I am trying to learn how to download .txt files from the web. I am familiar with downloading PDFs, but when I've tried text files I haven't had much luck.
I'm not in a class but I am trying to learn this which is why I posted my question here.
This is the code I'm trying to run.
from __future__ import print_function

import requests
from bs4 import BeautifulSoup


def file_links_filter(tag):
    """
    Tags filter. Return True for links that ends with 'pdf', 'htm' or 'txt'
    """
    if isinstance(tag, str):
        return tag.endswith('pdf') or tag.endswith('htm') or tag.endswith('txt')


def get_links(tags_list):
    return [WEB_ROOT + tag.attrs['href'] for tag in tags_list]


def download_file(file_link, folder):
    file = requests.get(file_link).content
    name = file_link.split('/')[-1]
    save_path = folder + name
    print("Saving file:", save_path)
    with open(save_path, 'wb') as fp:
        fp.write(file)


WEB_ROOT = 'https://www.sec.gov'
SAVE_FOLDER = '~/download_files/'  # directory in which files will be downloaded

r = requests.get("https://www.sec.gov/litigation/suspensions.shtml")
soup = BeautifulSoup(r.content, 'html.parser')
years = soup.select("p#archive-links > a")  # css selector for all <a> inside <p id='archive'> tag
years_links = get_links(years)

links_to_download = []
for year_link in years_links:
    page = requests.get(year_link)
    beautiful_page = BeautifulSoup(page.content, 'html.parser')
    links = beautiful_page.find_all("a", href=file_links_filter)
    links = get_links(links)
    links_to_download.extend(links)

# make set to exclude duplicate links
links_to_download = set(links_to_download)
print("Got links:", links_to_download)

for link in set(links_to_download):
    download_file(link, SAVE_FOLDER)

This is the error I receive.
Error:
===================== RESTART: C:/Python365/SEC Test.py =====================
Traceback (most recent call last):
  File "C:/Python365/SEC Test.py", line 3, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
>>>
I installed requests using pip install. I've tried uninstalling it and then reinstalling it, with no luck. Can you point me in another direction? Any help you can provide will be most appreciated!
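
One possible first check, assuming (this is only a guess from the traceback) that pip installed requests for a different Python installation than the one IDLE is running: the sketch below prints which interpreter is executing the script and whether it can see requests. The helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether the top-level module `name` is importable by this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "{} is NOT installed for {}".format(name, sys.executable)
    return "{} found at {}".format(name, spec.origin)


# Path and version of the interpreter actually running this script
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

# Whether that interpreter can locate the requests package
print(describe_module("requests"))
```

If this is run from the same IDLE window and reports that requests is not installed, then running `python -m pip install requests` with the interpreter whose path is printed above (rather than a bare `pip install`) should put the package where the script looks for it.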