Downloading txt files

tjnichols · Aug-27-2018, 04:16 PM

I am trying to learn how to download txt files from the web. I am familiar with downloading PDFs, but I haven't had much luck with text files.

I'm not in a class. I'm trying to learn this on my own, which is why I posted my question here.

This is the code I'm trying to run.

from __future__ import print_function
import os
import requests
from bs4 import BeautifulSoup


def file_links_filter(href):
    """
    href filter. Return True for links that end with 'pdf', 'htm' or 'txt'.
    """
    # BeautifulSoup passes None when an <a> tag has no href attribute
    return isinstance(href, str) and href.endswith(('pdf', 'htm', 'txt'))


def get_links(tags_list):
    # hrefs are assumed to be site-relative (e.g. '/litigation/...'), so prefix the site root
    return [WEB_ROOT + tag.attrs['href'] for tag in tags_list]


def download_file(file_link, folder):
    content = requests.get(file_link).content
    name = file_link.split('/')[-1]  # last URL segment becomes the file name
    save_path = os.path.join(folder, name)

    print("Saving file:", save_path)
    with open(save_path, 'wb') as fp:
        fp.write(content)


WEB_ROOT = 'https://www.sec.gov'
# open() does not expand '~', so resolve the home directory explicitly
SAVE_FOLDER = os.path.expanduser('~/download_files/')  # directory in which files will be downloaded
os.makedirs(SAVE_FOLDER, exist_ok=True)  # create the folder if it does not exist yet

r = requests.get("https://www.sec.gov/litigation/suspensions.shtml")

soup = BeautifulSoup(r.content, 'html.parser')

years = soup.select("p#archive-links > a")  # css selector for all <a> directly inside <p id='archive-links'>
years_links = get_links(years)

links_to_download = []
for year_link in years_links:
    page = requests.get(year_link)
    beautiful_page = BeautifulSoup(page.content, 'html.parser')

    links = beautiful_page.find_all("a", href=file_links_filter)
    links = get_links(links)

    links_to_download.extend(links)

# make set to exclude duplicate links
links_to_download = set(links_to_download)

print("Got links:", links_to_download)

for link in links_to_download:
    download_file(link, SAVE_FOLDER)

This is the error I receive.

Error:
===================== RESTART: C:/Python365/SEC Test.py =====================
Traceback (most recent call last):
  File "C:/Python365/SEC Test.py", line 3, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
>>>

I installed requests with pip install, and I've also tried uninstalling and reinstalling it, with no luck. Can you point me in another direction?

Any help you can provide will be most appreciated!
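
For anyone hitting the same ModuleNotFoundError: one common cause (an assumption here, since nothing in the post confirms it) is that pip installed requests into a different Python installation than the one IDLE is running. A minimal sketch for checking this is below. describe_environment is a hypothetical helper name used only for illustration; it relies solely on the standard library.

```python
import importlib.util
import sys


def describe_environment(module_name="requests"):
    """Report which interpreter is running and whether it can see a module.

    Hypothetical helper for illustration; not part of requests or pip.
    """
    # find_spec returns None when a top-level module cannot be imported
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,          # the python.exe actually running this code
        "version": sys.version.split()[0],
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
        # command that installs into *this* interpreter rather than whichever pip is on PATH
        "install_command": [sys.executable, "-m", "pip", "install", module_name],
    }


if __name__ == "__main__":
    info = describe_environment()
    for key, value in info.items():
        print(key, "=", value)
```

Run from the same IDLE window that produced the traceback, this shows which interpreter is in use. If module_found is False, running the listed install_command from a command prompt (that exact interpreter path, followed by -m pip install requests) installs the package into that interpreter instead of another one.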
