Python Forum
Downloading txt files
#1
I am trying to learn how to download txt files from the web. I am familiar with downloading PDFs, but I haven't had much luck with text files.

I'm not in a class; I'm trying to learn this on my own, which is why I posted my question here.

This is the code I'm trying to run.

from __future__ import print_function

import os

import requests
from bs4 import BeautifulSoup

WEB_ROOT = 'https://www.sec.gov'
# open() does not expand '~', so resolve the home directory explicitly
SAVE_FOLDER = os.path.expanduser('~/download_files/')  # directory in which files will be downloaded


def file_links_filter(href):
    """
    href filter. Return True for links that end with 'pdf', 'htm' or 'txt'.
    """
    # BeautifulSoup passes None here for <a> tags that have no href
    return isinstance(href, str) and href.endswith(('pdf', 'htm', 'txt'))


def get_links(tags_list):
    # assumes every href is root-relative (starts with '/')
    return [WEB_ROOT + tag.attrs['href'] for tag in tags_list]


def download_file(file_link, folder):
    response = requests.get(file_link)
    response.raise_for_status()  # do not save an error page as if it were the file
    name = file_link.split('/')[-1]
    save_path = os.path.join(folder, name)

    print("Saving file:", save_path)
    with open(save_path, 'wb') as fp:
        fp.write(response.content)


# create the download directory if it does not exist yet
if not os.path.isdir(SAVE_FOLDER):
    os.makedirs(SAVE_FOLDER)

r = requests.get("https://www.sec.gov/litigation/suspensions.shtml")

soup = BeautifulSoup(r.content, 'html.parser')

years = soup.select("p#archive-links > a")  # css selector for all <a> directly inside the <p id='archive-links'> tag
years_links = get_links(years)

links_to_download = []
for year_link in years_links:
    page = requests.get(year_link)
    beautiful_page = BeautifulSoup(page.content, 'html.parser')

    links = beautiful_page.find_all("a", href=file_links_filter)
    links = get_links(links)

    links_to_download.extend(links)

# make set to exclude duplicate links
links_to_download = set(links_to_download)

print("Got links:", links_to_download)

for link in links_to_download:
    download_file(link, SAVE_FOLDER)
This is the error I receive.

Error:
===================== RESTART: C:/Python365/SEC Test.py =====================
Traceback (most recent call last):
  File "C:/Python365/SEC Test.py", line 3, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
>>>
I installed requests with pip install, and I've also tried uninstalling and reinstalling it, with no luck. Can you point me in another direction?
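
One possible explanation, which the traceback alone does not confirm, is that pip installed requests into a different Python installation than the one running the script. A minimal sketch for checking this from inside the interpreter that raises the error follows. The helper names module_is_importable and pip_install_command are made up for illustration; the script only prints the install command and does not run it.

```python
import importlib.util
import sys


def module_is_importable(name):
    """Return True if this interpreter can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this script."""
    # "python -m pip" installs into the interpreter that executes it,
    # which avoids picking up a pip that belongs to another installation.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("Interpreter in use:", sys.executable)
    if module_is_importable("requests"):
        print("requests is visible to this interpreter")
    else:
        print("requests is NOT visible to this interpreter; try running:")
        print(" ".join(pip_install_command("requests")))
```

If the interpreter path printed here differs from the one that the pip command on the PATH belongs to, running the printed command in a terminal should install requests where the script can find it.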

Any help you can provide will be most appreciated!

Messages In This Thread
Downloading txt files - by tjnichols - Aug-27-2018, 04:16 PM
RE: Downloading txt files - by DeaD_EyE - Aug-27-2018, 04:38 PM
RE: Downloading txt files - by buran - Aug-27-2018, 04:51 PM
RE: Downloading txt files - by Gribouillis - Aug-27-2018, 05:01 PM
RE: Downloading txt files - by tjnichols - Aug-27-2018, 05:36 PM
RE: Downloading txt files - by buran - Aug-27-2018, 06:03 PM
RE: Downloading txt files - by tjnichols - Aug-27-2018, 10:01 PM
