Python Forum
Downloading Page Source From URL List
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Downloading Page Source From URL List
#1
from urllib.request import urlopen
from pathlib import Path

with open(r'D:\Desktop\sites.txt', 'r', encoding='UTF-8') as file:
    while line := file.readline():
        myurl = line.rstrip()
        myfold = myurl[8:10]
        myfn = myurl[8:12]
        myfilen = myfn + 'txt'
        Path("D:/Desktop/" + myfold).mkdir(parents=True, exist_ok=True)
        with urlopen( myurl ) as webpage:
            content = webpage.read().decode()
            with open("D:/Desktop/" + myfold + "/" + myfilen , "w" ) as output:
                output.write( content )
I have a list of urls named sites.txt

Each URL has the same length.

I am trying to loop through each site and
1. Take part of the url name and create a folder on the Desktop if it doesn't exist.
2. Take part of the url and create a filename.
3. Save the page source of each url as a text file within the corresponding folder

This seems to run in Python 3 without any errors, but if it's creating the txt files, they are somewhere else. It is not even creating the folders.

Sorry I am new to Python.

Thank you.
Reply


Messages In This Thread
Downloading Page Source From URL List - by zunebuggy - Jun-05-2024, 06:51 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Help with to check an Input list data with a data read from an external source sacharyya 3 675 Mar-09-2024, 12:33 PM
Last Post: Pedroski55
  Downloading images from webpages H84Gabor 2 2,053 Sep-29-2021, 05:39 PM
Last Post: snippsat
  Downloading a module Xlsxwriter dan789 6 11,655 Jan-26-2019, 02:13 PM
Last Post: dan789
  "if statement" and downloading a dataset Alberto 1 2,608 Jan-25-2018, 01:44 PM
Last Post: ka06059
  Downloading and using pyperclip PMPythonlearner 2 5,218 Dec-31-2017, 04:37 PM
Last Post: PMPythonlearner
  Problem downloading 2.7.8 Mac OSX Benjipincus 2 3,186 Dec-18-2017, 01:33 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020