Python Forum
Downloading Page Source From URL List
#11
(Jun-06-2024, 06:59 PM)zunebuggy Wrote: My sites.txt is just a list of urls like:
https://abc.com
https://def.com
https://ghi.com
Thanks.
If that is what your URLs look like, https: is not the only problem. There are also problems with this:
        myfold = myurl[8:10]
        myfn = myurl[8:12]
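"https://abc.com"[8:10] == "ab", not "abc". The slice spans 10 - 8 = 2 characters, not 3. Likewise, [8:12] gives "abc." with the trailing dot. You would need [8:11], or [7:10] after you fix the https issue ("http://" is 7 characters long, "https://" is 8). Better yet, use a regular expression to extract the file and folder names.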
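import re

URL = "http://gibberish.com/more_gibberish?page="

# Match "name.ext" after the scheme, plus an optional path after the next "/".
if match := re.search(r"(\w+\.\w+)(?:/(.*))?", URL):
    website, document = match.groups()  # document is None when there is no path
    name, ext = website.split(".")
    print(name, ext, document)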
Output:
gibberish com more_gibberish?page=
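As an alternative to slicing or a hand-written pattern, the standard library's urllib.parse can split a URL into its parts regardless of whether it starts with http:// or https://. This is only a sketch: the helper name split_site is made up for illustration, and it assumes simple host names of the form name.ext (optionally with a leading "www."), like the ones in sites.txt.

```python
from urllib.parse import urlsplit


def split_site(url):
    """Return (name, ext, document) for a URL such as https://abc.com.

    Hypothetical helper; assumes a host of the form name.ext,
    optionally prefixed with "www.".
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Split on the last dot: "abc.com" -> ("abc", "com").
    name, _, ext = host.rpartition(".")
    # Rebuild whatever follows the host: the path plus the query string, if any.
    document = parts.path.lstrip("/")
    if parts.query:
        document += "?" + parts.query
    return name, ext, document


for url in ["https://abc.com", "http://gibberish.com/more_gibberish?page="]:
    print(split_site(url))
```

The name part could then serve as the file or folder name without counting characters in the scheme.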


Messages In This Thread
RE: Downloading Page Source From URL List - by deanhystad - Jun-07-2024, 05:51 PM
