Downloading Page Source From URL List


deanhystad

Jun-07-2024, 05:51 PM

(Jun-06-2024, 06:59 PM)zunebuggy Wrote: My sites.txt is just a list of urls like:
https://abc.com
https://def.com
https://ghi.com
Thanks.

If that is what your URLs look like, https: is not the only problem. There are also problems with this:

        myfold = myurl[8:10]
        myfn = myurl[8:12]

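"https://abc.com"[8:10] == "ab", not "abc". The slice [8:10] spans 10 - 8 = 2 characters, not 3. Likewise, [8:12] gives "abc.", trailing dot included. You would need [8:11], or [7:10] after you fix the https issue ("http://" is 7 characters long, "https://" is 8). Better yet, use pattern matching to extract the file and folder names.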

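import re

# Example URL with a path and query string after the host name.
URL = "http://gibberish.com/more_gibberish?page="

# Group 1 captures the site as "name.ext"; group 2 captures everything after the slash.
if match := re.search(r"(\w+\.\w+)/(.*)", URL):
    website, document = match.groups()
    name, ext = website.split(".")
    print(name, ext, document)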

Output:
gibberish com more_gibberish?page=
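
The pattern above expects a path separator after the host name, so a bare entry such as https://abc.com would not match it. As an alternative to a regex (a swapped-in technique, not what the post above uses), here is a minimal sketch built on the standard library's urllib.parse.urlsplit. The helper name site_name is hypothetical, and what you do with the result (folder name, file name) is left as an assumption about your script.

```python
from urllib.parse import urlsplit


def site_name(url: str) -> str:
    """Return the host name without its last label, e.g. "abc" for https://abc.com.

    Hypothetical helper for illustration; it works for http and https alike
    because it never relies on fixed slice positions.
    """
    # strip() removes the newline left over when the line is read from a file.
    host = urlsplit(url.strip()).hostname or ""
    # Drop the final label (".com"); a host without a dot is returned unchanged.
    return host.rsplit(".", 1)[0] if "." in host else host


# Sample lines in the style of the sites.txt quoted above (assumed content).
for line in ["https://abc.com\n", "http://def.com\n", "https://ghi.com/page?x=1\n"]:
    print(site_name(line))
```

Note that a host such as www.example.com comes back as "www.example", so strip a leading "www." first if that matters for your folder names.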

