Jun-07-2024, 05:51 PM
(This post was last modified: Jun-07-2024, 05:51 PM by deanhystad.)
(Jun-06-2024, 06:59 PM)zunebuggy Wrote: My sites.txt is just a list of urls like:
https://abc.com
https://def.com
https://ghi.com
Thanks.

If that is what your URLs look like, https: is not the only problem. There are also problems with this:
myfold = myurl[8:10]
myfn = myurl[8:12]

"https://abc.com"[8:10] == "ab", not "abc". 10 - 8 = 2 characters, not 3. You would need [8:11], or [7:10] after you fix the https issue. Better yet, you should use pattern matching to extract the file and folder names.
import re

URL = r"http://gibberish.com\more_gibberish?page="
if match := re.search(r"(\w+\.\w+)\\(.*)", URL):
    website, document = match.groups()
    name, ext = website.split(".")
    print(name, ext, document)
Output:
gibberish com more_gibberish?page=
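
For URLs shaped like the ones in the quoted sites.txt, something along these lines might work. This is only a sketch: the pattern and the split_url helper are my own illustration, not anything from the original code, and it assumes every line is http:// or https:// followed by a simple name.ext host with nothing unusual in it. The loop also prints the two slices so you can see the off-by-one next to the pattern result.

```python
import re

# Sample URLs copied from the quoted sites.txt.
urls = ["https://abc.com", "https://def.com", "https://ghi.com"]

# Hypothetical pattern: "http" with an optional "s", then "://",
# then the site name and its extension separated by a dot.
pattern = re.compile(r"https?://(\w+)\.(\w+)")


def split_url(url):
    """Return (name, ext) for a URL like https://abc.com, or None if it does not match."""
    match = pattern.match(url)
    return match.groups() if match else None


for url in urls:
    # url[8:10] is two characters, url[8:11] is three.
    print(url[8:10], url[8:11], split_url(url))
```

Because the pattern does not depend on character positions, it should give the same name whether the line starts with http:// or https://.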