Get the filename from a path

DeaD_EyE · (This post was last modified: Jul-12-2020, 11:41 AM by DeaD_EyE.)

Pathlib does not handle URLs correctly. You can deconstruct a URL with urllib.parse.urlparse and reconstruct it with urllib.parse.urlunparse. Pathlib can then handle the ParseResult.path returned by urlparse().
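A minimal sketch of that round trip, assuming a made-up URL with a query string (the URL and the replacement filename "two.file.pdf" are only for illustration). It uses PurePosixPath so the result does not depend on the operating system:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, urlunparse

# Hypothetical URL with a query string, used only for illustration.
url = "http://www.456.com/hello/one.file.pdf?page=2"

# Feeding the whole URL to pathlib collapses the "//" after the scheme.
mangled = str(PurePosixPath(url))
print(mangled)

# Split the URL first, then hand only the path component to pathlib.
parts = urlparse(url)
path = PurePosixPath(parts.path)
print(parts.netloc, path.name, path.suffix)

# Change the filename and rebuild the full URL from the modified parts.
renamed = parts._replace(path=str(path.with_name("two.file.pdf")))
rebuilt = urlunparse(renamed)
print(rebuilt)
```

Because the query string lives in its own field of the ParseResult, it does not end up in the filename or suffix, and it survives the rebuild unchanged.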

Helper functions without pathlib and urllib:

def get_file(url):
    return url.rpartition("/")[2]

def is_pdf(file):
    return file.rpartition(".")[2] == "pdf"


paths = ('http://www.123.com/file.pdf', 'http://www.123.com/pdfhello',
         'http://www.456.com/hello/one.file.pdf', 'http://www.123.com',
         'http://www.456.com/hello/one.file.pdf')

for url in paths:
    file = get_file(url)
    if is_pdf(file):
        print(file)

The same filtering with urllib and pathlib, keeping the absolute path of each URL:

from pathlib import Path
from urllib.parse import urlparse


def convert_to_paths(urls):
    # Yield the path component of every URL that points to a .pdf file.
    for url in urls:
        path = Path(urlparse(url).path)
        if path.suffix == ".pdf":
            yield path

urls = ('http://www.123.com/file.pdf', 'http://www.123.com/pdfhello',
        'http://www.456.com/hello/one.file.pdf', 'http://www.123.com',
        'http://www.456.com/hello/one.file.pdf')

paths = list(convert_to_paths(urls))
paths_filenames = [file.name for file in paths]
paths_filenames_stem = [file.stem for file in paths]

print(paths, paths_filenames, paths_filenames_stem, sep="\n\n")

Output:

[PosixPath('/file.pdf'), PosixPath('/hello/one.file.pdf'), PosixPath('/hello/one.file.pdf')]

['file.pdf', 'one.file.pdf', 'one.file.pdf']

['file', 'one.file', 'one.file']

Depending on what you want to do with your data later, you can decide whether to use Path and urlparse.
The function urlparse() can also handle relative URLs.
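As a rough illustration of the relative case, assuming a hypothetical relative URL that is not taken from the examples above: urlparse() leaves the scheme and network location empty and still separates the path from the query, so pathlib can work on the path as before.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical relative URL, used only for illustration.
relative_url = "docs/report.final.pdf?download=1"

parts = urlparse(relative_url)
# A relative URL has neither a scheme nor a network location.
print(repr(parts.scheme), repr(parts.netloc))

# The path component is still usable with pathlib.
rel_path = PurePosixPath(parts.path)
print(rel_path.name, rel_path.stem, rel_path.suffix)
print(rel_path.is_absolute())
```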
