List of pathlib.Paths Not Ordered As Same List of Same String Filenames

QbLearningPython · (This post was last modified: Nov-11-2017, 05:54 PM by QbLearningPython.)

While testing a module, I found some odd behaviour in the pathlib package. I have a list of pathlib.Paths and I sorted() it. I assumed that sorting a list of Paths would give the same order as sorting a list of their (string) filenames. But that is not the case.

Let me explain.

I have a list of filenames such as:

filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)

If I run the following:

sorted_filenames = sorted(filenames_for_testing)
print()
for element in sorted_filenames:
    print(element)
print()

the alphabetical (string) order of this list will be:

/spam/another.txt
/spam/binary.bin
/spam/spam.txt
/spam/spams.txt
/spam/spams/spam.ttt
/spam/spams/spam01.txt
/spam/spams/spam02.txt
/spam/spams/spam03.ppp
/spam/spams/spam04.doc

But when I try to sort the same list as pathlib.Paths using:

from pathlib import Path

paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]
sorted_paths = sorted(paths_for_testing)

The list returned is (just showing filenames of the pathlib.Paths):

/spam/another.txt
/spam/binary.bin
/spam/spam.txt
/spam/spams/spam.ttt
/spam/spams/spam01.txt
/spam/spams/spam02.txt
/spam/spams/spam03.ppp
/spam/spams/spam04.doc
/spam/spams.txt

which differs from the previous list: '/spam/spams.txt' does not come after '/spam/spam.txt' and before all the '/spam/spams/*' files; instead, it goes at the end of the list.

You can check it using:

sorted_filenames == [str(path) for path in sorted_paths]

which returns False.

I am not sure whether this is a bug. Maybe it is the intended behaviour. However, I find it odd. Unless I am missing something, I can hardly understand why a list of pathlib.Paths and a list of the same string filenames are not ordered in the same way.

A script to reproduce this:

from pathlib import Path
 
# order string filenames
 
filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)
 
sorted_filenames = sorted(filenames_for_testing)
 
# output ordered list of string filenames
 
print()
print("Ordered list of string filenames:")
print()
for element in sorted_filenames:
    print(f'\t{element}')
print()
 
# order paths (build from same string filenames)
 
paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]
sorted_paths = sorted(paths_for_testing)
 
# output ordered list of pathlib.Paths
 
print()
print("Ordered list of pathlib.Paths:")
print()
for element in sorted_paths:
    print(f'\t{element}')
print()
 
# compare
 
print()
 
if sorted_filenames == [str(path) for path in sorted_paths]:
    print('Ordered lists of string filenames and pathlib.Paths are EQUAL.')
     
else:
    print('Ordered lists of string filenames and pathlib.Paths are DIFFERENT.')
 
    for element in range(0, len(sorted_filenames)):
         
        if sorted_filenames[element] != str(sorted_paths[element]):
             
            print()
            print('First different element:')
            print(f'\tElement #{element}')
            print(f'\t{sorted_filenames[element]} != {sorted_paths[element]}')
            break
 
print()

I am running Python 3.6.3 on macOS 10.12.6.

Thanks.

heiner55 · (This post was last modified: Nov-11-2017, 06:34 PM by heiner55.)

Path(filename) is an object, not a string.
If you sort the strings instead, the order is identical.

paths_for_testing = [
    str(Path(filename))      # <== take str(object)
    for filename in filenames_for_testing
]
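
If the goal is to keep Path objects but order them the way their strings sort, passing key=str to sorted() is one option. A minimal sketch, using PurePosixPath (an assumption, so the result does not depend on the operating system) and a shortened version of the list from the first post:

```python
from pathlib import PurePosixPath

filenames = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
)

paths = [PurePosixPath(name) for name in filenames]

# Sort by the string form; the elements stay path objects.
paths_by_string = sorted(paths, key=str)

# The order now matches a plain string sort of the filenames.
same_order = [str(p) for p in paths_by_string] == sorted(filenames)
print(same_order)
```

This avoids converting the whole list to strings and back, at the cost of calling str() once per element during the sort.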

Windows 7 with Python 3.6.2:

Output:
Ordered list of string filenames:

    /spam/another.txt
    /spam/binary.bin
    /spam/spam.txt
    /spam/spams.txt
    /spam/spams/spam.ttt
    /spam/spams/spam01.txt
    /spam/spams/spam02.txt
    /spam/spams/spam03.ppp
    /spam/spams/spam04.doc


Ordered list of pathlib.Paths:

    \spam\another.txt
    \spam\binary.bin
    \spam\spam.txt
    \spam\spams.txt
    \spam\spams\spam.ttt
    \spam\spams\spam01.txt
    \spam\spams\spam02.txt
    \spam\spams\spam03.ppp
    \spam\spams\spam04.doc

QbLearningPython · (This post was last modified: Nov-11-2017, 06:44 PM by QbLearningPython.)

Of course, pathlib.Path is an object, not a string.

But pathlib.Paths are (or should be) compared using their filenames (which are strings). (Roughly speaking: in reality, pathlib.Paths are compared in some other way that I do not really understand from reading Python's source code.)

So I think ordering (or comparing) pathlib.Paths and ordering (or comparing) their string filenames should render the same result, not a different one.

**Larz60+** · (This post was last modified: Nov-11-2017, 07:57 PM by Larz60+.)

You should look at: https://pymotw.com/3/pathlib/
A pathlib path is much easier to construct if the path nodes are contained
within a list:

from pathlib import Path
 
mylocation  = ['..', 'data', 'fipsCodes', 'GNIScodesForNamedPopulatedPlaces-etc', 'CountryNames', 'geonames_20171023', 'Countries.txt']
 
home = Path('.')
print('\n-- home --')
print(f'{home}')
 
print(f'{home.name}')
print(f'{home.resolve()}')
 
print('\n-- mydatapath --')
mydatapath = home.joinpath(*mylocation)
print(f'(\n{mydatapath}')
print(f'({mydatapath.name}')
print(f'{mydatapath.resolve()}')
 
print('\n-- newdatapath --')
# you can also create a path like
newdatapath = home / 'data'
print(f'\n{newdatapath}')
print(f'{newdatapath.name}')
print(f'{newdatapath.resolve()}')
 
print('\n-- filelist --')
filelist = [x.name for x in newdatapath.iterdir() if x.is_file()]
print(f'\n{filelist}')
 
print('\n-- opening files --')
fips_text_file = newdatapath / 'fips.txt'
 
with fips_text_file.open() as f:
    count = 0
    for line in f:
        line = line.strip()
        count += 1
        print(line)
        if count > 10:
            break

Results (part of resolved path removed for security, replaced with ...):

Output:
-- home --
.

... \Tiger\src

-- mydatapath --
(
..\data\fipsCodes\GNIScodesForNamedPopulatedPlaces-etc\CountryNames\geonames_20171023\Countries.txt
(Countries.txt
... \Tiger\data\fipsCodes\GNIScodesForNamedPopulatedPlaces-etc\CountryNames\geonames_20171023\Countries.txt

-- newdatapath --

data
data
... \Tiger\src\data

-- filelist --

['fips.json', 'fips.txt', 'fipsdata.db', 'fipsdataBackup.db', 'FIPSFormat.json', 'FIPSFormat.txt', 'GNIS_CountryFormat.json', 'GNIS_CountryFormat.txt', 'GNIS_DomesticFormat.json', 'GNIS_DomesticFormat.txt']

-- opening files --
{
"AmericanIndianAreas": {
"data": {
"0010": [
"0010",
"Acoma Pueblo and Off-Reservation Trust Land"
],
"0020": [
"0020",
"Agua Caliente Indian Reservation and Off-Reservation Trust Land"
],

QbLearningPython · (This post was last modified: Nov-11-2017, 08:15 PM by QbLearningPython.)

Sorry, Larz60+, but I cannot see what the relationship is between your answer and my question. Am I missing something? (I am asking this with all respect, of course. Just trying to learn.)

heiner55 · (This post was last modified: Nov-11-2017, 08:26 PM by heiner55.)

My theory is:
Internally the path is not saved as a string but as a list of path components like Larz60 mentioned.
So when you sort Path objects, internally the lists are compared and not the strings.
This is faster than converting each time the list to a string.
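
This theory can be checked without reading the pathlib source. The sketch below assumes PurePosixPath (so it behaves the same on any operating system) and uses the public .parts tuple. It does not show what pathlib does internally, only that sorting by .parts gives the same order as sorting the paths themselves for this data:

```python
from pathlib import PurePosixPath

filenames = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
)
paths = [PurePosixPath(name) for name in filenames]

# .parts is the public tuple of path components.
print(paths[0].parts)  # ('/', 'spam', 'spams.txt')

# Sorting the paths directly and sorting them by their component
# tuples give the same order for this data.
by_default = sorted(paths)
by_parts = sorted(paths, key=lambda p: p.parts)
print(by_default == by_parts)

for path in by_default:
    print(path)
```

With this data, '/spam/spams.txt' ends up last, as in the first post.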

**Larz60+** · Nov-11-2017, 09:05 PM

if the items in the tuple are truly pathlib objects, they must be resolved before sorting,
otherwise you are sorting the object addresses
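
Note: a quick check suggests that path objects define their own equality and ordering, so they are compared by value rather than by address, and resolve() should not be needed just to sort them. A minimal sketch, again assuming PurePosixPath to keep it independent of the operating system:

```python
from pathlib import PurePosixPath

# Two distinct objects built from the same string.
first = PurePosixPath('/spam/spam.txt')
second = PurePosixPath('/spam/spam.txt')

different_objects = first is not second
equal_by_value = first == second

# The sorted result does not depend on the order the objects were created in.
forward = sorted([PurePosixPath('/spam/b.txt'), PurePosixPath('/spam/a.txt')])
backward = sorted([PurePosixPath('/spam/a.txt'), PurePosixPath('/spam/b.txt')])

print(different_objects, equal_by_value, forward == backward)
```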

QbLearningPython · Nov-11-2017, 09:07 PM

I see. I suspected something like that looking at source.

However, this behaviour can create some inconsistencies (though, I concede, only in a few cases). It could be faster, but... is it really worth it?

If you need to convert Paths to strings "manually" to get a "properly" ordered list of pathlib.Paths (instead of the package doing it "automatically"), I do not see any real gain. I just see a point of inconsistency, odd behaviour, and a potential source of programming errors.

Because I do not fully understand the algorithm behind pathlib comparisons, I cannot say, as you and Larz60+ pointed out, whether it is really necessary to convert to a string in order to compare two Paths internally. Is it not possible to get an alphabetical order (as with strings) using the internal lists in the pathlib implementation itself?
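
Whether pathlib could sort alphabetically from its component lists is a design question, but the cause of the mismatch can be sketched in plain Python (no pathlib internals are assumed here). In a whole-string sort the separator is compared like any other character, while in a component-wise sort it never appears:

```python
# Code points of the characters that decide the order.
print(ord('.'), ord('/'), ord('\\'))  # 46 47 92

# As whole strings, the separator is compared like any other character,
# and '.' sorts just before '/'.
as_strings = sorted(['/spam/spams/spam.ttt', '/spam/spams.txt'])

# As component tuples, the separator is gone: 'spams' is a prefix of
# 'spams.txt', and a prefix always sorts first.
as_components = sorted([('spam', 'spams.txt'), ('spam', 'spams', 'spam.ttt')])

print(as_strings[0])     # /spam/spams.txt
print(as_components[0])  # ('spam', 'spams', 'spam.ttt')
```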

Of course, this is a minor annoyance. I have been working with pathlib since its inception and it is the first time I have encountered this.

Thanks.

**Larz60+** · (This post was last modified: Nov-12-2017, 12:37 AM by Larz60+.)

All of the support that comes with pathlib is very well worth it.
I have been using it in applications and it saves a great deal of time.

Consider the following snippet of code:

for key, entry in ffmt.items():
    filelist = []
    # Build the path from the list of components stored in the descriptor
    filepath = self.fips.homepath.joinpath(*entry['location'])
    print(f'\n{filepath.resolve()}')
    if entry['filename'] == '..multi..':
        # The location is a directory: process every file in it
        filelist = [x for x in filepath.iterdir() if x.is_file()]
    else:
        filelist.append(filepath)
    for file in filelist:
        with file.open(encoding=encode) as f:
            for rec in f:
                fields = self.prepare_rec(rec.strip(), entry, gethead)

ffmt is a dictionary containing information on about twenty files, all located in separate
directories. Each of this dictionary's items is a nested dictionary containing information pertaining
to one file. The nested dictionary contains entries for file location, delimiter, and field information,
part of which is shown here:

'location': ['..', 'data', 'fipsCodes', 'GNIScodesForNamedPopulatedPlaces-etc',
             'CountryNames', 'geonames_20171023', 'Countries.txt'],
'delim': '    ',

from which filepath can be constructed.
This entire structure allows for a very simple interface that is easy to understand, and does oh so much!
A descriptor dictionary such as this can be easily stored in a json file, for use by all programs in the
application.
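
As a rough illustration of that idea, the sketch below loads a small descriptor from JSON and builds a path from it. The 'countries' key and the shortened location list are hypothetical; only the 'location' and 'delim' entry names come from the snippet above. PurePosixPath is used (an assumption) so the printed path looks the same on any operating system:

```python
import json
from pathlib import PurePosixPath

# Hypothetical descriptor; only 'location' and 'delim' come from the post.
descriptor_json = """
{
    "countries": {
        "location": ["..", "data", "fipsCodes", "Countries.txt"],
        "delim": "    "
    }
}
"""

ffmt = json.loads(descriptor_json)

for key, entry in ffmt.items():
    # joinpath accepts the list items as separate arguments.
    filepath = PurePosixPath('.').joinpath(*entry['location'])
    print(key, filepath, filepath.name)
```

In a real program, Path would replace PurePosixPath so that the resulting object can be opened or resolved.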

In my book, pathlib is well worth the effort required to become comfortable with it. Take a look at the docs.

heiner55 · Nov-12-2017, 01:42 PM

@Larz60+
What is the name of your book, and where can I find it?
