Python Forum
A more intelligent .sort()?
#5
I haven't looked into the code of natsort. They could have implemented a key function like this:

import re

# In practice, use this package instead:
# https://pypi.org/project/natsort/

def natkey(text):
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)
 

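re.split separates the digit runs from the rest of the text.
The parentheses around \d+ form a capturing group, so the matched digits are kept in the result list. Without the group, the digits would only act as separators and would be missing from the result.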
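A small check of that behaviour, using one of the filenames from this thread. It is only a sketch to show the difference between the two patterns:

```python
import re

name = "CE3_1_page_29.pdf"

# With a capturing group, the matched digits are kept in the result.
with_group = re.split(r"(\d+)", name)
print(with_group)
# ['CE', '3', '_', '1', '_page_', '29', '.pdf']

# Without the group, the digits only act as separators and are lost.
without_group = re.split(r"\d+", name)
print(without_group)
# ['CE', '_', '_page_', '.pdf']
```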
Each character has a code point, which you get with ord().

A = ord("A")
a = ord("a")
print(A, hex(A), sep=", ")
print(a, hex(a), sep=", ")
Plain string sorting uses lexicographical order: the strings are compared character by character, by code point. Usually, a string consists of more than one character.
Comparing mixed types (for example int and str) at the same position of two tuples is not possible; it raises a TypeError. The resulting tuple must contain only int elements (or another single data type that is comparable). To convert a string into a tuple of code points:

greeting = "Greetings and salvation."
result = tuple(map(ord, greeting))
print(result)
Then you get a tuple with numbers back:
Output:
(71, 114, 101, 101, 116, 105, 110, 103, 115, 32, 97, 110, 100, 32, 115, 97, 108, 118, 97, 116, 105, 111, 110, 46)
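The point about mixed types can be tried out directly. The tuples below are made up for illustration; the first pair mixes int and str at the same position, the second pair uses code points only:

```python
# Comparing an int with a str at the same tuple position fails.
mixed_a = ("page", 29)
mixed_b = ("page", "x")

try:
    sorted([mixed_a, mixed_b])
    comparable = True
except TypeError as error:
    comparable = False
    print("TypeError:", error)

# With code points instead of characters, every element is an int,
# so the tuples can be compared and sorted.
ints_a = (*map(ord, "page"), 29)
ints_b = (*map(ord, "page"), ord("x"))
print(sorted([ints_b, ints_a]))
```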
The remaining problem is that the numbers inside the str have to be converted to the data type int.
A string has many methods for such checks. For example, str.isdecimal tells you whether the str consists only of decimal digits. There are many more methods like it.
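A few of these checks side by side, as a quick sketch. The sample strings are arbitrary:

```python
# str.isdecimal() is the strictest of the three digit checks:
# it accepts only characters that can form a base-10 number.
print("29".isdecimal())    # True
print("page".isdecimal())  # False
print("".isdecimal())      # False, the empty string never passes

# Superscript two counts as a digit, but not as a decimal character.
print("²".isdigit(), "²".isdecimal())  # True False

# A vulgar fraction is numeric, but neither a digit nor a decimal.
print("½".isnumeric(), "½".isdigit())  # True False
```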

In the example function natkey, a decimal substring is converted to an int with the built-in function int(), and this int is appended to the list.

If the substring is not decimal, the else block is executed. The method list.extend() takes an iterable and extends the list with the elements from that iterable.
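The difference between list.append() and list.extend() in a short sketch, with values chosen to resemble what natkey handles:

```python
result = []

# append adds its argument as one single element.
result.append(int("29"))
print(result)  # [29]

# extend iterates over its argument and adds each item separately.
result.extend(map(ord, ".pdf"))
print(result)  # [29, 46, 112, 100, 102]

# append with a list would nest it instead of flattening it.
nested = [29]
nested.append(list(map(ord, ".pdf")))
print(nested)  # [29, [46, 112, 100, 102]]
```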

Applying the function to your example:
In [37]: for t in ('CE3_1_page_29.pdf', 'CE3_1_page_28.pdf'):
    ...:     print(natkey(t))
    ...:
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)
The first result contains the number 29 and the second the number 28.
The rest is identical. Now sort these tuples:

In [38]: tuple1 = natkey('CE3_1_page_29.pdf')
    ...: tuple2 = natkey('CE3_1_page_28.pdf')
    ...: sorted([tuple1, tuple2])
Out[38]:
[(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102),
 (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)]
The tuple with 28 is smaller and comes first. You can also reverse the order:

In [39]: tuple1 = natkey('CE3_1_page_29.pdf')
    ...: tuple2 = natkey('CE3_1_page_28.pdf')
    ...: sorted([tuple1, tuple2], reverse=True)
Out[39]:
[(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102),
 (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)]
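Why this works: tuples are compared element by element, and the first position that differs decides the order. A shortened sketch with made-up three-element keys:

```python
key_28 = (95, 28, 46)
key_29 = (95, 29, 46)

# Index 0 is equal, so index 1 decides: 28 < 29.
print(key_28 < key_29)  # True

# Find the first position where the two keys differ.
first_diff = next(
    i for i, (x, y) in enumerate(zip(key_28, key_29)) if x != y
)
print(first_diff)  # 1

# A tuple that is a prefix of a longer one sorts first.
print((95, 28) < (95, 28, 46))  # True
```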
Finally you apply the function to your filenames:

result = sorted(['CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_1_page_28.pdf', 'CE3_11_page_14.pdf'], key=natkey)
print(result)
So instead of letting sorted compare the strings directly, you pass your own key function, which returns these tuples (they could also be lists).

Output:
['CE3_1_page_28.pdf', 'CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_11_page_14.pdf']
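For comparison, here is the same key function next to a plain sort. The short filenames are made up for this sketch; natkey is copied from the start of this post:

```python
import re


def natkey(text):
    # Split into digit runs and other text; ints for the numbers,
    # code points for everything else.
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


names = ["page_10.pdf", "page_9.pdf", "page_2.pdf"]

# Plain sort compares character by character, so "1" < "2" < "9".
plain = sorted(names)
print(plain)    # ['page_10.pdf', 'page_2.pdf', 'page_9.pdf']

# With the key function the numbers are compared as ints.
natural = sorted(names, key=natkey)
print(natural)  # ['page_2.pdf', 'page_9.pdf', 'page_10.pdf']
```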
Later in your code you do something like this:

import os


for root, dirs, files in os.walk("."):
    for file in sorted(files, key=natkey):
        # files are sorted in memory
        if file.endswith(".pdf"):
            print(os.path.join(root, file))
My output:
Output:
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCardForMac.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCardForMac.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\back.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\filesave.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\forward.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\hand.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\help.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\home.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\matplotlib.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\move.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\qt4_editor_options.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\subplots.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\zoom_to_rect.pdf
The leading . in each path comes from the start directory "." passed to os.walk, so the paths are relative. You can also pass an absolute path.
Within each directory, the files are sorted by natkey.
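If the subdirectories should be visited in natural order as well, one option is to sort dirs in place; in the default top-down mode os.walk then follows that order. A sketch with a made-up folder layout inside a temporary directory (the vol_* and page_1.pdf names are only for illustration):

```python
import os
import re
import tempfile


def natkey(text):
    # Same key function as at the start of this post.
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


found = []
with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: three folders with one empty PDF each.
    for folder in ("vol_10", "vol_2", "vol_1"):
        os.makedirs(os.path.join(base, folder))
        open(os.path.join(base, folder, "page_1.pdf"), "w").close()

    for root, dirs, files in os.walk(base):
        # In-place sort: top-down os.walk visits dirs in this order.
        dirs.sort(key=natkey)
        for file in sorted(files, key=natkey):
            if file.endswith(".pdf"):
                path = os.path.join(root, file)
                found.append(os.path.relpath(path, base))

print(found)
```

The sort has to be done in place (dirs.sort or dirs[:] = ...), because os.walk keeps using the same list object.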


Messages In This Thread
A more intelligent .sort()? - by Pedroski55 - Jul-07-2020, 09:38 AM
RE: A more intelligent .sort()? - by DeaD_EyE - Jul-07-2020, 11:11 AM
RE: A more intelligent .sort()? - by snippsat - Jul-07-2020, 05:47 PM
RE: A more intelligent .sort()? - by Pedroski55 - Jul-07-2020, 10:49 PM
RE: A more intelligent .sort()? - by DeaD_EyE - Jul-08-2020, 07:52 AM
