I haven't looked into the code of natsort. They could have implemented a key function like this:
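import re
# In practice, use this package instead:
# https://pypi.org/project/natsort/
def natkey(text):
    result = []
    # Split into runs of digits and the text between them.
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            # A run of digits is compared as a number.
            result.append(int(element))
        else:
            # Other text is compared character by character (code points).
            result.extend(map(ord, element))
    return tuple(result)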
re.split separates the runs of digits from the rest of the text.
The parentheses around \d+ form a capturing group, so the digit runs are kept in the result list. Without them, re.split treats the numbers as separators and drops them.
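To see the effect of the capturing group, here is a small sketch that splits one of the filenames from the question with and without the parentheses:

```python
import re

name = "CE3_1_page_29.pdf"

# With a capturing group, the matched digit runs stay in the result.
with_group = re.split(r"(\d+)", name)
# Without a group, the digit runs act only as separators and are dropped.
without_group = re.split(r"\d+", name)

print(with_group)     # ['CE', '3', '_', '1', '_page_', '29', '.pdf']
print(without_group)  # ['CE', '_', '_page_', '.pdf']
```

Only the first list still contains the numbers, which is what natkey needs.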
Each character has a code point, which you get with ord():
A = ord("A")
a = ord("a")
print(A, hex(A), sep=", ")  # 65, 0x41
print(a, hex(a), sep=", ")  # 97, 0x61
Plain strings are sorted in lexicographical order: they are compared character by character, by code point, until the first difference.
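This is exactly why plain sorting puts numbers in the "wrong" order. A minimal sketch with made-up names:

```python
names = ["page_3", "page_29", "page_10"]

# Plain string sorting compares character by character:
# "page_10" < "page_29" because "1" < "2",
# and "page_29" < "page_3" because "2" < "3".
plain = sorted(names)
print(plain)  # ['page_10', 'page_29', 'page_3']
```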
Comparing mixed types is not possible: a str and an int at the same position in two tuples cannot be compared. The resulting tuple must therefore contain only int elements (or another data type that is comparable with itself). To convert a string into a tuple of code points:
greeting = "Greetings and salvation."
result = tuple(map(ord, greeting))
print(result)
You get back a tuple of numbers:
Output:
(71, 114, 101, 101, 116, 105, 110, 103, 115, 32, 97, 110, 100, 32, 115, 97, 108, 118, 97, 116, 105, 111, 110, 46)
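Tuples are also compared element by element, so a tuple of code points sorts exactly like the string it came from. The following sketch (sample values are made up for illustration) checks that, and also shows the TypeError you get when a str meets an int at the same position:

```python
a, b = "page_29", "page_3"

# A tuple of code points compares exactly like the string itself.
codes_a = tuple(map(ord, a))
codes_b = tuple(map(ord, b))
print(a < b, codes_a < codes_b)  # True True

# Mixing str and int at the same position cannot be compared.
error = None
try:
    sorted([("page_", 29), (3, "page_")])
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```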
The remaining problem is that the numbers inside the str should be converted to the data type int.
A string has many methods for such checks. For example, str.isdecimal() checks whether the str consists only of decimal digits. There are many more of these methods.
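A few sample strings (chosen for illustration) show how isdecimal() behaves compared with isdigit(). The empty string matters here, because re.split can produce it at the ends: re.split(r"(\d+)", "10a") returns ['', '10', 'a']. Since "".isdecimal() is False, such an empty string goes to the else branch and adds nothing. isdigit() also accepts characters like "²", which int() cannot parse, so isdecimal() is the safer check before calling int():

```python
# Hypothetical samples; "\u00b2" is SUPERSCRIPT TWO ("²"),
# a digit character but not a decimal digit.
samples = ["29", "2.9", "", "_page_", "\u00b2"]

checks = {s: (s.isdecimal(), s.isdigit()) for s in samples}
for s, (decimal, digit) in checks.items():
    print(repr(s), "isdecimal:", decimal, "isdigit:", digit)
```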
In the example function natkey, such a decimal str is converted to an int with the built-in function int(), and the int is appended to the list.
If the str is not decimal, the else block is executed. The method list.extend() takes an iterable and extends the list with the elements of that iterable; here the iterable is map(ord, element), so every character is added as its code point.
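The difference between append() and extend() in natkey can be sketched in isolation, using the tail of one of the filenames:

```python
result = []

# append() adds its argument as one single element.
result.append(int("29"))
# extend() adds each element of the iterable separately;
# here every character of ".pdf" becomes its code point.
result.extend(map(ord, ".pdf"))
print(result)  # [29, 46, 112, 100, 102]

# For comparison: append() with a list nests it as one element.
nested = []
nested.append(list(map(ord, ".pdf")))
print(nested)  # [[46, 112, 100, 102]]
```

With append() instead of extend(), the key would contain nested lists mixed with int values, which brings back the mixed-type comparison problem.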
Applying the function to your example:
In [37]: for t in ('CE3_1_page_29.pdf', 'CE3_1_page_28.pdf'):
...:     print(natkey(t))
...:
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)
The first result contains the number 29, the second the number 28. The rest is identical. Sorting these tuples:
In [38]: tuple1 = natkey('CE3_1_page_29.pdf')
...: tuple2 = natkey('CE3_1_page_28.pdf')
...: sorted([tuple1, tuple2])
Out[38]:
[(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102),
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)]
28 is smaller, so its tuple comes first. You can reverse the order with reverse=True:
In [39]: tuple1 = natkey('CE3_1_page_29.pdf')
...: tuple2 = natkey('CE3_1_page_28.pdf')
...: sorted([tuple1, tuple2], reverse=True)
Out[39]:
[(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102),
(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)]
Finally, pass the function as key to sorted for your filenames:
result = sorted(['CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_1_page_28.pdf', 'CE3_11_page_14.pdf'], key=natkey)
print(result)
Output:
['CE3_1_page_28.pdf', 'CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_11_page_14.pdf']
So instead of letting sorted compare the strings themselves, you supply your own key function, which returns these tuples (lists would work too).
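The key only makes a visible difference when numbers at the same position have different lengths. A sketch with hypothetical filenames compares plain sorting with key=natkey (natkey is repeated so the block runs on its own):

```python
import re


def natkey(text):
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


files = ["CE3_1_page_3.pdf", "CE3_1_page_100.pdf", "CE3_1_page_29.pdf"]

plain = sorted(files)                # character by character
natural = sorted(files, key=natkey)  # digit runs compared as int

print(plain)    # ['CE3_1_page_100.pdf', 'CE3_1_page_29.pdf', 'CE3_1_page_3.pdf']
print(natural)  # ['CE3_1_page_3.pdf', 'CE3_1_page_29.pdf', 'CE3_1_page_100.pdf']
```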
Later in your code, you could do something like this:
import os
for root, dirs, files in os.walk("."):
    for file in sorted(files, key=natkey):
        # files are sorted in memory
        if file.endswith(".pdf"):
            print(os.path.join(root, file))
My output:
Output:
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCardForMac.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCardForMac.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\back.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\filesave.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\forward.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\hand.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\help.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\home.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\matplotlib.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\move.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\qt4_editor_options.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\subplots.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\zoom_to_rect.pdf
The leading . comes from the argument passed to os.walk: "." is the relative path of the current directory, and every root starts with it. You can also pass an absolute path.
Within each directory, the files are sorted by natkey.
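Only the files are sorted this way; the subdirectories are still visited in the order os.walk gets them from the operating system. With the default top-down walk, you can also sort dirs in place, because os.walk continues with that same list. A sketch under these assumptions, using a temporary directory with made-up names and a hypothetical helper walk_natural:

```python
import os
import re
import tempfile


def natkey(text):
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


def walk_natural(top):
    """Hypothetical helper: yield file paths with dirs and files in natural order."""
    for root, dirs, files in os.walk(top):
        # Slice assignment modifies the list os.walk uses to descend
        # (only effective for the default topdown=True).
        dirs[:] = sorted(dirs, key=natkey)
        for file in sorted(files, key=natkey):
            yield os.path.join(root, file)


with tempfile.TemporaryDirectory() as top:
    # Made-up layout: two chapter folders with two PDFs each.
    for chapter in ("chapter_10", "chapter_2"):
        os.mkdir(os.path.join(top, chapter))
        for page in ("page_10.pdf", "page_9.pdf"):
            open(os.path.join(top, chapter, page), "w").close()

    found = [os.path.relpath(p, top) for p in walk_natural(top)]
    for path in found:
        print(path)
```

Plain sorting of dirs would visit chapter_10 before chapter_2; with natkey, chapter_2 comes first.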