Jul-08-2020, 07:52 AM
I haven't looked into the code of natsort. They could have implemented a key function like this:
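    import re

    # In practice, consider using this package instead:
    # https://pypi.org/project/natsort/


    def natkey(text):
        result = []
        # Split the text into alternating non-digit and digit parts.
        for element in re.split(r"(\d+)", text):
            if element.isdecimal():
                # Digit runs are compared by their numeric value.
                result.append(int(element))
            else:
                # Everything else is compared by code point.
                result.extend(map(ord, element))
        return tuple(result)
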
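A quick way to see what the `re.split` call inside `natkey` does is to run it once with and once without the capturing group. This is only an illustrative sketch; the variable names `with_group` and `without_group` are made up for the demonstration.

```python
import re

text = "CE3_1_page_29.pdf"

# With a capturing group, the matched digit runs stay in the result.
with_group = re.split(r"(\d+)", text)
print(with_group)     # ['CE', '3', '_', '1', '_page_', '29', '.pdf']

# Without the group, the digit runs only act as separators and are discarded.
without_group = re.split(r"\d+", text)
print(without_group)  # ['CE', '_', '_page_', '.pdf']
```
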
re.split splits all numbers from the rest. The parentheses around \d+ capture the numbers, so they are kept in the result. Otherwise, the numbers would only act as separators and be dropped from the result.

Each character has a code point, which you get with ord().

    A = ord("A")
    a = ord("a")
    print(A, hex(A), sep=", ")
    print(a, hex(a), sep=", ")

Plain text is sorted in lexicographical order, code point by code point. Usually, a string consists of more than one element.
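To make the effect of lexicographical order concrete, here is a small sketch with made-up names (`pages` is not from your code). It shows why plain sorting puts a `10` in front of a `2`:

```python
pages = ["page_10", "page_9", "page_2"]

# Plain sorting compares the strings character by character, by code point.
plain = sorted(pages)
print(plain)  # ['page_10', 'page_2', 'page_9']

# "1" (code point 49) is smaller than "2" (50) and "9" (57),
# so "page_10" ends up before "page_2" and "page_9".
print(ord("1"), ord("2"), ord("9"))  # 49 50 57

# Uppercase letters come before lowercase ones for the same reason.
letters = sorted(["b", "A", "a", "B"])
print(letters)  # ['A', 'B', 'a', 'b']
```
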
Comparing tuples that hold different types (for example str and int) at the same position is not possible. The resulting tuple must have only int as elements (or another data type that is comparable). To convert a string into a tuple of code points:

    greeting = "Greetings and salvation."
    result = tuple(map(ord, greeting))
    print(result)

Then you get a tuple of numbers back:

Output:
(71, 114, 101, 101, 116, 105, 110, 103, 115, 32, 97, 110, 100, 32, 115, 97, 108, 118, 97, 116, 105, 111, 110, 46)
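As a sketch of why the key must not mix str and int: the two keys below are hypothetical, built by hand as if the text parts had been left as strings. Sorting them fails in Python 3 as soon as a str meets an int at the same position.

```python
# Hypothetical keys that keep text as str and numbers as int.
mixed_a = ("page_", 3)   # as if built from "page_3"
mixed_b = (3, "_page")   # as if built from "3_page"

try:
    print(sorted([mixed_a, mixed_b]))
    comparable = True
except TypeError as error:
    # "page_" and 3 cannot be ordered against each other.
    comparable = False
    print("Cannot sort mixed keys:", error)

# With only int elements, the comparison is well defined.
ints_ok = sorted([(95, 29), (95, 28)])
print(ints_ok)  # [(95, 28), (95, 29)]
```
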
Now the problem is that you want to convert the numbers in the str to the data type int.

A string has many methods for such checks. For example, you can check whether the str consists only of decimal digits with str.isdecimal. There are many more methods.

In the example function natkey, I convert the str to an int with the built-in function int(). This is appended to the list.

If the str is not decimal, the else block is executed. The method list.extend() takes an iterable and extends the list with the elements from the iterable.

Applying the function to your example:
    In [37]: for t in ('CE3_1_page_29.pdf', 'CE3_1_page_28.pdf'):
        ...:     print(natkey(t))
        ...:
    (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)
    (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)

The first result contains the number 29 and the second the number 28. The rest is identical. Sorting these tuples now:
    In [38]: tuple1 = natkey('CE3_1_page_29.pdf')
        ...: tuple2 = natkey('CE3_1_page_28.pdf')
        ...: sorted([tuple1, tuple2])
    Out[38]:
    [(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102),
     (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102)]

The 28 is smaller and comes first. You can reverse the order.
    In [39]: tuple1 = natkey('CE3_1_page_29.pdf')
        ...: tuple2 = natkey('CE3_1_page_28.pdf')
        ...: sorted([tuple1, tuple2], reverse=True)
    Out[39]:
    [(67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 29, 46, 112, 100, 102),
     (67, 69, 3, 95, 1, 95, 112, 97, 103, 101, 95, 28, 46, 112, 100, 102)]

Finally, you apply the function to your filenames:
    result = sorted(['CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_1_page_28.pdf', 'CE3_11_page_14.pdf'], key=natkey)
    print(result)

So instead of letting sorted do the work of creating the key for comparison, you use your own key function, which returns these tuples (they can also be lists).

Output:
['CE3_1_page_28.pdf', 'CE3_1_page_29.pdf', 'CE3_1_page_41.pdf', 'CE3_11_page_14.pdf']
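For comparison, this sketch sorts the same four filenames once without and once with the key function. It repeats `natkey` so that it runs on its own; the names `plain`, `natural`, and `last` are only chosen for the demonstration. The `key` argument is also accepted by `list.sort()`, `min()`, and `max()`.

```python
import re


def natkey(text):
    # Same key function as in the post above.
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


names = ['CE3_1_page_29.pdf', 'CE3_1_page_41.pdf',
         'CE3_1_page_28.pdf', 'CE3_11_page_14.pdf']

# Without a key, "CE3_11..." comes first because "1" sorts before "_".
plain = sorted(names)
print(plain)

# With the key, 1 < 11 decides, so "CE3_11..." comes last.
natural = sorted(names, key=natkey)
print(natural)

# The same key works with max() to pick the "largest" name.
last = max(names, key=natkey)
print(last)  # CE3_11_page_14.pdf
```
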
Later in your code you do something like this:

    import os

    for root, dirs, files in os.walk("."):
        for file in sorted(files, key=natkey):  # files are sorted in memory
            if file.endswith(".pdf"):
                print(os.path.join(root, file))

My output:

Output:
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.5792.43\help\ReferenceCardForMac.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCard.pdf
.\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\202.6109.24\help\ReferenceCardForMac.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\back.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\filesave.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\forward.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\hand.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\help.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\home.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\matplotlib.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\move.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\qt4_editor_options.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\subplots.pdf
.\AppData\Local\Programs\Python\Python38\Lib\site-packages\matplotlib\mpl-data\images\zoom_to_rect.pdf
The leading . comes from os.walk("."). It's a relative path. You can also use absolute paths. The files within each directory are sorted by natkey.
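If the order of the directories matters too, one option is to sort `dirs` in place, because `os.walk` (with its default `topdown=True`) visits the subdirectories in the order of that list. The following is only a sketch under assumptions: `walk_pdfs` is a hypothetical helper, and the folder and file names are made up and created in a temporary directory so the example does not touch real data. It also uses an absolute start path.

```python
import os
import re
import tempfile


def natkey(text):
    # Same key function as in the post above.
    result = []
    for element in re.split(r"(\d+)", text):
        if element.isdecimal():
            result.append(int(element))
        else:
            result.extend(map(ord, element))
    return tuple(result)


def walk_pdfs(top):
    """Yield absolute PDF paths below top in natural order (hypothetical helper)."""
    for root, dirs, files in os.walk(os.path.abspath(top)):
        # Sorting dirs in place controls the order in which os.walk descends.
        dirs.sort(key=natkey)
        for file in sorted(files, key=natkey):
            if file.endswith(".pdf"):
                yield os.path.join(root, file)


# Demonstration in a throwaway directory with made-up names.
with tempfile.TemporaryDirectory() as top:
    for folder in ("part_10", "part_2"):
        os.makedirs(os.path.join(top, folder))
        for name in ("page_10.pdf", "page_9.pdf", "notes.txt"):
            open(os.path.join(top, folder, name), "w").close()
    # Show the paths relative to the temporary directory for readability.
    found = [os.path.relpath(path, top) for path in walk_pdfs(top)]

print(found)
```
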