Python Forum

Full Version: sorting with numbers in text
back when i coded in C, i implemented a sort function that worked with key strings (C style) that could have a common pattern of numbers within each string, where the number of digits in each number could be different, and it would compare the keys with numbers compared by numeric value. i would like to do the same thing in Python using its sort key handler system. before i start coding this, does anyone know of an existing function to do this? if not, i'll end up doing my own.
Brute-force method:

import re


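def number_tuple(text: str) -> tuple[int, ...]:
    """
    Return every run of digits in text as a tuple of ints.
    """
    # Split on runs of non-digits; drop the empty strings left at the edges.
    return tuple(int(val) for val in re.split(r"\D+", text) if val)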


items = ["AA33_43", "A", "A23", "B55", "1_1_4"]
print(sorted(items, key=number_tuple))
Output:
['A', '1_1_4', 'A23', 'AA33_43', 'B55']
A returns an empty tuple, so it's the first item.
(1, 1, 4) < (23,)
(23,) < (33, 43)
(33, 43) < (55,)
(Jul-17-2021, 11:48 PM)Skaperen Wrote: before i start coding this, does anyone know of an existing function to do this?
Yes, the problem is often called Human or Natural Sorting.
Writing your own function, as DeaD_EyE did, is a common way; there are also modules like natsort.
natsort works for many common cases.
>>> from natsort import natsorted
>>> 
>>> versions = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
>>> sorted(versions)
['version-1.10', 'version-1.11', 'version-1.9', 'version-2.0']
>>> natsorted(versions)
['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
Human Sorting is where non-numeric characters are also ordered based on their meaning, not on their ordinal value.
It is locale-aware, so it uses the current locale's rules, including its decimal separator; the most common locale is en_US.UTF-8.
>>> from natsort import humansorted
>>> 
>>> fruits = ['apple', 'apple1,4', 'Banana9,6', 'apple1,0', 'banana7,7']
>>> sorted(fruits)
['Banana9,6', 'apple', 'apple1,0', 'apple1,4', 'banana7,7']
>>> natsorted(fruits)
['Banana9,6', 'apple', 'apple1,0', 'apple1,4', 'banana7,7']
>>> 
>>> humansorted(fruits)
['apple', 'apple1,0', 'apple1,4', 'banana7,7', 'Banana9,6']
nice!

what i did in C was just deal with the numbers. they were compared in such a way that the shorter number was prefixed by enough '0' characters to make it the same length as the longer one. it was done as part of the actual character-by-character comparison. that would be the wrong way to do it in Python and i figured the right way was probably already done (so i didn't want to re-invent another wheel). i need to spend more time with module re so i can quickly envision how it can solve many problems like this.
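For comparison, the zero-padding idea from the C version can be approximated in Python by doing the padding inside a sort key instead of inside the comparison. This is only a sketch: the name padded_key and the fixed WIDTH of 12 digits are assumptions made for illustration, and any number with more digits than WIDTH would not sort correctly.

```python
import re

WIDTH = 12  # assumed maximum digit count; hypothetical, not from the C code


def padded_key(text: str) -> str:
    """Left-pad every run of digits with '0' so plain string comparison
    orders the numbers by value."""
    return re.sub(r"\d+", lambda m: m.group().zfill(WIDTH), text)


names = ["file10", "file9", "file100"]
print(sorted(names, key=padded_key))
```

The key is an ordinary string, so the text between the numbers is still compared character by character.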
next:

tobesorted = ['one trillion seven hundred million two','three hundred billion fifty five thousand twelve']
i don't want to sort the numbers, exclusively. i want to sort what is between them ... ASCIIbetically for now ... too.
Output:
alpha2 alpha1234bar alpha1234foo beta gamma xyzzy delta virus
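To sort the text between the numbers as well, one option that needs only the standard library is to split on the digit runs while keeping them, so the key holds both the text and the numbers. A minimal sketch, assuming plain unsigned integers and ordinary string comparison for the text parts; natural_key is a made-up name, not something provided by natsort.

```python
import re


def natural_key(text: str) -> list:
    """Split text into alternating text and number parts for use as a sort key."""
    # The capturing group makes re.split keep the digit runs. The result
    # always alternates text, digits, text, ... so odd indexes hold numbers.
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


words = ["alpha1234foo", "beta", "alpha2", "alpha1234bar"]
print(sorted(words, key=natural_key))
# ['alpha2', 'alpha1234bar', 'alpha1234foo', 'beta']
```

Because even positions are always strings and odd positions are always ints, two keys never end up comparing a str with an int.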