Python Forum
sorting with numbers in text
#1
back when i coded in C, i implemented a sort function that worked with key strings (C style) that could share a common pattern of numbers within each string, where the number of digits in each number could differ, and it compared the keys with the embedded numbers ordered by numeric value. i would like to do the same thing in Python using its sort key mechanism. before i start coding this, does anyone know of an existing function to do this? if not, i'll end up writing my own.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Bruteforce-Method:

import re


def number_tuple(text: str) -> tuple[int, ...]:
    """
    Return the numbers found in text as a tuple of ints.
    Everything that is not a digit is ignored.
    """
    return tuple(int(val) for val in re.split(r"\D+", text) if val)


items = ["AA33_43", "A", "A23", "B55", "1_1_4"]
print(sorted(items, key=number_tuple))
Output:
['A', '1_1_4', 'A23', 'AA33_43', 'B55']
"A" contains no digits, so its key is the empty tuple and it sorts first.
(1, 1, 4) < (23,)
(23,) < (33, 43)
(33, 43) < (55,)
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
(Jul-17-2021, 11:48 PM)Skaperen Wrote: before i start coding this, does anyone know of an existing function to do this?
Yes, the problem is often called Human or Natural Sorting.
Writing your own function, as DeaD_EyE did, is a common way; there are also modules like natsort.
natsort works for many common cases.
>>> from natsort import natsorted
>>> 
>>> versions = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
>>> sorted(versions)
['version-1.10', 'version-1.11', 'version-1.9', 'version-2.0']
>>> natsorted(versions)
['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
Human sorting also orders non-numeric characters by their meaning rather than by their ordinal value.
It relies on the current locale's settings (including the decimal separator), most commonly en_US.UTF-8.
>>> from natsort import humansorted
>>> 
>>> fruits = ['apple', 'apple1,4', 'Banana9,6', 'apple1,0', 'banana7,7']
>>> sorted(fruits)
['Banana9,6', 'apple', 'apple1,0', 'apple1,4', 'banana7,7']
>>> natsorted(fruits)
['Banana9,6', 'apple', 'apple1,0', 'apple1,4', 'banana7,7']
>>> 
>>> humansorted(fruits)
['apple', 'apple1,0', 'apple1,4', 'banana7,7', 'Banana9,6']
#4
nice!

what i did in C was just deal with the numbers. they were compared in such a way that the shorter number was prefixed with enough '0' characters to make it the same length as the longer one. this was done as part of the actual character-by-character comparison. that would be the wrong way to do it in Python, and i figured the right way had probably already been done (so i didn't want to re-invent another wheel). i need to spend more time with module re so i can quickly envision how it can solve many problems like this.
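
For comparison, the zero-padding idea can still be expressed in Python as a sort key instead of a character-by-character comparison. This is only a minimal sketch: the helper name padded_key and the fixed width of 20 digits are assumptions made for illustration, and it breaks down for numbers longer than that width.

```python
import re


def padded_key(text: str, width: int = 20) -> str:
    """Left-pad every run of ASCII digits with zeros to a fixed width.

    Hypothetical helper: assumes no number in the keys has more than
    `width` digits, otherwise the ordering is no longer numeric.
    """
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), text)


names = ["file10", "file9", "file100", "file1"]
print(sorted(names, key=padded_key))
# ['file1', 'file9', 'file10', 'file100']
```

Because the key is a plain string, the text between the numbers still takes part in the comparison, in ordinary character order.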
#5
next:

tobesorted = ['one trillion seven hundred million two','three hundred billion fifty five thousand twelve']
#6
i don't want to sort by the numbers exclusively. i want to sort by what is between them too ... ASCIIbetically for now.
Output:
alpha2 alpha1234bar alpha1234foo beta gamma xyzzy delta virus
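
One way to get this behaviour with only the standard library is to split each key into alternating text and digit chunks and convert the digit chunks to int. The sketch below is an assumption about what is wanted here, not an existing library function; natural_key is a made-up name, and the sample words are taken from the output above.

```python
import re


def natural_key(text: str) -> list:
    """Split text into alternating non-digit and digit chunks.

    re.split with a capturing group always yields text at even indexes
    and digit runs at odd indexes, so str is only ever compared with str
    and int with int.
    """
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


words = ["alpha1234foo", "beta", "alpha2", "alpha1234bar"]
print(sorted(words, key=natural_key))
# ['alpha2', 'alpha1234bar', 'alpha1234foo', 'beta']
```

The text chunks are compared by code point, which is the "ASCIIbetical" order for ASCII input; a key starting with a digit simply gets an empty string as its first chunk.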