Python Forum
how to do a numeric sort
#1
i think all that is needed is a sort key function for this. i want to have a list of files sorted in a numeric way so that:
Output:
python-3.4.0-foo.gz
python-3.4.1-foo.gz
python-3.4.10-foo.gz
python-3.4.2-foo.gz
python-3.4.3-foo.gz
python-3.4.4-foo.gz
python-3.4.5-foo.gz
python-3.4.6-foo.gz
python-3.4.7-foo.gz
python-3.4.8-foo.gz
python-3.4.9-foo.gz
python-3.5.0-foo.gz
python-3.5.1-foo.gz
python-3.5.2-foo.gz
python-3.5.3-foo.gz
python-3.5.4-foo.gz
python-3.5.5-foo.gz
python-3.5.6-foo.gz
python-3.5.7-foo.gz
gets sorted like:
Output:
python-3.4.0-foo.gz
python-3.4.1-foo.gz
python-3.4.2-foo.gz
python-3.4.3-foo.gz
python-3.4.4-foo.gz
python-3.4.5-foo.gz
python-3.4.6-foo.gz
python-3.4.7-foo.gz
python-3.4.8-foo.gz
python-3.4.9-foo.gz
python-3.4.10-foo.gz
python-3.5.0-foo.gz
python-3.5.1-foo.gz
python-3.5.2-foo.gz
python-3.5.3-foo.gz
python-3.5.4-foo.gz
python-3.5.5-foo.gz
python-3.5.6-foo.gz
python-3.5.7-foo.gz
anyone know of code that can do this?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Something like this:

lst = ['python-3.4.0-foo.gz',
       'python-3.4.1-foo.gz',
       'python-3.4.10-foo.gz',
       'python-3.4.2-foo.gz',
       'python-3.4.3-foo.gz',
       'python-3.4.4-foo.gz',
       'python-3.4.5-foo.gz',
       'python-3.4.6-foo.gz',
       'python-3.4.7-foo.gz',
       'python-3.4.8-foo.gz',
       'python-3.4.9-foo.gz',
       'python-3.5.0-foo.gz',
       'python-3.5.1-foo.gz',
       'python-3.5.2-foo.gz',
       'python-3.5.3-foo.gz',
       'python-3.5.4-foo.gz',
       'python-3.5.5-foo.gz',
       'python-3.5.6-foo.gz',
       'python-3.5.7-foo.gz']

def by_version(row):
    return [int(num) for num in row.split('-')[1].split('.')]

sorted(lst, key=by_version)
Output:
['python-3.4.0-foo.gz', 'python-3.4.1-foo.gz', 'python-3.4.2-foo.gz', 'python-3.4.3-foo.gz', 'python-3.4.4-foo.gz', 'python-3.4.5-foo.gz', 'python-3.4.6-foo.gz', 'python-3.4.7-foo.gz', 'python-3.4.8-foo.gz', 'python-3.4.9-foo.gz', 'python-3.4.10-foo.gz', 'python-3.5.0-foo.gz', 'python-3.5.1-foo.gz', 'python-3.5.2-foo.gz', 'python-3.5.3-foo.gz', 'python-3.5.4-foo.gz', 'python-3.5.5-foo.gz', 'python-3.5.6-foo.gz', 'python-3.5.7-foo.gz']
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#3
The approach below is not universal, but it might be useful:
def key(version):
    vals = map(int, version.split('.'))
    # weights for major, minor, patch; only safe while minor < 20 and patch < 50
    vv = [1000, 50, 1]
    return sum(w * v for w, v in zip(vv, vals))

sorted(['3.1.3', '3.1.1', '3.1.9', '3.2.1', '3.1.10'], key=key)
Output:
['3.1.1', '3.1.3', '3.1.9', '3.1.10', '3.2.1']
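The weights above collide once a component grows past the ratio between neighbouring weights: with [1000, 50, 1], both '3.1.50' and '3.2.0' map to 3100. A minimal sketch that sidesteps weights by comparing tuples of ints instead (the name version_tuple is only for illustration, and it assumes every component is a plain integer):

```python
def version_tuple(version):
    # '3.1.10' -> (3, 1, 10); tuples compare element by element
    return tuple(int(part) for part in version.split('.'))

versions = ['3.1.3', '3.1.1', '3.1.9', '3.2.1', '3.1.10', '3.1.50', '3.2.0']
result = sorted(versions, key=version_tuple)
print(result)
# ['3.1.1', '3.1.3', '3.1.9', '3.1.10', '3.1.50', '3.2.0', '3.2.1']
```

Because no component is ever scaled, there is no upper limit on minor or patch numbers.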
#4
i'm trying these and others but running into problems because in some of the cases i am trying to sort, there are other characters bordering the digits. so to use perfringo's code, i'd need a more magical split().
#5
(Jul-10-2019, 05:02 AM)Skaperen Wrote: i'm trying these and others but running into problems because in some of the cases i am trying to sort, there are other characters bordering the digits. so to use perfringo's code, i'd need a more magical split().

From which side? Are these all 3.x.x versions? If so, and the other characters come before the version number, one can split on the number 3, since it is redundant.
#6
the list of Python docs files in the 3.4.x and 3.5.x ranges was just an example case. any non-decimal character can be adjacent to the decimal digits. any run of either class (decimal digits vs. not) can be of any length.
#7
Then one can use groupby:

from itertools import groupby

lst = ['python3.4.0-foo.gz',
       'python2.4.1-foo.gz',
       'python-3.4.10foo.gz',
       'python-3.4.2-foo.gz',
       'python-3.4.3-foo.gz',
       'python-3.4.4-foo.gz',
       'python-3.4.5-foo.gz',
       'python-3.4.6-foo.gz',
       'python-3.4.7-foo.gz',
       'python-3.4.8-foo.gz',
       'python-3.4.9-foo.gz',
       'python-3.5.0-foo.gz',
       'python-3.5.1-foo.gz',
       'python-3.5.2-foo.gz',
       'python-3.5.3-foo.gz',
       'python-3.5.4-foo.gz',
       'python-3.5.5-foo.gz',
       'python-3.5.6-foo.gz',
       'python-3.5.7-foo.gz']

def versions(row):
    return [int(''.join(x for x in group)) for nums, group in groupby(row, lambda x: x.isdigit()) if nums]

sorted(lst, key=versions)
Output:
['python2.4.1-foo.gz', 'python3.4.0-foo.gz', 'python-3.4.2-foo.gz', 'python-3.4.3-foo.gz', 'python-3.4.4-foo.gz', 'python-3.4.5-foo.gz', 'python-3.4.6-foo.gz', 'python-3.4.7-foo.gz', 'python-3.4.8-foo.gz', 'python-3.4.9-foo.gz', 'python-3.4.10foo.gz', 'python-3.5.0-foo.gz', 'python-3.5.1-foo.gz', 'python-3.5.2-foo.gz', 'python-3.5.3-foo.gz', 'python-3.5.4-foo.gz', 'python-3.5.5-foo.gz', 'python-3.5.6-foo.gz', 'python-3.5.7-foo.gz']
EDIT:

Actually, there is no need for the comprehension inside .join; one can use ''.join(group).
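One thing to keep in mind: the groupby key keeps only the digit runs, so names that differ only in their text parts compare equal and stay in their input order. If the text should count as well, one possible sketch splits on digit runs with re.split and keeps both kinds of piece (natural_key is a hypothetical name, not taken from any library):

```python
import re

def natural_key(text):
    # a capturing group makes re.split keep the digit runs, so the pieces
    # alternate: text, digits, text, digits, ..., text (text may be empty)
    pieces = re.split(r'(\d+)', text)
    # odd positions are always digit runs; convert only those to int
    return [int(p) if i % 2 else p for i, p in enumerate(pieces)]

names = ['b-3.4.10.gz', 'a-3.4.10.gz', 'a-3.4.9.gz', 'a10.gz', 'a9.gz']
result = sorted(names, key=natural_key)
print(result)
# ['a9.gz', 'a10.gz', 'a-3.4.9.gz', 'a-3.4.10.gz', 'b-3.4.10.gz']
```

Since strings always sit at even positions and ints at odd ones, two keys never compare a str against an int, which would raise TypeError in Python 3.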
#8
There are also modules for this on PyPI, such as this one. It could be worth trying.
#9
that looks like a good module to have around.

i ended up implementing the numeric sort, for sorting files, this way:

my script acts as a filter, reading stdin then printing the result to stdout.

this filter script defines a special code character, Unicode U+FFF6, which is very unlikely to appear in any text data. if the input line has this code in it, the line is split once around the code and the part after it is printed to stdout. otherwise (if the line does not have the code in it) it does re.split(r'(\d+)',line) and scans the resulting list for elements that are .isdecimal(). each of those is padded with leading zeros to make a wide fixed-width number. the list is then joined back into a single string, and the modified string + the code + the original string are printed to stdout.

the filter undoes what it did when it processes its own output, so a Unix shell pipeline like filter|sort|filter performs the numeric sort i want. this avoids storing the whole file in memory, because sometimes i might have millions of lines to sort.


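#!/usr/bin/env python3
# filter: the first pass prepends a zero-padded sort key, the second pass strips it
from re import split
from sys import stdin
code = chr(0xfff6)   # separator between the sort key and the original line
width = 25           # digit runs are zero-padded to this width
regexp = r'(\d+)'    # the capture group keeps the digit runs in the split result
for line in stdin:
    line = line.split('\n',1)[0]   # drop the newline but keep trailing spaces
    if not line:
        print()
        continue
    if code in line:
        # second pass: print only the original line that follows the code
        print(line.split(code,1)[1])
        continue
    seq = split(regexp,line)
    for x in range(len(seq)):
        if seq[x].isdecimal():
            # zfill pads to width and, unlike slicing, never truncates a longer run
            seq[x] = seq[x].zfill(width)
    seq.append(code)
    seq.append(line)
    print(''.join(seq))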
no, i didn't want to use line.rstrip(). i didn't want to lose any trailing spaces.
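To see why the two filter passes cancel out, the round trip can be sketched in memory. The function names decorate and undecorate are made up for this illustration, and sorted() stands in for the external sort, so this sketch gives up the low-memory property of the shell pipeline:

```python
import re

CODE = chr(0xFFF6)   # same separator character as in the script above
WIDTH = 25           # same padding width as in the script above

def decorate(line):
    # zero-pad every digit run, then append the separator and the original line
    key = re.sub(r'\d+', lambda m: m.group().zfill(WIDTH), line)
    return key + CODE + line

def undecorate(line):
    # keep only what follows the first separator
    return line.split(CODE, 1)[1]

lines = ['x10 ', 'x9', 'x2b', 'x2a']
result = [undecorate(s) for s in sorted(decorate(s) for s in lines)]
print(result)
# ['x2a', 'x2b', 'x9', 'x10 ']
```

The trailing space in 'x10 ' survives the round trip, which is the reason for not using rstrip().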
#10
Wouldn't sort --version-sort suit your needs?