Posts: 4,655
Threads: 1,498
Joined: Sep 2016
i think all that is needed is a sort key function for this. i want to have a list of files sorted in a numeric way so that: Output: python-3.4.0-foo.gz
python-3.4.1-foo.gz
python-3.4.10-foo.gz
python-3.4.2-foo.gz
python-3.4.3-foo.gz
python-3.4.4-foo.gz
python-3.4.5-foo.gz
python-3.4.6-foo.gz
python-3.4.7-foo.gz
python-3.4.8-foo.gz
python-3.4.9-foo.gz
python-3.5.0-foo.gz
python-3.5.1-foo.gz
python-3.5.2-foo.gz
python-3.5.3-foo.gz
python-3.5.4-foo.gz
python-3.5.5-foo.gz
python-3.5.6-foo.gz
python-3.5.7-foo.gz
gets sorted like: Output: python-3.4.0-foo.gz
python-3.4.1-foo.gz
python-3.4.2-foo.gz
python-3.4.3-foo.gz
python-3.4.4-foo.gz
python-3.4.5-foo.gz
python-3.4.6-foo.gz
python-3.4.7-foo.gz
python-3.4.8-foo.gz
python-3.4.9-foo.gz
python-3.4.10-foo.gz
python-3.5.0-foo.gz
python-3.5.1-foo.gz
python-3.5.2-foo.gz
python-3.5.3-foo.gz
python-3.5.4-foo.gz
python-3.5.5-foo.gz
python-3.5.6-foo.gz
python-3.5.7-foo.gz
anyone know of code that can do this?
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 1,950
Threads: 8
Joined: Jun 2018
Something like this:
lst = ['python-3.4.0-foo.gz',
'python-3.4.1-foo.gz',
'python-3.4.10-foo.gz',
'python-3.4.2-foo.gz',
'python-3.4.3-foo.gz',
'python-3.4.4-foo.gz',
'python-3.4.5-foo.gz',
'python-3.4.6-foo.gz',
'python-3.4.7-foo.gz',
'python-3.4.8-foo.gz',
'python-3.4.9-foo.gz',
'python-3.5.0-foo.gz',
'python-3.5.1-foo.gz',
'python-3.5.2-foo.gz',
'python-3.5.3-foo.gz',
'python-3.5.4-foo.gz',
'python-3.5.5-foo.gz',
'python-3.5.6-foo.gz',
'python-3.5.7-foo.gz']
def by_version(row):
return [int(num) for num in row.split('-')[1].split('.')]
sorted(lst, key=by_version) Output: ['python-3.4.0-foo.gz', 'python-3.4.1-foo.gz', 'python-3.4.2-foo.gz', 'python-3.4.3-foo.gz', 'python-3.4.4-foo.gz', 'python-3.4.5-foo.gz', 'python-3.4.6-foo.gz', 'python-3.4.7-foo.gz', 'python-3.4.8-foo.gz', 'python-3.4.9-foo.gz', 'python-3.4.10-foo.gz', 'python-3.5.0-foo.gz', 'python-3.5.1-foo.gz', 'python-3.5.2-foo.gz', 'python-3.5.3-foo.gz', 'python-3.5.4-foo.gz', 'python-3.5.5-foo.gz', 'python-3.5.6-foo.gz', 'python-3.5.7-foo.gz']
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 817
Threads: 1
Joined: Mar 2018
Below is not an universal approach, but it might be useful:
def key(x):
vals = map(int, x.split('.'))
vv = [1000, 50, 1]
return sum(x * y for x, y in zip(vv, vals))
sorted(['3.1.3', '3.1.1', '3.1.9', '3.2.1', '3.1.10'], key=key) Output: ['3.1.1', '3.1.3', '3.1.9', '3.1.10', '3.2.1']
Posts: 4,655
Threads: 1,498
Joined: Sep 2016
i'm trying these and others but running into problems because it some cases i am trying to sort there are other character values bordering the digits. so to use perfringo's code, i'd need a more magical split().
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Jul-10-2019, 05:02 AM)Skaperen Wrote: i'm trying these and others but running into problems because it some cases i am trying to sort there are other character values bordering the digits. so to use perfringo's code, i'd need a more magical split().
From which side? Are these all 3.x.x versions? If so and other characters are before version number one can split on number 3 as it's redundant.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 4,655
Threads: 1,498
Joined: Sep 2016
the list of Python docs files in the 3.4.x and 3.5.x ranges was just an example case. any non-decimal character value can be adjacent to the decimal digits. any run of a class (decimal digits vs. not) can be of any length.
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 1,950
Threads: 8
Joined: Jun 2018
Jul-10-2019, 07:18 AM
(This post was last modified: Jul-10-2019, 07:18 AM by perfringo.)
Then one can use groupby:
from itertools import groupby
lst = ['python3.4.0-foo.gz',
'python2.4.1-foo.gz',
'python-3.4.10foo.gz',
'python-3.4.2-foo.gz',
'python-3.4.3-foo.gz',
'python-3.4.4-foo.gz',
'python-3.4.5-foo.gz',
'python-3.4.6-foo.gz',
'python-3.4.7-foo.gz',
'python-3.4.8-foo.gz',
'python-3.4.9-foo.gz',
'python-3.5.0-foo.gz',
'python-3.5.1-foo.gz',
'python-3.5.2-foo.gz',
'python-3.5.3-foo.gz',
'python-3.5.4-foo.gz',
'python-3.5.5-foo.gz',
'python-3.5.6-foo.gz',
'python-3.5.7-foo.gz']
def versions(row):
return [int(''.join(x for x in group)) for nums, group in groupby(row, lambda x: x.isdigit()) if nums]
sorted(lst, key=versions) Output: ['python2.4.1-foo.gz', 'python3.4.0-foo.gz', 'python-3.4.2-foo.gz', 'python-3.4.3-foo.gz', 'python-3.4.4-foo.gz', 'python-3.4.5-foo.gz', 'python-3.4.6-foo.gz', 'python-3.4.7-foo.gz', 'python-3.4.8-foo.gz', 'python-3.4.9-foo.gz', 'python-3.4.10foo.gz', 'python-3.5.0-foo.gz', 'python-3.5.1-foo.gz', 'python-3.5.2-foo.gz', 'python-3.5.3-foo.gz', 'python-3.5.4-foo.gz', 'python-3.5.5-foo.gz', 'python-3.5.6-foo.gz', 'python-3.5.7-foo.gz']
EDIT:
Actually no need for comprehension in .join, one can use ''.join(group)
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 4,807
Threads: 77
Joined: Jan 2018
Jul-10-2019, 07:27 AM
(This post was last modified: Jul-10-2019, 07:27 AM by Gribouillis.)
There are also modules for this on pypi, such as this one. It could be worth trying.
Posts: 4,655
Threads: 1,498
Joined: Sep 2016
Jul-12-2019, 02:05 AM
(This post was last modified: Jul-12-2019, 02:06 AM by Skaperen.)
that looks like a good module to have around.
i ended up implementing the numeric sort, for sorting files, this way:
my script acts as a filter, reading stdin then printing the result to stdout.
this filter script defines a special code character as Unicode U+FFF6, very unlikely to be in any text data. if the input line has this code in it, a single split around it is done and the part after the code is printed to stdout. else (if the line does not have the code in it) it does re.split(r'(\d+)',line) . then it scans that list for any elements that are .isdecimal() . strings that are .isdecimal() are padded with enough leading zeros to make it a wide fixed width number. the list is then joined back to a single string. the modified string + the code + the original string are then printed to stdout. this filter will undo what it does when it processes its own result. so a Unix shell pipeline like filter|sort|filter performs the numeric sort i want. and it avoids storing the whole file in memory because sometimes i might have millions of lines to sort.
#!/usr/bin/env python3
from re import split
from sys import stdin
code = chr(0xfff6)
width = 25
regexp = r'(\d+)'
pad = '0'*width
for line in stdin:
line = line.split('\n',1)[0]
if not line:
print()
continue
if code in line:
line = line.split(code,1)[1]
print(line)
continue
seq = split(regexp,line)
for x in range(len(seq)):
if seq[x].isdecimal():
seq[x] = (pad+seq[x])[-width:]
seq.append(code)
seq.append(line)
print(''.join(seq))
exit(0) no, i didn't want to use line.rstrip(). i didn't want to lose any trailing spaces.
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 4,807
Threads: 77
Joined: Jan 2018
Wouldn't sort --version-sort suit your needs?
|