that looks like a good module to have around.
i ended up implementing the numeric sort, for sorting files, this way:
my script acts as a filter, reading stdin then printing the result to stdout.
this filter script defines a special code character as Unicode U+FFF6, very unlikely to be in any text data. if the input line has this code in it, a single split around it is done and the part after the code is printed to stdout. else (if the line does not have the code in it) it does
re.split(r'(\d+)',line), then scans that list for any elements that are .isdecimal(). those strings are padded with enough leading zeros to make each one a wide fixed-width number. the list is then joined back into a single string. the modified string + the code + the original string are then printed to stdout.

this filter undoes what it did when it processes its own result, so a Unix shell pipeline like filter|sort|filter performs the numeric sort i want. and it avoids storing the whole file in memory, because sometimes i might have millions of lines to sort.

    #!/usr/bin/env python3
    from re import split
    from sys import stdin

    code = chr(0xfff6)   # marker between the padded sort key and the original line
    width = 25           # fixed width for zero-padded numbers
    regexp = r'(\d+)'

    for line in stdin:
        line = line.split('\n',1)[0]   # drop the newline but keep trailing spaces
        if not line:
            print()
            continue
        if code in line:
            # second pass: strip the sort key and print the original line
            line = line.split(code,1)[1]
            print(line)
            continue
        # first pass: build the padded key and put it in front of the line
        seq = split(regexp,line)
        for x in range(len(seq)):
            if seq[x].isdecimal():
                # zfill pads to width and never truncates a longer number
                seq[x] = seq[x].zfill(width)
        seq.append(code)
        seq.append(line)
        print(''.join(seq))

no, i didn't want to use line.rstrip(). i didn't want to lose any trailing spaces.
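to check the idea without a shell, here is a minimal in-memory sketch of the same round trip. the helper names encode, decode and numeric_sort are made up for this illustration and are not part of the script; sorted() stands in for the external sort, so it holds everything in memory and only suits small tests.

```python
import re

CODE = chr(0xFFF6)   # same marker character the filter uses
WIDTH = 25           # same fixed width the filter uses

def encode(line):
    """First pass: padded key + marker + original line (hypothetical helper)."""
    parts = re.split(r'(\d+)', line)
    padded = [p.zfill(WIDTH) if p.isdecimal() else p for p in parts]
    return ''.join(padded) + CODE + line

def decode(line):
    """Second pass: keep only what follows the first marker, if there is one."""
    return line.split(CODE, 1)[1] if CODE in line else line

def numeric_sort(lines):
    """Simulate filter|sort|filter in memory (assumption: plain code point order)."""
    return [decode(x) for x in sorted(encode(x) for x in lines)]

names = ['file10.txt', 'file2.txt', 'file1.txt', 'file 3 ']
print(numeric_sort(names))
# → ['file 3 ', 'file1.txt', 'file2.txt', 'file10.txt']
```

sorted() compares strings by code point, while the Unix sort command follows the current locale, so the real pipeline may order non-digit characters differently unless it is run with something like LC_ALL=C.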
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.