Oct-13-2018, 02:15 AM
i have a big file i need to sort. each line has 2 or more whitespace-separated tokens, and the last token has 2 or more slash-separated names. i need to sort the lines with the last name of the last token as the primary key and all the tokens before the last one as the secondary key, with any run of whitespace between them compared as if it were a single space. the separators look like single spaces, but i can't be sure of that because the file has about 88 million lines in 9GB.

the system has 16GB RAM and 16GB swap space, and i can reboot before running this sort. the sort command does not appear to be able to do this, so i am thinking of doing it in Python. what i envision is first reading all the lines of the file into a giant list. a sort key function would do all that funny parsing and comparison, optimized to skip parsing the secondary keys if the primary keys are not equal. the comparison also needs to be case insensitive, but that should be easy enough. finally, the sorted list would be written out.

does this sound fun? do i need another bottle of whiskey?
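The key function described above could be sketched roughly as follows. This is only an illustration under assumptions: the names `sort_key`, `sort_lines`, and `sort_file` are hypothetical, the file is assumed to decode as UTF-8, and `str.casefold()` is assumed to be an acceptable meaning of "case insensitive". It builds both keys for every line instead of skipping the secondary key when the primaries differ, because a plain key function never sees two lines at once.

```python
def sort_key(line):
    """Return (primary, secondary) for one line (hypothetical helper)."""
    # split() with no argument treats any run of whitespace as one
    # separator and ignores the trailing newline.
    tokens = line.split()
    # Primary key: the last slash-separated name of the last token.
    primary = tokens[-1].rsplit("/", 1)[-1].casefold()
    # Secondary key: every earlier token, rejoined with single spaces.
    secondary = " ".join(tokens[:-1]).casefold()
    return (primary, secondary)


def sort_lines(lines):
    """Return a new list of the lines ordered by sort_key."""
    return sorted(lines, key=sort_key)


def sort_file(src_path, dst_path):
    """Read everything, sort in memory, write it back out.

    Assumes UTF-8 text and that the whole file plus its keys fits in memory.
    """
    with open(src_path, encoding="utf-8") as src:
        lines = src.readlines()
    lines.sort(key=sort_key)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(lines)


if __name__ == "__main__":
    sample = [
        "b  x  dir/sub/Zeta\n",
        "a x other/alpha\n",
        "A   w  top/ALPHA\n",
    ]
    for line in sort_lines(sample):
        print(line, end="")
```

One thing to keep in mind: `list.sort` computes the key for each element once and holds all of the keys for the duration of the sort, so with 88 million lines the key tuples add a lot of memory on top of the lines themselves. The sketch does not check whether that fits in 16GB.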
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.