Jul-07-2020, 11:11 AM
I think the order of os.scandir, os.walk and os.listdir comes from the inode number of a file; the documentation only says that the order is arbitrary. But you're not the first one with this problem:
Python Code Glitch May Have Caused Errors In Over 100 Published Studies
The built-in function sorted and the method list.sort take an argument named key.
The items are sorted by this key. If you just sort strings, the lexicographical order is applied, so 'pdf_10' comes before 'pdf_2'.
To sort by number, the numbers must be converted into integers.
The key is a function which takes one element and returns something (often an int).
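A small sketch of what the key argument changes. The strings and the names plain and by_number are made up for illustration and are not from the original question:

```python
# Hypothetical example data, only for illustration.
names = ["10", "9", "1", "2"]

# Without a key, strings are compared character by character.
plain = sorted(names)

# With key=int, each element is converted to an integer for the comparison.
# The elements in the result are still the original strings.
by_number = sorted(names, key=int)

print(plain)      # ['1', '10', '2', '9']
print(by_number)  # ['1', '2', '9', '10']
```

The key function is called once per element, and only its return value is used for comparing.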
Sorting just the strings:
['pdf_10_page1', 'pdf_10_page2', 'pdf_10_page3', 'pdf_10_page4',
 'pdf_1_page1', 'pdf_1_page2', 'pdf_1_page3', 'pdf_1_page4',
 'pdf_2_page1', 'pdf_2_page2', 'pdf_2_page3', 'pdf_2_page4',
 'pdf_3_page1', 'pdf_3_page2', 'pdf_3_page3', 'pdf_3_page4',
 'pdf_4_page1', 'pdf_4_page2', 'pdf_4_page3', 'pdf_4_page4',
 'pdf_5_page1', 'pdf_5_page2', 'pdf_5_page3', 'pdf_5_page4',
 'pdf_6_page1', 'pdf_6_page2', 'pdf_6_page3', 'pdf_6_page4',
 'pdf_7_page1', 'pdf_7_page2', 'pdf_7_page3', 'pdf_7_page4',
 'pdf_8_page1', 'pdf_8_page2', 'pdf_8_page3', 'pdf_8_page4',
 'pdf_9_page1', 'pdf_9_page2', 'pdf_9_page3', 'pdf_9_page4']

First you need to know the pattern of your files. Then you can apply a regex to get the numbers out of the string.
Example:
import re


def sort_pdfs(pdf):
    # Extract the pdf number and the page number from names like 'pdf_10_page1'.
    match = re.search(r"pdf_(\d+)_page(\d+)", pdf)
    if match:
        # Convert both numbers to int, so that 10 sorts after 9.
        return tuple(map(int, match.groups()))
    else:
        return (0, 0)  # if the pattern does not match


pdfs = [
    'pdf_10_page1', 'pdf_10_page2', 'pdf_10_page3', 'pdf_10_page4',
    'pdf_1_page1', 'pdf_1_page2', 'pdf_1_page3', 'pdf_1_page4',
    'pdf_2_page1', 'pdf_2_page2', 'pdf_2_page3', 'pdf_2_page4',
    'pdf_3_page1', 'pdf_3_page2', 'pdf_3_page3', 'pdf_3_page4',
    'pdf_4_page1', 'pdf_4_page2', 'pdf_4_page3', 'pdf_4_page4',
    'pdf_5_page1', 'pdf_5_page2', 'pdf_5_page3', 'pdf_5_page4',
    'pdf_6_page1', 'pdf_6_page2', 'pdf_6_page3', 'pdf_6_page4',
    'pdf_7_page1', 'pdf_7_page2', 'pdf_7_page3', 'pdf_7_page4',
    'pdf_8_page1', 'pdf_8_page2', 'pdf_8_page3', 'pdf_8_page4',
    'pdf_9_page1', 'pdf_9_page2', 'pdf_9_page3', 'pdf_9_page4',
]

pdfs.sort(key=sort_pdfs)  # sorts the list in place
Output:
['pdf_1_page1',
'pdf_1_page2',
'pdf_1_page3',
'pdf_1_page4',
'pdf_2_page1',
'pdf_2_page2',
'pdf_2_page3',
'pdf_2_page4',
'pdf_3_page1',
'pdf_3_page2',
'pdf_3_page3',
'pdf_3_page4',
'pdf_4_page1',
'pdf_4_page2',
'pdf_4_page3',
'pdf_4_page4',
'pdf_5_page1',
'pdf_5_page2',
'pdf_5_page3',
'pdf_5_page4',
'pdf_6_page1',
'pdf_6_page2',
'pdf_6_page3',
'pdf_6_page4',
'pdf_7_page1',
'pdf_7_page2',
'pdf_7_page3',
'pdf_7_page4',
'pdf_8_page1',
'pdf_8_page2',
'pdf_8_page3',
'pdf_8_page4',
'pdf_9_page1',
'pdf_9_page2',
'pdf_9_page3',
'pdf_9_page4',
'pdf_10_page1',
'pdf_10_page2',
'pdf_10_page3',
'pdf_10_page4']
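To tie this back to the directory listing, here is a rough sketch that applies the same kind of key to the result of os.listdir. The helper names page_key and sorted_listing, the .pdf file names and the temporary directory are assumptions made for this demo, not part of the original question:

```python
import os
import re
import tempfile


def page_key(name):
    """Return (pdf number, page number) as integers for names like pdf_3_page2."""
    match = re.search(r"pdf_(\d+)_page(\d+)", name)
    if match:
        return tuple(map(int, match.groups()))
    return (0, 0)  # names that do not match are sorted to the front


def sorted_listing(directory):
    """List a directory and sort the names with page_key."""
    # os.listdir returns the names in arbitrary order, so sort explicitly.
    return sorted(os.listdir(directory), key=page_key)


# Throwaway directory with a few empty files, created only for this demo.
demo_names = ["pdf_10_page1.pdf", "pdf_2_page2.pdf", "pdf_2_page1.pdf", "pdf_1_page1.pdf"]
with tempfile.TemporaryDirectory() as tmp:
    for name in demo_names:
        open(os.path.join(tmp, name), "w").close()
    result = sorted_listing(tmp)

print(result)
# ['pdf_1_page1.pdf', 'pdf_2_page1.pdf', 'pdf_2_page2.pdf', 'pdf_10_page1.pdf']
```

Sorting explicitly means the result no longer depends on the order in which the file system happens to return the entries.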
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!