Python Forum
A more intelligent .sort()?
#2
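I think the order of entries returned by os.scandir, os.walk and os.listdir depends on the file system, possibly on the inode numbers of the files. The Python documentation only says that the order is arbitrary. You're not the first one with this problem.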

Python Code Glitch May Have Caused Errors In Over 100 Published Studies
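Because the directory order is not guaranteed, it is safer to sort the names yourself before relying on their order. A minimal sketch, assuming a throwaway temporary folder and made-up file names (a.txt, b.txt, c.txt are only for illustration):

```python
import os
import tempfile

# Create a temporary folder with a few empty files (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.txt", "c.txt", "a.txt"):
        open(os.path.join(folder, name), "w").close()

    # os.listdir returns the names in arbitrary order.
    unsorted_names = os.listdir(folder)

    # Sorting explicitly gives the same order on every system.
    sorted_names = sorted(unsorted_names)

print(sorted_names)
```

A plain sorted() is enough here because the names contain no numbers.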

The built-in function sorted and the method list.sort both take a key argument.
The items are sorted by this key. If you just sort strings, lexicographical order is applied, so '10' comes before '2'.
To sort numerically, the numbers must be converted to integers.
The key is a function that takes one element and returns the value to sort by (often an int).
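A small sketch of the difference, using a made-up list of number strings and the built-in int as the key function:

```python
numbers = ["10", "9", "2", "1"]

# Plain string sort compares character by character, so "10" sorts before "2".
as_strings = sorted(numbers)

# With key=int each item is compared by its integer value instead.
as_integers = sorted(numbers, key=int)

print(as_strings)   # ['1', '10', '2', '9']
print(as_integers)  # ['1', '2', '9', '10']
```

The list itself still contains strings; the key is only used for the comparison.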


Sorting just the strings:
['pdf_10_page1',
 'pdf_10_page2',
 'pdf_10_page3',
 'pdf_10_page4',
 'pdf_1_page1',
 'pdf_1_page2',
 'pdf_1_page3',
 'pdf_1_page4',
 'pdf_2_page1',
 'pdf_2_page2',
 'pdf_2_page3',
 'pdf_2_page4',
 'pdf_3_page1',
 'pdf_3_page2',
 'pdf_3_page3',
 'pdf_3_page4',
 'pdf_4_page1',
 'pdf_4_page2',
 'pdf_4_page3',
 'pdf_4_page4',
 'pdf_5_page1',
 'pdf_5_page2',
 'pdf_5_page3',
 'pdf_5_page4',
 'pdf_6_page1',
 'pdf_6_page2',
 'pdf_6_page3',
 'pdf_6_page4',
 'pdf_7_page1',
 'pdf_7_page2',
 'pdf_7_page3',
 'pdf_7_page4',
 'pdf_8_page1',
 'pdf_8_page2',
 'pdf_8_page3',
 'pdf_8_page4',
 'pdf_9_page1',
 'pdf_9_page2',
 'pdf_9_page3',
 'pdf_9_page4']
First you need to know the naming pattern of your files. Then you can use a regular expression to extract the numbers from the string.
Example:

import re


def sort_pdfs(pdf):
    # Key function: return (document number, page number) as integers.
    match = re.search(r"pdf_(\d+)_page(\d+)", pdf)
    if match:
        return tuple(map(int, match.groups()))
    # If the pattern does not match, the name is sorted to the front.
    return (0, 0)


pdfs = ['pdf_10_page1',
 'pdf_10_page2',
 'pdf_10_page3',
 'pdf_10_page4',
 'pdf_1_page1',
 'pdf_1_page2',
 'pdf_1_page3',
 'pdf_1_page4',
 'pdf_2_page1',
 'pdf_2_page2',
 'pdf_2_page3',
 'pdf_2_page4',
 'pdf_3_page1',
 'pdf_3_page2',
 'pdf_3_page3',
 'pdf_3_page4',
 'pdf_4_page1',
 'pdf_4_page2',
 'pdf_4_page3',
 'pdf_4_page4',
 'pdf_5_page1',
 'pdf_5_page2',
 'pdf_5_page3',
 'pdf_5_page4',
 'pdf_6_page1',
 'pdf_6_page2',
 'pdf_6_page3',
 'pdf_6_page4',
 'pdf_7_page1',
 'pdf_7_page2',
 'pdf_7_page3',
 'pdf_7_page4',
 'pdf_8_page1',
 'pdf_8_page2',
 'pdf_8_page3',
 'pdf_8_page4',
 'pdf_9_page1',
 'pdf_9_page2',
 'pdf_9_page3',
 'pdf_9_page4'
]

pdfs.sort(key=sort_pdfs)  # sorts the list in place
print(pdfs)
Output:
['pdf_1_page1', 'pdf_1_page2', 'pdf_1_page3', 'pdf_1_page4', 'pdf_2_page1', 'pdf_2_page2', 'pdf_2_page3', 'pdf_2_page4', 'pdf_3_page1', 'pdf_3_page2', 'pdf_3_page3', 'pdf_3_page4', 'pdf_4_page1', 'pdf_4_page2', 'pdf_4_page3', 'pdf_4_page4', 'pdf_5_page1', 'pdf_5_page2', 'pdf_5_page3', 'pdf_5_page4', 'pdf_6_page1', 'pdf_6_page2', 'pdf_6_page3', 'pdf_6_page4', 'pdf_7_page1', 'pdf_7_page2', 'pdf_7_page3', 'pdf_7_page4', 'pdf_8_page1', 'pdf_8_page2', 'pdf_8_page3', 'pdf_8_page4', 'pdf_9_page1', 'pdf_9_page2', 'pdf_9_page3', 'pdf_9_page4', 'pdf_10_page1', 'pdf_10_page2', 'pdf_10_page3', 'pdf_10_page4']
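If the file names do not all follow one fixed pattern, a more general key can split each name into text and number parts. This is only a sketch of that idea, not part of the solution above; the name natural_key and the sample list are made up for illustration:

```python
import re


def natural_key(text):
    # re.split with a capture group keeps the digit runs in the result,
    # so the parts alternate: text, digits, text, digits, ..., text.
    parts = re.split(r"(\d+)", text)
    # Odd positions are the digit runs; convert them to int for numeric comparison.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ["pdf_10_page1", "pdf_2_page10", "pdf_2_page9", "pdf_1_page1"]
names.sort(key=natural_key)
print(names)
# ['pdf_1_page1', 'pdf_2_page9', 'pdf_2_page10', 'pdf_10_page1']
```

Because the parts always alternate between text and numbers, two keys never compare a str with an int at the same position.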


Messages In This Thread
A more intelligent .sort()? - by Pedroski55 - Jul-07-2020, 09:38 AM
RE: A more intelligent .sort()? - by DeaD_EyE - Jul-07-2020, 11:11 AM
RE: A more intelligent .sort()? - by snippsat - Jul-07-2020, 05:47 PM
RE: A more intelligent .sort()? - by Pedroski55 - Jul-07-2020, 10:49 PM
RE: A more intelligent .sort()? - by DeaD_EyE - Jul-08-2020, 07:52 AM
