Posts: 156
Threads: 46
Joined: Nov 2021
Sorry for my bad English.
I have this problem:
thelist = ["t9","t8","t11","t10"]
print(thelist)
thelist.sort()
print(thelist)
Output:
['t9', 't8', 't11', 't10']
['t10', 't11', 't8', 't9']
The expected output is:
['t8', 't9', 't10', 't11']
I googled for [python string "logical" "sorting"] but found nothing.
Please help me!
Posts: 6,551
Threads: 19
Joined: Feb 2020
Aug-08-2024, 02:45 AM
(This post was last modified: Aug-08-2024, 02:45 AM by deanhystad.)
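What you are looking for is called natural sorting. Strings are compared character by character, so 't11' correctly sorts before 't8' (because '1' < '8'), but that order doesn't look natural to a human reader.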
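A small sketch of why the plain sort behaves this way: Python decides a string comparison at the first character that differs, so the digits after it never get a say. The values below are taken from the question; the variable name string_order is only for illustration.

```python
# Strings are compared one character at a time, left to right.
# "t11" and "t8" share "t", then "1" is compared with "8".
print("1" < "8")      # True, so the comparison is already decided here
print("t11" < "t8")   # True: "t11" sorts before "t8" as a string
print(11 < 8)         # False: as integers the order is the other way round

string_order = sorted(["t9", "t8", "t11", "t10"])
print(string_order)   # ['t10', 't11', 't8', 't9']
```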
Posts: 1,016
Threads: 141
Joined: Jul 2017
Try this:
thelist = ["t9","t8","t11","t10"]
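# not quite good: names of equal length tie, so they stay in their original order
sorted_list = sorted(thelist, key=len)
# ok: sort by the integer that follows the one-character prefix
sorted_list = sorted(thelist, key=lambda x: int(x[1:]))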
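To see why the length-only key falls short: strings of equal length compare as equal, and because Python's sort is stable they keep their input order. A possible middle ground is to sort on length first and on the string itself second. This is only a sketch and it assumes every name has the same prefix and no zero-padded numbers; the variable names are made up for the illustration.

```python
thelist = ["t9", "t8", "t11", "t10"]

# Length alone: "t9"/"t8" tie and "t11"/"t10" tie, so their input order is kept.
by_len = sorted(thelist, key=len)
print(by_len)            # ['t9', 't8', 't11', 't10']

# Length first, then the string itself to break ties.
# Assumption: same prefix everywhere and no zero-padded numbers.
by_len_then_text = sorted(thelist, key=lambda x: (len(x), x))
print(by_len_then_text)  # ['t8', 't9', 't10', 't11']
```

For names that do not follow that pattern, converting the numeric part to int is the safer route.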
Posts: 156
Threads: 46
Joined: Nov 2021
Thank you both for the answers.
The real problem is that the example above is simplified:
the real case is listing the files in folders that may contain hundreds or thousands of files.
Posts: 2,066
Threads: 9
Joined: May 2017
Fix for your example.
def by_number(text):
    return int(text[1:])


thelist = ["t9","t8","t11","t10"]
print(thelist)
# using the key function by_number, which returns int
# the int is then used for comparison
thelist.sort(key=by_number)
print(thelist)

If you work with filenames, it's similar. You have to know the structure of the name, decompose it until only the number is left, and then convert it to an int. If your filenames have an ISO 8601 prefix, you can parse it with datetime; datetime objects are sortable.
Example with ISO8601 prefix and pathlib:
from datetime import date as Date
from pathlib import Path


def by_date(path):
    date_str, _ = path.name.split("_", maxsplit=1)
    return Date.fromisoformat(date_str)


def walk(root, pattern):
    for path in Path(root).glob(pattern):
        yield path


def main():
    root = "."
    pattern = "????-??-??_*.*"
    # or more explicit glob pattern: pattern = "[0-2][0-0][0-9][0-9]-[0-1][0-9]-[0-3][0-9]_*.*"
    # 2022-10-12_bla1.txt matches the glob pattern
    for path in sorted(walk(root, pattern), key=by_date):
        print(path)


if __name__ == "__main__":
    main()
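For names without a date prefix, the "decompose until only the number is left" idea could be sketched with a regular expression. The function by_last_number and the file names are hypothetical, and treating names without digits as -1 is an arbitrary choice made for this example.

```python
import re
from pathlib import Path


def by_last_number(path):
    """Sort key: the last run of digits in the file name stem, as an int."""
    numbers = re.findall(r"\d+", Path(path).stem)
    # Names without any digits sort first (an arbitrary choice for this sketch).
    return int(numbers[-1]) if numbers else -1


names = ["page_10.jpg", "page_2.jpg", "cover.jpg", "page_1.jpg"]  # hypothetical names
ordered = sorted(names, key=by_last_number)
print(ordered)  # ['cover.jpg', 'page_1.jpg', 'page_2.jpg', 'page_10.jpg']
```

This only looks at one number per name, so it fits names with a single counter; names with several numeric fields need a key that keeps all of them.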
Posts: 1,016
Threads: 141
Joined: Jul 2017
@ kucingkembar
You may paste an example list with file names if you wish!
Posts: 156
Threads: 46
Joined: Nov 2021
This project translates raw manga/manhua/comics from non-alphabetic scripts into English.
The data is obtained by using Selenium to [save page as - web page complete],
so the image names may look random,
but if you view them in an image viewer, they appear in the right order,
like these:
- 001.jpg, 002.jpg
- 1.jpg, 10.jpg
- 6183416182372182_86981783_1.webp, 6183416182372182_86981783_2.webp
- 1722013291_90649092356080.jpg, 1722013293_31683094936372.jpg
Posts: 4,678
Threads: 73
Joined: Jan 2018
Aug-09-2024, 06:49 AM
(This post was last modified: Aug-09-2024, 06:49 AM by Gribouillis.)
I would use lists as sorting keys, where the integers have been converted to Python's int type
examples = [
["002.jpg", "001.jpg"],
["10.jpg", "1.jpg"],
["6183416182372182_86981783_2.webp", "6183416182372182_86981783_1.webp"],
["1722013293_31683094936372.jpg", "1722013291_90649092356080.jpg"],
]
import re


def numed(s):
    # split into alternating text and digit runs; the digit runs sit at odd indexes
    L = re.split(r"(\d+)", s)
    for i in range(1, len(L), 2):
        L[i] = int(L[i])
    return L


print("Sorting keys:")
for names in examples:
    print([numed(x) for x in names])
print("Sorted names")
for names in examples:
    print("unsorted:", names)
    print("sorted :", sorted(names, key=numed))
Output:
Sorting keys:
[['', 2, '.jpg'], ['', 1, '.jpg']]
[['', 10, '.jpg'], ['', 1, '.jpg']]
[['', 6183416182372182, '_', 86981783, '_', 2, '.webp'], ['', 6183416182372182, '_', 86981783, '_', 1, '.webp']]
[['', 1722013293, '_', 31683094936372, '.jpg'], ['', 1722013291, '_', 90649092356080, '.jpg']]
Sorted names
unsorted: ['002.jpg', '001.jpg']
sorted : ['001.jpg', '002.jpg']
unsorted: ['10.jpg', '1.jpg']
sorted : ['1.jpg', '10.jpg']
unsorted: ['6183416182372182_86981783_2.webp', '6183416182372182_86981783_1.webp']
sorted : ['6183416182372182_86981783_1.webp', '6183416182372182_86981783_2.webp']
unsorted: ['1722013293_31683094936372.jpg', '1722013291_90649092356080.jpg']
sorted : ['1722013291_90649092356080.jpg', '1722013293_31683094936372.jpg']
There are also solutions on PyPI, such as natsort, but I don't know these modules. Install with care.
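Applied to the original use case, listing the files of a folder, the same kind of key could be used roughly as follows. This is a sketch under assumptions: natural_key is a restatement of the split-on-digits technique, list_naturally is a hypothetical helper, and the demo folder and file names are made up. The key function is called once per file, so a few thousand files should not be a concern.

```python
import re
import tempfile
from pathlib import Path


def natural_key(name):
    """Split a name into text and digit runs; digit runs become ints."""
    parts = re.split(r"(\d+)", name)
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts


def list_naturally(folder):
    """Return the files in `folder` in natural order of their names."""
    files = (p for p in Path(folder).iterdir() if p.is_file())
    return sorted(files, key=lambda p: natural_key(p.name))


# Demo in a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["10.jpg", "2.jpg", "1.jpg"]:
        (Path(tmp) / name).touch()
    listed = [p.name for p in list_naturally(tmp)]

print(listed)  # ['1.jpg', '2.jpg', '10.jpg']
```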
« We can solve any problem by introducing an extra level of indirection »
Posts: 156
Threads: 46
Joined: Nov 2021
@ Gribouillis
Your code works flawlessly,
but I don't understand this part:
key=numed
How does the sort pick only the [int] data and ignore the [str]?
Posts: 4,678
Threads: 73
Joined: Jan 2018
Aug-09-2024, 10:04 AM
(This post was last modified: Aug-09-2024, 10:04 AM by Gribouillis.)
(Aug-09-2024, 09:03 AM)kucingkembar Wrote: how the sort only pick [int] data only, and ignore the [str]
The numed function takes a string and returns a list in which the runs of digits (non-negative integers) have been converted to Python int.
For example:
print(numed('foo10bar035spam')) # -> prints ['foo', 10, 'bar', 35, 'spam']
print(numed('foo2bar035spam')) # -> prints ['foo', 2, 'bar', 35, 'spam']
The sorted function compares these lists instead of comparing the original strings, which produces the correct result. Lists are compared in lexicographic order: element by element, and the first pair of unequal elements decides.
>>> ['foo', 2, 'bar', 35, 'spam'] < ['foo', 10, 'bar', 35, 'spam']
True
Strings are not ignored, but 'foo' and 'foo' compare equal, so the comparison moves on to the next element.
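The element-by-element comparison can be made visible with a small helper. This is only an illustration: first_difference is a hypothetical function written for this example, not something the sort itself calls.

```python
a = ["foo", 2, "bar", 35, "spam"]
b = ["foo", 10, "bar", 35, "spam"]


def first_difference(x, y):
    """Return the index and the pair of items where two lists first differ."""
    for i, (left, right) in enumerate(zip(x, y)):
        if left != right:
            return i, left, right
    return None  # no difference within the shared length


diff = first_difference(a, b)
print(diff)   # (1, 2, 10)
print(a < b)  # True, because 2 < 10 at index 1
```

Since the split always puts text at even indexes and numbers at odd indexes, two keys are compared str with str and int with int at every position.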