Python Forum
examples using os.walk()
#1
does anyone have, or know a url for, example code that uses os.walk() and produces a list of all file system objects in the order you'd get from a sort program that treats the separator character ('/' or '\\') as lower than all other printable characters?
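A minimal sketch of the requested collation (the helper name sep_lowest_key is hypothetical): swapping the separator for "\0", which cannot occur in a POSIX file name, makes it compare lower than every other character, so everything under a directory stays grouped right after it. The demo passes sep="/" explicitly so it gives the same result on every platform.

```python
import os


def sep_lowest_key(path, sep=os.sep):
    # Map the separator to NUL so it sorts below every other character.
    return path.replace(sep, "\0")


paths = ["a-b", "a/b", "a", "a.txt", "a/b/c", "ab"]

# Plain sort: '-' (0x2D) and '.' (0x2E) come before '/' (0x2F),
# so "a-b" and "a.txt" land between "a" and its contents.
print(sorted(paths))  # ['a', 'a-b', 'a.txt', 'a/b', 'a/b/c', 'ab']

# Separator-lowest sort: everything under "a" follows "a" directly.
print(sorted(paths, key=lambda p: sep_lowest_key(p, sep="/")))
# ['a', 'a/b', 'a/b/c', 'a-b', 'a.txt', 'ab']
```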
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
google: "os.walk example python 3"
#3
i tried the first example found, at https://www.tutorialspoint.com/python3/os_walk.htm, and it crashed after 24152 lines of output.

Output:
Traceback (most recent call last):
  File "walk1.py", line 7, in <module>
    print(os.path.join(root, name))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 88: surrogates not allowed
so i just took the first 24100 lines of output from the example and sorted them. the sort's input and output were very different. any idea which example shows the correct sorting?
#4
I'd advise sending a bug report to python.org with the complete code that crashed.
#5
you do understand that this is a case of UTF-16-like codes (surrogates) that UTF-8 is not supposed to handle? it's more a case of using the wrong encoding. maybe i could have tweaked the code, but the sorting issue still exists in os.walk(), and that's what i cared about.
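For context, a minimal sketch of where the '\udce4' in the traceback comes from. The file name is made up: byte 0xE4 is 'ä' in Latin-1 but is not valid UTF-8 here, and the sketch assumes a UTF-8 file system encoding. On POSIX, os.walk() decodes byte file names with the surrogateescape error handler, which turns each undecodable byte 0xXX into the lone surrogate U+DCXX; print() then fails because a strict UTF-8 encode refuses surrogates.

```python
# Hypothetical raw file name: 0xE4 is Latin-1 'ä', which is not valid UTF-8 here.
raw = b"M\xe4rz.mp3"

# Roughly what os.walk() does on POSIX when the file system encoding is UTF-8.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'M\udce4rz.mp3'

# A strict UTF-8 encode, like print() to a UTF-8 stream, rejects the surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)  # encode failed: surrogates not allowed

# With surrogateescape the original bytes come back unchanged.
print(name.encode("utf-8", "surrogateescape") == raw)  # True
```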
#6
This example?

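#!/usr/bin/python3
import os

os.chdir("d:\\tmp")  # directory to walk (a Windows path in this example)
# topdown=False: each directory is reported after all of its subdirectories
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))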
I guess a filename or path on your filesystem is using invalid Unicode.
The error happens in print(), not in os.walk(). You can use a kind of hack to get around this:

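import os


def encdec_hack(s):
    # Encode to UTF-8, silently dropping anything that cannot be encoded
    # (such as lone surrogates), then decode back to str.
    return s.encode("utf8", errors="ignore").decode()


for root, dirs, files in os.walk("."):
    for f in files:
        try:
            print("File [ OK  ]:", f)
        except UnicodeEncodeError:
            # print() may already have written part of the line; end it first.
            print()
            print("File [ ERR ]:", encdec_hack(f))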
The best solution is to fix your filesystem: delete or rename the files with broken encoding.

Such names can also cause trouble with other applications, like file explorers, backup tools, etc.

BTW: pathlib.Path has the same problem when printed. The representation of the Path is OK, though.
To print the representation of something:
print(repr(something))
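As an alternative to the "ignore" hack, which silently drops the bad bytes, here is a sketch using the standard backslashreplace error handler, which keeps them visible as \udcXX escapes (the helper name printable is hypothetical). On Python 3.7+, sys.stdout.reconfigure(errors="backslashreplace") should apply the same handler to everything printed, assuming sys.stdout is a regular text stream.

```python
def printable(s):
    # Encode with backslashreplace so characters UTF-8 cannot encode
    # (such as lone surrogates) become visible escapes instead of errors.
    return s.encode("utf-8", "backslashreplace").decode("utf-8")


name = "M\udce4rz.mp3"  # what os.walk() may yield for a non-UTF-8 file name

print(printable(name))  # M\udce4rz.mp3
print(repr(name))       # 'M\udce4rz.mp3'
```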
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#7
Frankly, it's unclear what "sorting" issue you have, but I'm inclined to put my money on it not being a Python bug.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#8
It's not a Python bug.

It's a bug in an application that produces files or directories with illegal encoding.
Often it's just a file downloaded from somewhere on the internet.

I often had this issue with files from my Chinese coworkers.
#9
the 24152 files listed before the one whose name was not valid Unicode (but is a valid POSIX name) were not in sorted order. since the problem happened in print(), os.walk() clearly delivered that name, or one of the iterators that followed it did.

maybe i can use encode('latin1'), or an encoder i implemented that does not do surrogates.
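Calling encode('latin1') directly on such a name would raise too, since Latin-1 cannot encode a lone surrogate either. A sketch of one way around it, assuming the undecodable names really are ISO-8859-1 (Latin-1) and the file system encoding is UTF-8 (latin1_name is a hypothetical helper): undo the surrogateescape decoding to recover the original bytes, which is what os.fsencode() does on POSIX, and decode those bytes as Latin-1 only when they are not valid UTF-8.

```python
def latin1_name(name):
    # Recover the original bytes (the same result os.fsencode() gives on POSIX
    # when the file system encoding is UTF-8).
    raw = name.encode("utf-8", "surrogateescape")
    try:
        # Names that are valid UTF-8 come back unchanged.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: any name that is not valid UTF-8 is Latin-1.
        return raw.decode("latin-1")


print(latin1_name("M\udce4rz.mp3"))  # März.mp3
print(latin1_name("notes.txt"))      # notes.txt
```

The result is only for display; opening or renaming the file still needs the original name string that os.walk() returned.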

IMHO, the ultimate fix is to Unicode itself: remove UTF-16 and the surrogates it requires. there is virtually no need for UTF-16, and no need for surrogates without UTF-16.

but that's not the issue i raised. i will try to come up with another way to show the issue, one that does not involve print(), or just run this on a file tree without these names (old music files with names in an ISO encoding). once you get the list of files, sort them using an order that collates the os.sep character lower than all others. you might want to just keep the list in memory and sort it there.
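Following that procedure, a minimal sketch that keeps the whole list in memory (sorted_walk is a hypothetical name, and the paths are assumed to use only os.sep as separator): collect every directory and file path os.walk() reports, then sort on the list of path components. Comparing component lists gives the same order as treating os.sep as lower than every other character, because a name that is a prefix of a sibling's name sorts first.

```python
import os


def sorted_walk(top):
    """Return top and every path below it, sorted as if os.sep
    compared lower than every other character."""
    paths = [top]
    for root, dirs, files in os.walk(top):
        # dirs and files are both included so every file system object is listed.
        for name in dirs + files:
            paths.append(os.path.join(root, name))
    # Lists of components compare element by element, so "a" < "a-b" and
    # everything under "a" sorts before "a-b".
    paths.sort(key=lambda p: p.split(os.sep))
    return paths
```

Usage would be along the lines of `for p in sorted_walk("/home/forums"): print(p)`; printing can still hit the encoding error discussed above.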
#10
it's not a bug in Python or any implementation; it's a design goal issue. os.walk() was designed for speedy delivery of the file names in all the directories, not for sorted output. my file tree recursion generator was designed for sorting and does it correctly.

below is an example: a script prints directory names, which are stored in a file named "0". then the sort command sorts that file, writing the result to a file named "1". finally, the command "head 0 1" shows the first ten lines of each file. the source code is displayed by my script named "box".
Output:
lt2a/forums /home/forums 37> box oswalk.py
+----<oswalk.py>------------------------------+
| import os,sys                               |
| t = sys.argv[1] if len(sys.argv)>1 else '.' |
| for d,ds,fs in os.walk(t):                  |
|     print(d)                                |
+---------------------------------------------+
lt2a/forums /home/forums 38> py oswalk.py /home/forums >0
lt2a/forums /home/forums 39> sort <0 >1
lt2a/forums /home/forums 40> head 0 1
==> 0 <==
/home/forums
/home/forums/requests
/home/forums/requests/files.pythonhosted.org
/home/forums/requests/files.pythonhosted.org/packages
/home/forums/requests/files.pythonhosted.org/packages/01
/home/forums/requests/files.pythonhosted.org/packages/01/62
/home/forums/requests/files.pythonhosted.org/packages/01/62/ddcf76d1d19885e8579acb1b1df26a852b03472c0e46d2b959a714c90608
/home/forums/requests/src
/home/forums/requests/src/requests-2.22.0
/home/forums/requests/src/requests-2.22.0/requests

==> 1 <==
/home/forums
/home/forums/.audacity-data
/home/forums/.audacity-data/AutoSave
/home/forums/.audacity-data/Plug-Ins
/home/forums/.bash_history.d
/home/forums/.cache
/home/forums/.cache/fontconfig
/home/forums/.cache/gstreamer-1.0
/home/forums/.cache/mesa_shader_cache
/home/forums/.cache/mozilla
lt2a/forums /home/forums 41>
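A minimal sketch of a recursive generator in that spirit (walk_sorted is a hypothetical name; this is not the poster's own code): it lists one directory at a time with os.scandir(), sorts the entry names with a plain sort, and descends into each subdirectory right after yielding it. Since every path under a directory starts with that directory's path plus os.sep, this pre-order traversal produces the same order as a sort with os.sep lowest, without first collecting the whole tree.

```python
import os


def walk_sorted(top):
    """Yield top and every path below it, in the order a sort would give
    if os.sep compared lower than every other character."""
    yield top
    yield from _sorted_children(top)


def _sorted_children(path):
    try:
        with os.scandir(path) as it:
            # Names never contain os.sep, so a plain sort of them is enough.
            entries = sorted(it, key=lambda entry: entry.name)
    except OSError:
        return  # unreadable (or not a directory): skip it, like os.walk() does
    for entry in entries:
        yield entry.path
        # Descend right after yielding the directory, so its contents come
        # before any sibling that sorts after it. Symlinks are not followed,
        # matching os.walk()'s default.
        if entry.is_dir(follow_symlinks=False):
            yield from _sorted_children(entry.path)
```

Swapping it into oswalk.py as `for p in walk_sorted(t): print(p)` would list directories and files together in that order; unlike os.walk(), it never holds more than one directory listing per level of the tree.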