Python Forum
File loop curiously skipping files - FIXED
#1
I'll admit I'm a beginner in Python. I'm trying to write a small program to list all the media on my NAS device and write it to an easy-to-read HTML file. It seems to work, BUT the writing to the HTML file stops at a certain point without an error. The program continues looping round and no error messages are generated. The HTML file is about 60K, so it's not huge. I wondered if it was a bad filename but, if I write a large text string to the file at the start of the program, the output finishes at an earlier file, though the program still continues looping round. I'm using IDLE - does that have a limit on the number of lines written? I assume it must be a bug in my code but I can't see anything (famous last words). Any suggestions of where to look?

import os
import re
import time

# init things
startDir = 'Z:/Films'
cnt = 0
totalCnt = 0
now = time.time()

print( "Running" )
f = open( "films.html", "w" )

# head
f.write( "<html><head>" )
str = """
<style>
td {
    background-color: 	#F0E68C;
    font-size: 22px;
}
th {
    background-color: 	#808080;
    font-size: 26px;
}
</style>
</head>
"""
f.write( str )

# body
f.write( "<body  style='background-color:#EEE8AA'><center>" )

# loop through directories
for root, dirs, filenames in os.walk( startDir ):    # recursively goes through all dir
   dir = root.split(os.path.sep)[-1]   # path, split, one back
   
   if dir == "Z:/Films": continue
   # if dir == "SciFi": continue
   
   # ignore the TV section
   if "TV" in root: continue   # should of moved above loop, multi line

   str = "<table border=1 width=800 size=+1><tr><Th colspan=2><b>" + dir + "</b></th></tr>\n"
   f.write( str )
   left = True
   cnt = 0
   
   # loop through each file
   for filename in filenames:   # filen

      # get film type
      type = re.sub('.*[.]', '', filename )

      # ignore if not video type
      if type not in [ "avi", "mp4", "mkv", "m4v" ]:
          continue
        
      try:
         col = "black"
         path = os.path.join(root, filename)
         size = os.stat(path).st_size
         file = filename[:-4]
        
         # highlight issues with files
         if os.stat(path).st_mtime >= now - 30 * 86400:
            col = "blue"
         elif re.search("CD[2-9]", file):
            continue
         elif re.search("CD[0-9]", file):
            col = "brown"
         elif re.search("[12][09][0-9][0-9]", file) and not re.search("[(][12][09][0-9][0-9][)]", file):
            col = "brown"
         elif size > 1024 * 1024 * 1424:
            col = "brown"
         elif re.search("[^A-Za-z0-9 )(]", file):
            col = "brown"

         # produce 2 columns in html table
         if left:
            str = "<tr><td><font color=" + col + ">" + file + "</font></td>\n"
            left = False
         else:
            str = "<td><font color=" + col + ">" + file + "</font></tr>\n"
            left = True

         f.write( str )
      except Exception as e:
         # say if any issues
         print("issue ",filename)
         print( e.args[0] )
         
      cnt += 1
    
   # say how many dealt with
   print( "  ",dir,cnt, "files" )
   totalCnt += cnt

   # finish table
   if left: f.write( "<td></td></tr>" )
   f.write( "</table><br><br>\n" )

# finish off
f.write( "</center></body></html>\n" )
str = f'There are {cnt} films'
f.write( str )
print( "Finished", totalCnt, "films" )
#2
Dunno about your html-ing (I like it!), but just to get the path and name of every file you want, before you stat the st_mtime:

import os

path2files = '/home/pedro/Videos/'
wanted = [ ".avi", ".mp4", ".mkv", ".m4v" ]
for (root,dirs,files) in os.walk(path2files,topdown=True):
    for f in files:
        for ending in wanted:
            if f.endswith(ending):
                print(f"Directory path: {root}")
                #print(f"Directory Names: {dirs}")
                print(f'filenames are {f}')
Sample of the output:

Output:
Directory path: /home/pedro/Videos/
filenames are The Moments of Happiness.mp4
Directory path: /home/pedro/Videos/
filenames are Happy Hippo Pat and Stan_mycut.m4v
Directory path: /home/pedro/Videos/
filenames are Insight24 -- Information. Intelligence. Insight..mp4
But I think we can get that down to a 1-liner:

import glob

files = [glob.glob(f'/home/pedro/Videos/**/*{ending}', recursive=True) for ending in wanted]
The above produces 4 lists, 1 list for each ending.
Below produces 4 generators:

files = [glob.iglob(f'/home/pedro/Videos/**/*{ending}', recursive=True) for ending in wanted]
#3
Thank you for your response. I'll look into using your code but the code I have already loops through every file it needs to. If I print out the text at the same time I'm writing to the file then the printed text looks as it should but the file is truncated at 60K. I say the file is truncated but it doesn't stop mid line but at the end of a printed line. Am I correct using "\n" as a line terminator? I'm running on a Win 11 PC.

I suppose I could try printing the output and redirecting it to an HTML file to see if that works, but obviously that doesn't explain why it stops writing to the file.
[EDIT] I found out Windows uses \r\n as a line terminator. Full of hope, I changed the code and ran it again, but sadly it still stops short :(
It actually stops a file or 2 earlier now as I guess the additional "\r" on each line takes up more space.
I should say I'm not running out of disk space!
It still happily prints every line as it should, it just stops writing them to the file.
#4
Sorry, I don't understand.

I believe you want to put the path and names of video files in an html table. Is that correct?

These names are just strings, not the actual files. So I am not sure where you are getting 60K from??

html is just a text file. So create the html table as a list, then ''.join(html_list).

You need something like these basic tags to make the html table list. In my case, the formatting is done in separate css files (which you can also generate using Python!)

tableStart = '<div class="div-table">\n<table>\n'
rowBegin = '<tr>'
if header[0] == 'False':
    tableHeader = '<td>X</td>'
elif header[0] == 'True':
    tableHeader = '<th>X</th>'
tableData = '<td>X</td>'
rowEnd = '</tr> \n'
tableEnd = '</table><br> \n  </div><br> \n \n \n'
numCols = int(numberColumns[0])  


glob.glob() above produces a list of 4 lists, 1 list for each type of video file you want.

If you loop through these lists, appending each list element to a list like dataTable = [], below, according to the number of columns you want, just replace X in '<td>X</td>' with each path or file in the lists. Add <tr> and </tr> to each row.

Finally, you join dataTable

newTable = []
newTable.append(tableStart)
for row in dataTable:
    newTable.append(row)
# close the table once, after all the rows have been added
newTable.append(tableEnd)
# join the list to a string, write the string to your html
newTableString = ' '.join(newTable)
Write newTableString to your html file.

Most of the time I do this starting with a text file of words or phrases in English and Chinese, so my tables usually only have 2 columns.
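
The list-then-join idea described above can be sketched for the two-column layout from the first post. This is only an illustration, not code from either poster: the function name `two_column_table` and the sample film names are made up, and `html.escape` is an addition so that characters such as `&` in a file name do not break the markup.

```python
import html


def two_column_table(title, names):
    """Return an HTML table string with the names laid out two per row."""
    parts = [
        '<table border="1">\n',
        '<tr><th colspan="2">' + html.escape(title) + "</th></tr>\n",
    ]
    for i in range(0, len(names), 2):
        pair = names[i:i + 2]
        cells = "".join("<td>" + html.escape(n) + "</td>" for n in pair)
        if len(pair) == 1:
            cells += "<td></td>"  # pad the last row when the count is odd
        parts.append("<tr>" + cells + "</tr>\n")
    parts.append("</table>\n")
    return "".join(parts)


table = two_column_table("SciFi", ["Alien (1979)", "Heat & Dust", "Solaris (1972)"])
print(table)
```

Building the rows in pairs avoids tracking a left/right flag, so the closing cell for an odd-numbered list is added in exactly one place.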
#5
It turned out I needed to flush the file handle at the end. All working well now.
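
For anyone hitting the same symptom (every line prints, but the file on disk stops short): `write()` hands text to an in-memory buffer, and it only reaches the disk when the buffer fills or the file is flushed or closed. A minimal sketch of that behaviour, using a throwaway temporary file; the helper name `sizes_before_and_after_flush` is made up for illustration, and the exact size seen before the flush depends on the buffer size.

```python
import os
import tempfile


def sizes_before_and_after_flush(text):
    """Write text to a temp file; return its on-disk size before and after flush()."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)  # only the path is needed; reopen it the way the script does
    f = open(path, "w", encoding="utf-8", newline="\n")
    try:
        f.write(text)
        before = os.path.getsize(path)  # usually 0 for a small write: still buffered
        f.flush()                       # push the buffered text out to the OS
        after = os.path.getsize(path)
    finally:
        f.close()
        os.remove(path)
    return before, after


sample = "<tr><td>Some Film (1999)</td></tr>\n"
before, after = sizes_before_and_after_flush(sample)
print("before flush:", before, "bytes; after flush:", after, "bytes")
```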
#6
You need to close files when you are done writing. This happens automatically when a program ends, but programs don't really end when run in IDLE.
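
One way to make sure the file is always closed, even under IDLE or if an exception is raised part-way through, is a `with` block. A minimal sketch rather than the original script: `write_report`, the output file name and the sample film names are placeholders.

```python
import os
import tempfile


def write_report(path, names):
    """Write one table row per name; the with-block closes (and flushes) the file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><body><table>\n")
        for name in names:
            f.write("<tr><td>" + name + "</td></tr>\n")
        f.write("</table></body></html>\n")
    # f is closed here, so everything written above has been flushed
    return f.closed


out_path = os.path.join(tempfile.gettempdir(), "films_demo.html")
closed = write_report(out_path, ["Alien (1979)", "Blade Runner (1982)"])
with open(out_path, encoding="utf-8") as check:
    content = check.read()
print(closed, content.count("<tr>"))
```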
#7
(Feb-09-2024, 09:58 AM)Pedroski55 Wrote: html is just a text file. So create the html table as a list, then ''.join(html_list).
Alternately you could write to a StringIO which is a file in memory. Thus you can have the same code for writing to a memory file or a file on disk.
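
A short sketch of that idea, assuming a hypothetical helper `write_rows` that only needs an object with a `write()` method, so it does not care whether it is given an `io.StringIO` or a file opened on disk:

```python
import io


def write_rows(out, names):
    """Write table rows to any file-like object that has a write() method."""
    out.write("<table>\n")
    for name in names:
        out.write("<tr><td>" + name + "</td></tr>\n")
    out.write("</table>\n")


buffer = io.StringIO()          # in-memory text file
write_rows(buffer, ["Alien (1979)", "Heat (1995)"])
html_text = buffer.getvalue()   # everything written so far, as one string
print(html_text)

# The same function works unchanged with a real file:
# with open("films.html", "w", encoding="utf-8") as f:
#     write_rows(f, ["Alien (1979)", "Heat (1995)"])
```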
« We can solve any problem by introducing an extra level of indirection »
#8
(Feb-09-2024, 09:58 AM)Pedroski55 Wrote: These names are just strings, not the actual files. So I am not sure where you are getting 60K from??
I'm going to guess I watch more media than you ;)
#10
Happy New Year of the Dragon!

Just out of interest, I ran all the paths to videos through my little html table maker. First I need a text file called words. My makeTable() function then makes the html. I chose 1 column, because some of the path + file name strings are quite long.

Didn't realise I had so many videos! Never watch them!

import glob

path2files = '/home/pedro/Videos/'
path2words = '/home/pedro/myHWpageSummer2019/textTohtml/makehtmlTable/words'
wanted = [ ".avi", ".mp4", ".mkv", ".m4v" ]
files_dict = {ending:glob.glob(f'{path2files}**/*{ending}', recursive=True) for ending in wanted}
with open(path2words, 'w') as w:
    # one heading line per extension, then one line per matching file
    for ending, paths in files_dict.items():
        w.write(ending + '\n')
        for path in paths:
            w.write(path + '\n')
The output of makeTable() looks like this:

Output:
# run makeHTML insert-table ends up like below
# best is 1 column because the path + filename can be quite long with all the subfolders
"""
<div class="div-table">
<table>
<tr>
<th>All my films</th>
</tr>
<tr>
<td><strong>.avi</strong></td>
</tr>
<tr>
<td>/home/pedro/Videos/Videos/Blade_ Runner_2049.avi</td>
</tr>
<tr>
<td>/home/pedro/Videos/Wag.The.Dog.1997.avi</td>
</tr>
<tr>
<td>/home/pedro/Videos/Red_2.avi</td>
</tr>
<tr>
<td><strong>.mp4</strong></td>
</tr>
<tr>
<td>/home/pedro/Videos/The Moments of Happiness.mp4</td>
</tr>
<!-- many many more -->
</table><br>
</div><br>
"""
Of course, I don't really know what all your regexes are doing!