Python Forum
too many files open
#1
Hi,
I was given a large number of prayer cards, already scanned, ready for OCR, as usual.
Only this time the formats are different, imagine single cards, double (folded) cards and folded cards (recto & verso).
You cannot OCR them like that, you need to crop them (= cut them up) into 2 or 4 pieces, save those crops, and OCR those.
This works well on a limited number, but with large numbers I get:
Output:
Traceback (most recent call last):
  File "C:\VVF\BP\BP-OCR-V5.py", line 236, in <module>
    do_crop()
  File "C:\VVF\BP\BP-OCR-V5.py", line 227, in do_crop
    crop.save(dst)
  File "C:\Users\Paul\AppData\Local\Programs\Python\Python310\lib\site-packages\PIL\Image.py", line 2300, in save
    save_handler(self, fp, filename)
  File "C:\Users\Paul\AppData\Local\Programs\Python\Python310\lib\site-packages\PIL\TiffImagePlugin.py", line 1731, in _save
    _fp = os.dup(fp.fileno())
OSError: [Errno 24] Too many open files
I've tried several things without success. The basic question is: I have an image, I crop it (in memory) into x parts, and I save these parts to disk.
Done, next image. Somehow it keeps file pointers open.
This is what the function looks like:
for BP in glob.glob(os.path.join(mappath, '*.*')):
    img = Image.open(BP)
    w, h = img.size
    if w < 750:
        crops = crop_image(img)
        for idx, crop in enumerate(crops):
            # remark: filename is modified to make them all different
            dst = os.path.join(os.curdir, 'data', 'scans-cropped', fil)
            crop.save(dst)
            crop.close()
How can I save a few hundred thousand crops like this?
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
It would seem that no KISS answer is available.
There are many posts about the same problem. The answers, if any, are always somewhat cryptic.
Implementing and understanding them would take much, much longer than plan B.
Because the error only comes up after 4024 items, I just group the
quantity at hand into batches of 4,000.
Done.
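The grouping step of this plan B could look roughly like the sketch below. It assumes the scan paths are already collected in a list; the helper name `batches` and the file names are hypothetical placeholders. It only shows the grouping. Open file handles belong to the running process, so whether batching helps depends on the handles being released between batches, for example by running the program once per batch.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names standing in for the real scans.
paths = [f"scan_{n:06d}.tif" for n in range(10_000)]

for number, batch in enumerate(batches(paths, 4000), start=1):
    print(f"batch {number}: {len(batch)} files")
```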
Paul
#3
You should be able to rework your logic so that each file is opened, processed and closed before the next one is touched.
Currently, you're trying to open a huge slug at a time.
It's also better to use 'with', e.g. with open(filename) as fp: ...
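As a small illustration of the 'with' suggestion, the sketch below writes and reads a throwaway file (the file name is a placeholder) and checks that the file object is closed as soon as its block ends, without an explicit close() call.

```python
import os
import tempfile

# Work in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "example.txt")

    with open(filename, "w") as fp:
        fp.write("hello")
    # Leaving the block closed the file, even though close() was never called.
    was_closed = fp.closed

    with open(filename) as fp:
        text = fp.read()

print(was_closed, fp.closed, text)
```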
#4
(Jul-17-2023, 03:04 PM)Larz60+ Wrote: Currently, you're trying to open a huge slug at a time.
Hi Larz,
I already tried with open(...) as ... but it made no difference.
I also closed all files, or set them to None... everything failed.

Although the numbers are huge, in principle I open only one file at a time, cut it into at most 4 parts,
save them, and close everything. But it would seem that the img.save operation somehow keeps its file pointers open.

While executing my "Plan B", I have come to suspect a difference between Windows 10 and Windows 11.
My Plan B works on Windows 11 but not on Windows 10, where it fails with the same error 24.

If by chance I find the reason why, I will publish it.
Is it Python, is it Windows? Am I barking up the wrong tree?
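When comparing two machines like this, it can help to record exactly what each one runs. The sketch below is one way to do that with the standard library. The function name `environment_report` is hypothetical, and including the Pillow version is an assumption made only because the traceback passes through PIL; the thread does not say the Pillow versions differed.

```python
import platform
from importlib import metadata


def environment_report():
    """Collect OS, Python and Pillow versions for a side-by-side comparison."""
    try:
        pillow_version = metadata.version("Pillow")
    except metadata.PackageNotFoundError:
        pillow_version = "not installed"
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "pillow": pillow_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```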
Paul
#5
In the code provided, img is never closed.

"File Handling in Pillow" could be a useful resource to check out.

I second Larz60+'s suggestion: instead of a bare img = Image.open(BP), use a context manager, or alternatively call img.close() after you have finished manipulating the opened file.

Something along those lines (I don't use Pillow so I haven't tested it):

with Image.open(BP) as current_image:
    ...  # do your stuff

# alternatively

img = Image.open(BP)
# do your stuff
img.close()
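Applied to the crop-and-save loop from the first post, the same idea might look like the sketch below. It is untested against the original scans and is not claimed to cure the error 24 in the traceback. `split_in_two` is a hypothetical stand-in for the poster's crop_image(), the output naming scheme is an assumption, and PNG is used only to keep the demo small.

```python
import os
import tempfile

from PIL import Image


def split_in_two(img):
    """Hypothetical stand-in for crop_image(): left and right halves."""
    w, h = img.size
    return [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]


def process_scan(src, out_dir):
    """Open one scan, save its crops, and close every image before returning."""
    saved = []
    stem = os.path.splitext(os.path.basename(src))[0]
    with Image.open(src) as img:  # the source file is closed when this block ends
        for idx, crop in enumerate(split_in_two(img)):
            dst = os.path.join(out_dir, f"{stem}_{idx}.png")
            try:
                crop.save(dst)
            finally:
                crop.close()  # release the crop even if save() fails
            saved.append(dst)
    return saved


# Demo on a generated image instead of a real scan.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "card.png")
    Image.new("RGB", (600, 400), "white").save(src)
    results = process_scan(src, tmp)
    sizes = []
    for path in results:
        with Image.open(path) as piece:
            sizes.append(piece.size)

print(sizes)
```

Keeping the open, crop, save and close steps for one scan inside a single function makes it easier to see that nothing from one image is still referenced when the next one starts.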
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#6
(Jul-18-2023, 05:53 AM)perfringo Wrote: In code provided there is nowhere to be seen closing img.
Yes, the code provided is one of maybe 10 alternatives I tried, just to give you an idea.
Open an image, cut it into pieces, save the pieces, close everything. So simple.

Yes, I did use "with ... as ...", and yes, I closed everything I could close, to no avail.

I did some limited testing: Windows 11 with Python 3.11.2 works with batches of 4000,
while Windows 10 with Python 3.11.2 (my laptop) gets error 24 with batches of 4000.
I upgraded to Python 3.11.4 on the Windows 10 laptop, and lo and behold, it is processing 4000 as we speak.
About 2 minutes per 4000.
I would prefer doing batches of 50.0000 of course, but I lost so much time on this already, I'll test it later.
I was rather hoping that somebody would know how to close all file pointers from within a program, to take care of error 24.
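Short of closing leaked handles, a program can at least recognise error 24 and stop in a controlled way. The sketch below does not close anything; it only detects `errno.EMFILE` (the symbolic name for "Too many open files") and reports how many items were handled, so that a later run could resume from there. The names `run_until_fd_exhaustion` and `fake_save` are hypothetical, and the failure is simulated.

```python
import errno


def run_until_fd_exhaustion(items, handle):
    """Call handle(item) for each item and stop at the first 'too many open files'.

    Returns the number of items handled successfully.
    """
    done = 0
    for item in items:
        try:
            handle(item)
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise  # a different I/O problem: do not hide it
            print(f"Too many open files after {done} items")
            break
        done += 1
    return done


def fake_save(n):
    """Simulated save step that fails on the fourth item (index 3)."""
    if n == 3:
        raise OSError(errno.EMFILE, "Too many open files")


print(run_until_fd_exhaustion(range(10), fake_save))
```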
Paul