Dec-27-2019, 04:13 PM
I am using multiprocessing to do some link processing.
The code I am using to create the processes is the same as shown in one of my earlier posts:
```python
from multiprocessing import Process

while len(processes) < len(list(urls)):
    # checks for current processes alive being less than all the links needing processing
    if len(processes) - len([p for p in processes if not p.is_alive()]) < OPTIONS['max_proc']:
        p = Process(target=Links.process_link, args=(urls[index], OPTIONS))  # create a new process
        processes.append(p)  # add it to array
        p.start()
        index += 1

for p in processes:
    p.join()  # don't continue main script until processes have finished
```
The problem is I get this error:
Error: OSError: [Errno 24] Too many open files
The first thing I tried was closing the requests I make to web pages, using contextlib's closing function, like this:
```python
with closing(urlopen(Request(url, headers={'User-Agent': 'Mozilla/3.0'}), context=CONTEXT)) as response:
```
That didn't help. I also open a file and write to it many times, so I used the same closing function to see if that would help:
```python
with closing(gzip.GzipFile(DIR_PATH + '/links.data.gz', 'a')) as lnk:  # close file after we have finished with it
```
Again, that didn't work. That suggests it has something to do with the number of processes being opened. There are a few answers online, but none worked. The closest one to my exact problem said to add p.terminate() under p.join(), so it would close the process after it's done. However, the error occurs before it even makes it to the for loop. And since it's an OSError, there isn't really much (any) useful information in the stack trace.
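Since the stack trace says so little, one way to narrow this down might be to log how many file descriptors the main process holds while the loop runs. The following is only a minimal sketch, assuming a Unix system (Linux or macOS) where `/dev/fd` lists the current process's descriptors; the helper name `fd_usage` is made up for illustration.

```python
import os
import resource  # Unix only


def fd_usage():
    """Return (open_fd_count, soft_limit) for the current process.

    Assumption: /dev/fd exists and lists this process's open descriptors,
    which is the case on Linux and macOS.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Listing the directory briefly opens one extra descriptor itself,
    # so the count may be one higher than the steady-state value.
    open_fds = len(os.listdir('/dev/fd'))
    return open_fds, soft_limit


if __name__ == '__main__':
    used, limit = fd_usage()
    print(f'{used} of {limit} file descriptors in use')
```

Calling `fd_usage()` after each `p.start()` would show whether the count climbs with every new process until it reaches the soft limit.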
What is the problem here?
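For comparison, here is a minimal sketch of the same fan-out written with a worker pool, which keeps a fixed number of worker processes and reuses them instead of creating one Process object per URL. Whether this avoids the error depends on where the descriptors are actually leaking, so treat it as an experiment, not a confirmed fix. The function `process_link` below is a hypothetical stand-in for `Links.process_link`, `max_proc` plays the role of `OPTIONS['max_proc']`, and the explicit 'fork' start method assumes a Unix system.

```python
import multiprocessing


def process_link(url):
    """Hypothetical stand-in for Links.process_link; returns the URL length."""
    return len(url)


def process_all(urls, max_proc=4):
    # Assumption: Unix, where the 'fork' start method is available.
    ctx = multiprocessing.get_context('fork')
    # The pool keeps at most max_proc worker processes alive and reuses them,
    # so the number of process objects does not grow with len(urls).
    with ctx.Pool(processes=max_proc) as pool:
        # map() blocks until every URL has been processed, which replaces
        # the manual start/join bookkeeping.
        return pool.map(process_link, urls)


if __name__ == '__main__':
    print(process_all(['http://a.example', 'http://bb.example']))
```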