Python Forum

Message Passing library for HPC and progressive gathering
Hi there,

I'm working on a project that will run on a High Performance Computing (HPC) cluster. The general architecture of the project is pretty basic:
  1. Generate the arguments for each process.
  2. Run multiple processes in parallel, each with one set of the previously generated arguments.
  3. Collect the results of each process and store them in a file.

This looks like a perfect job for MPI. I've tried mpi4py, and it works pretty well. However, I want the last step (collecting the results) to happen gradually, as soon as each process completes: I basically want a live update of the results file. MPI.gather is therefore not suitable, as it blocks until all results are available. Playing with MPI.recv could work (I did not try, but I guess it would), but then I use one MPI slot, and therefore one core, just to wait for an I/O event. I could also have each computing process modify the file itself, but then I risk thousands of processes trying to edit the same file at the same moment. Locking the file would be fine, but the filesystem will cry.

I cannot use the built-in subprocess library either, because the work has to be spread across several nodes (around 2000 processes in total).

I've looked at the socket library, but it is low-level, and I can't imagine that nobody has developed a proper package for this specific situation.
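To illustrate how low-level it gets: even a toy collector over raw sockets has to handle its own listening, threading, and message framing (the host, port choice, and message format below are all made up):

```python
import socket
import threading

HOST = "127.0.0.1"

def collector(server, n_results, results):
    # accept one connection per expected result and store the payload
    for _ in range(n_results):
        conn, _addr = server.accept()
        with conn:
            results.append(conn.recv(1024).decode())

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, 0))  # port 0: let the OS pick a free port
server.listen()
port = server.getsockname()[1]

results = []
t = threading.Thread(target=collector, args=(server, 3, results))
t.start()

# each "worker" connects and sends its result as a raw byte string
for i in range(3):
    with socket.create_connection((HOST, port)) as c:
        c.sendall(f"result-{i}".encode())

t.join()
server.close()
```

And that still ignores error handling, reconnection, and results larger than one recv buffer.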

Do you have any suggestions for an adequate library for simple message passing? Or a trick that allows partial results with MPI.gather?

Thanks in advance for your help,
Duna