Python Forum

Full Version: waiting for the first of many pipes to send data
i am writing a script that will run up to about 16 child processes that send back output for the parent to read and print. the timing can be rather random because each is contacting different servers on the net. they will all run in parallel to reduce the total time. some servers have been known to take several minutes because of complex searches on huge databases. once a child process starts getting data and printing it over the pipe to the parent, it goes quite fast. there is also a requirement that output lines not be mixed, although the order does not matter. so once a child begins to send data, the parent reads from that child only, until EOF on that pipe. but until it knows which child will output next, it needs to wait until one of the pipes is ready. in C i would call poll() to do this, then loop around read() until EOF. in Python my first thought is to do it basically the same way, but i would like to know if there is a better alternative.
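a minimal sketch of that poll()-style pattern using the stdlib select module, with throwaway os.pipe pairs standing in for the real child processes (the fds, messages, and variable names here are just for the demo):

import os
import select

# demo stand-ins: three pipes that "children" have already written to
read_fds = []
for msg in (b"alpha\n", b"beta\n", b"gamma\n"):
    r, w = os.pipe()
    os.write(w, msg)
    os.close(w)          # closing the write end produces EOF on the read end
    read_fds.append(r)

results = []
while read_fds:
    # block until at least one pipe is ready, like poll() in C
    ready, _, _ = select.select(read_fds, [], [])
    fd = ready[0]        # take one ready pipe and drain it to EOF
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:    # empty read means the writer closed its end
            break
        chunks.append(chunk)
    os.close(fd)
    read_fds.remove(fd)
    results.append(b"".join(chunks))

print(sorted(results))

select.select() is the thinner wrapper; the selectors module mentioned below wraps the same syscalls with a nicer interface.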
I would use the select or selectors module for this.
it looks like selectors is not much different from the kernel syscall stuff. i'll be experimenting with this over the next few days. got a lot of subprocesses to make.
the example in the docs for selectors is confusing to me. i'll just have a bunch (a list) of pipes already open for read, with the other end being a child process stdout. all i need is to wait for one of them to be ready and know which one it is. i already know how to do that with direct syscalls (select() or poll()) using file descriptors. my code will then loop and read that one pipe until EOF (not interleaving with any others), then wait to see which pipe (excluding the one i just got EOF on) is ready the next time.

the example seems to be trying to run callback functions via selectors, which adds complexity beyond the simple thing i need.
Use the selectors module: it is recent and high-level. You don't need to use callbacks as they do in the documentation's example. You can very well read directly from the pipes in the while True loop.
i don't want to read from any pipes until i know which one is ready, first. then i want to read only that one pipe, blocking in each read, until EOF on it. then back to checking for the next pipe to be ready. i'll probably need to change the pipe to non-blocking mode for the wait and to blocking mode for the read-to-EOF loop, and unregister that pipe and close it.

i wonder if i need to mess with setting blocking vs. non-blocking if i use a file object instead of a file descriptor. i'll be starting these subprocesses with Popen, so i can easily use the file object.

once a subprocess starts to get data from the net, it will be getting it all reasonably quickly with only small times between each line (they can be made to flush per line).
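since the parent only starts reading a pipe after the selector reports it ready, and then deliberately blocks until EOF, ordinary blocking Popen file objects should work without toggling non-blocking mode. a sketch under those assumptions (Unix; the tiny python -c children here are hypothetical stand-ins for the real net-fetching subprocesses):

import selectors
import subprocess
import sys

# stand-in children: each prints one line and exits
cmds = [[sys.executable, "-c", f"print('child {n}')"] for n in range(3)]
procs = [subprocess.Popen(c, stdout=subprocess.PIPE, text=True) for c in cmds]

lines = []
with selectors.DefaultSelector() as sel:
    for p in procs:
        sel.register(p.stdout, selectors.EVENT_READ)   # file objects are fine
    while sel.get_map():                # loop until every pipe has hit EOF
        for key, _ in sel.select():     # block until at least one is ready
            f = key.fileobj
            sel.unregister(f)           # this pipe gets drained to EOF now
            for line in f:              # ordinary blocking read loop;
                lines.append(line)      # no non-blocking mode needed
            f.close()
for p in procs:
    p.wait()
print(sorted(lines))

registering p.stdout directly works because selectors accepts any object with a fileno() method.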
You can do this simply with the selectors module:
import selectors

with selectors.DefaultSelector() as sel:
    # register every child pipe for read-readiness
    for p in my_pipes:
        sel.register(p, selectors.EVENT_READ)
    npipes = len(my_pipes)

    while npipes:
        # block until at least one pipe is ready
        events = sel.select()
        for key, mask in events:
            sel.unregister(key.fileobj)
            npipes -= 1
            # blocking read drains this pipe to EOF before moving on
            data = key.fileobj.read()
            key.fileobj.close()
            # do something with data
i'll try that. i'll probably make a function to do the wait thing, to get just one result at a time. when more than one pipe is ready, the order doesn't really matter. the parallelism is just for performance, not to reveal which source is fastest. a previous version of this did the parallel stuff but read the pipes in the order they were in the original list, so the time to see the first output would usually be longer.
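that "wait thing" function could be a generator that yields one pipe's complete output at a time, first-ready first. a sketch, with the function name and the os.pipe demo pairs being hypothetical stand-ins, not anything from the final script:

import os
import selectors

def results_in_finish_order(pipes):
    """Yield each pipe's complete output, first-ready first."""
    with selectors.DefaultSelector() as sel:
        for p in pipes:
            sel.register(p, selectors.EVENT_READ)
        while sel.get_map():
            key = sel.select()[0][0]      # take just one ready pipe
            sel.unregister(key.fileobj)
            data = key.fileobj.read()     # blocking read to EOF
            key.fileobj.close()
            yield data

# demo with plain pipes standing in for child stdout
pipes = []
for msg in (b"one", b"two"):
    r, w = os.pipe()
    os.write(w, msg)
    os.close(w)                           # EOF for the reader
    pipes.append(os.fdopen(r, "rb"))

out = sorted(results_in_finish_order(pipes))
print(out)

each yield hands back one whole pipe's data, so output lines from different children can never interleave.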