Dec-17-2024, 03:03 PM
Issue: if you start a subprocess with a pipe on stdout, in order to concurrently process its stdout output using asyncio, everything works normally, except when that subprocess is a Python script.
Expected: Start a subprocess running a Python script and concurrently receive its stdout output one line at a time for the duration of the subprocess.
Actual: The subprocess starts the Python script, stdout output builds up (for about 637 lines), then those lines are all received by the main process at once, then the subprocess continues to build up more lines of output.
In the sample Python script below, the function run runs three different scripts in order: bash, Python, and Node.js. Bash and Node.js work as expected: one line of stdout output in the subprocess is one line returned through the pipe and handled by the main process. The Python script should do the same. Instead, its output is blocked, queued, or buffered in some way until either the Python script finishes or there are around 637 lines in the stdout pipe, at which point all the lines in the pipe are handled by the main process at the same time.
The main process writes to stream.log: as output is returned from the subprocess, each line is written there. Each subprocess writes both to stdout and to stream2.log; it writes to stdout so that the main process can receive the lines of output and log them to stream.log. Each subprocess delays 500 ms between lines, so you can easily see when lines of stdout are being piped to the main process.

First bash runs, then Python, and finally Node.js. You will see the bash output written to both logs at nearly the same time, with roughly the same delay between writes. Next, you see the Python script writing to stream2.log while nothing is written to stream.log; when the Python script finishes, stream.log fills with all the lines that were buffered or blocked. Last, Node.js starts and stream.log and stream2.log are again written at the same frequency. Bash and Node.js work as expected.
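For what it's worth, here is a minimal sketch of what I suspect is happening (my own assumption, not confirmed): CPython line-buffers stdout only when it is attached to a terminal; when stdout is a pipe it falls back to block buffering (typically 8 KiB), which would roughly match output arriving in bursts of ~637 short lines:

```python
import subprocess
import sys

# Ask a child Python interpreter how its stdout is buffered when it
# writes to a pipe instead of a terminal. My assumption: isatty() is
# False and line buffering is off, so writes accumulate in large blocks.
code = "import sys; print(sys.stdout.isatty(), sys.stdout.line_buffering)"
result = subprocess.run(
    [sys.executable, "-c", code],
    capture_output=True,  # stdout of the child is a pipe here, not a tty
    text=True,
)
out = result.stdout.strip()
print(out)  # prints "False False": not a terminal, not line-buffered
```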
Setup
Save the contents of the script below to a file my_script.py
touch stream.log
touch stream2.log
Running
terminal window 1
tail -f stream.log
terminal window 2
tail -f stream2.log
in a third terminal
python my_script.py
import asyncio
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(filename="stream.log", encoding="utf-8", level=logging.DEBUG)

js_code = """
const fs = require('fs');
const logFile = fs.createWriteStream('stream2.log', { flags: 'a' }); // 'a' for append, 'w' for overwrite

(async function() {
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
    let i = 0;
    while( i < 10 ){
        let msg = `counter: ${i}`;
        console.log(msg);
        logFile.write(`${msg}\n`);
        i += 1;
        await sleep(500)
    }
    console.log("JS completed")
})()
"""

bash_code = 'for i in $(seq 1 10); do echo "Hi, $i" >> stream2.log; echo "Hi, $i"; sleep .5; done'

py_code = """
import time
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(filename="stream2.log", encoding="utf-8", level=logging.DEBUG)

i = 0
while i < 10:
    msg = f"counter: {i}"
    print(msg)
    logger.debug(msg)
    i += 1
    time.sleep(0.5)

print("completed")
logger.debug("PY completed")
"""

async def run(command: list[str]):
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
    )
    while True:
        data = await proc.stdout.readline()
        if not data:
            logger.debug("end of stream")
            break
        line = data.decode().rstrip()
        logger.debug(line)
    await proc.wait()
    return proc.returncode

asyncio.run(run(["bash", "-c", bash_code]))
asyncio.run(run(["python", "-c", py_code]))
asyncio.run(run(["node", "-e", js_code]))

If anybody can get me steered in the right direction on this it would be much appreciated.
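One workaround I have been trying (my own guess at a fix, not confirmed as the proper solution) is forcing the child Python interpreter into unbuffered mode with the -u flag (or by setting PYTHONUNBUFFERED=1 in its environment), so each print() is flushed to the pipe immediately. A stripped-down sketch:

```python
import asyncio
import sys

# Shortened child script (3 lines instead of 10, shorter sleep) just to
# demonstrate the idea; the real script is in the post above.
py_code = """
import time
for i in range(3):
    print(f"counter: {i}")
    time.sleep(0.1)
"""

async def run_unbuffered():
    # Same pattern as run() above, but the child is started with -u so
    # its stdout is unbuffered and each line reaches the pipe right away.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-u", "-c", py_code,
        stdout=asyncio.subprocess.PIPE,
    )
    lines = []
    while True:
        data = await proc.stdout.readline()
        if not data:
            break
        lines.append(data.decode().rstrip())
    await proc.wait()
    return lines

lines = asyncio.run(run_unbuffered())
print(lines)  # each line was delivered as it was printed, not in one burst
```

An alternative with the same effect, if you control the child script, would be print(msg, flush=True) inside the loop.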