There is an insidious bug in subprocess.Popen() caused by io buffering of stdout (possibly stderr). There is a limit of around 65536 characters in the child process io buffer. If the child process writes enough output the child process will "hang" waiting for the buffer to be flushed - a deadlock situation. The authors of subprocess.py seem to believe this is a problem caused by the child, even though a subprocess.flush would be welcome. Pearson Anders pearson,
https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/ Has a simple solution but you have to pay attention. As he says, "tempfile.TemporaryFile() is your friend." In my case I am running an application in a loop to batch process a bunch of files, the code for the solution is:
with tempfile.TemporaryFile() as fout:
sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)], \
timeout=cpto, check=True, stdout=fout, stderr=fout)
The fix above still deadlocks after processing about 20 files. An improvement but not good enough, since I need to process hundreds of files in a batch. I came up with the below "crowbar" approach.
proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)], stdout=sp.PIPE, stderr=sp.STDOUT)
""" Run GMAT for each file in batch.
Arguments:
-m: Start GMAT with a minimized interface.
-ns: Start GMAT without the splash screen showing.
-x: Exit GMAT after running the specified script.
-r: Automatically run the specified script after loading.
Note: The buffer passed to Popen() defaults to io.DEFAULT_BUFFER_SIZE, usually 62526 bytes.
If this is exceeded, the child process hangs with write pending for the buffer to be read.
https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
"""
try:
(outs, errors) = proc.communicate(cpto)
"""Timeout in cpto seconds if process does not complete."""
except sp.TimeoutExpired as e:
logging.error('GMAT timed out in child process. Time allowed was %s secs, continuing', str(cpto))
logging.info("Process %s being terminated.", str(proc.pid))
proc.kill()
""" The child process is not killed by the system. """
(outs, errors) = proc.communicate()
""" And the stdout buffer must be flushed. """
The basic idea is to kill the process and flush the buffer on each timeout. I moved the TimeoutExpired exception into the batch processing loop so that after killing the process, it continues with the next. This is harmless if the timeout value is sufficient to allow gmat to complete (albeit slower). I find that the code will process from 3 to 20 files before it times out.
This really seems like a bug in subprocess.