Why is PYTHONEXECUTABLE only supported on MAC OS X?

keith_hanlan · Nov-15-2021, 09:05 PM

I need a way to set argv[0] when using python's -m option.

The python manual documents the following environment variable:

Quote: PYTHONEXECUTABLE: If this environment variable is set, sys.argv[0] will be set to its value instead of the value got through the C runtime. Only works on Mac OS X.

This is exactly the functionality I need, but on Linux. Using sys.executable is not a solution, since it has to be referenced in each script module rather than in one common location.

Let me describe my use case and explain why I need this functionality:

Our tool consists of:

- a primary application delivered with an embedded Python interpreter (and an associated library of Python code)
- a collection of Python utilities which share some of the same Python code
- a standalone Python distribution which matches the version embedded in the primary application.

The Python utilities are shipped as symlinks to a common wrapper script in the bin directory. This lets the wrapper find the correct Python distribution and then run the tool itself, which resides in a libexec directory. (This gives us a fair measure of platform independence, as we completely control the two 32-bit and two 64-bit Python distributions.)

For example, consider the following file structure:

```
tool/bin/.wrapper
tool/bin/tool-util-one@ -> .wrapper
tool/bin/tool-util-two@ -> .wrapper
tool/libexec/util_one.py
tool/libexec/util_two.py
```

The wrapper heuristically determines that tool-util-one has its code in util_one.py and so forth. Once it finds the correct python and module, it runs the utility using:

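```python
# os.execve(path, args, env): the program path comes first, then the full
# argument vector (whose first element becomes argv[0]), then the environment.
os.execve(python, [python, '-m', 'util_one', *sys.argv[1:]], env)
```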
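
For readers who want to picture the wrapper, here is a minimal sketch of what it might look like. The post does not spell out the actual heuristic or the directory layout, so the name mapping ("tool-util-one" becomes "util_one"), the tool/python/bin/python location and the PYTHONPATH setting are all hypothetical assumptions, as are the helper names module_for, build_argv and main.

```python
import os
import sys


# Hypothetical heuristic: drop a "tool-" prefix, then turn hyphens into
# underscores, so "tool-util-one" maps to the module "util_one".
def module_for(invoked_as):
    name = os.path.basename(invoked_as)
    if name.startswith('tool-'):
        name = name[len('tool-'):]
    return name.replace('-', '_')


def build_argv(python, invoked_as, args):
    # Argument vector for os.execve(); argv[0] is the interpreter path here,
    # which is the element the -m option later replaces with the module path.
    return [python, '-m', module_for(invoked_as), *args]


def main():
    # Hypothetical layout: tool/bin/<symlink>, tool/python/bin/python, tool/libexec/.
    bin_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    tool_dir = os.path.dirname(bin_dir)
    python = os.path.join(tool_dir, 'python', 'bin', 'python')
    # Assumption: libexec is made importable for -m via PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=os.path.join(tool_dir, 'libexec'))
    os.execve(python, build_argv(python, sys.argv[0], sys.argv[1:]), env)
```

The wrapper script itself would end with a call to main(); keeping the name mapping and argument building in separate functions makes them easy to check without actually exec-ing anything.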

As one would expect, argv[0] becomes the python program itself. But then, when python processes the -m option, it changes argv[0] to "path/to/libexec/util_one.py" (as described in https://docs.python.org/3/using/cmdline.html):

Quote: If this option is given, the first element of sys.argv will be the full path to the module file

This is quite undesirable because it breaks the behaviour of code which wants to use sys.argv[0]. This includes, for example, argparse as well as our logging and metrics facilities.
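The behaviour is easy to reproduce in isolation. The sketch below writes a throwaway stand-in for util_one.py into a temporary directory (the module name and the blah.gz argument are only illustrative) and runs it with -m; the module just prints what it sees as sys.argv[0].

```python
import os
import subprocess
import sys
import tempfile


def argv0_seen_under_dash_m():
    """Run a throwaway 'util_one' module via ``python -m``; return its sys.argv[0]."""
    with tempfile.TemporaryDirectory() as libexec:
        # Stand-in for tool/libexec/util_one.py: it only reports sys.argv[0].
        with open(os.path.join(libexec, 'util_one.py'), 'w') as fh:
            fh.write('import sys\nprint(sys.argv[0])\n')
        # Make the temporary "libexec" directory importable in the child.
        env = dict(os.environ, PYTHONPATH=libexec)
        result = subprocess.run(
            [sys.executable, '-m', 'util_one', 'blah.gz'],
            cwd=libexec, env=env, capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()
```

Calling argv0_seen_under_dash_m() should return the full path of the temporary util_one.py rather than the interpreter path that was passed as argv[0], matching the documentation quoted above.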

A workaround is to modify every tool to use sys.executable instead of sys.argv, but this is ugly and error-prone. It would be much simpler and safer to make a single change in the wrapper. The PYTHONEXECUTABLE environment variable looks like it fits the bill perfectly, but it does not work on Linux.
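To make the per-tool workaround concrete, here is a minimal sketch of the change each utility would need, shown for argparse only. It relies on the poster's observation that, in this setup, sys.executable still carries the name the user ran; that is not something Python guarantees in general. The helper names invoked_name and make_parser are hypothetical.

```python
import argparse
import os
import sys


def invoked_name():
    # Assumption from this thread: sys.executable holds (an absolute form of)
    # the name the user ran, e.g. /path/to/bin/tool-util-one.
    return os.path.basename(sys.executable)


def make_parser(description=None):
    # argparse's default prog is derived from sys.argv[0], which under -m is
    # the module file path (newer Python releases may special-case -m), so
    # each tool has to pass prog explicitly.
    return argparse.ArgumentParser(prog=invoked_name(), description=description)
```

The same substitution would be needed in the logging and metrics code, which is exactly the per-tool churn the wrapper-level fix is meant to avoid.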

I have two questions:

1. Why is this only supported on Mac OS X?
2. Is there another solution that I can use?

Thank you very much,
Keith

**Gribouillis** · (This post was last modified: Nov-15-2021, 10:25 PM by Gribouillis.)

keith_hanlan Wrote: A workaround is to modify every tool to use sys.executable instead of sys.argv

This is the part that I don't understand in your description of the problem. sys.executable is the path to the python interpreter, while sys.argv[0] is normally the path to the python script that is being executed. These are completely different things. Which value do you want for sys.argv[0] instead of the path to the file defining the python module?

keith_hanlan · Nov-16-2021, 12:15 AM

(Nov-15-2021, 10:24 PM) Gribouillis Wrote: ... sys.executable is the path to the python interpreter, while sys.argv[0] is normally the path to the python script that is being executed. These are completely different things. Which value do you want for sys.argv[0] instead of the path to the file defining the python module?

Hi Gribouillis, sys.executable reflects whatever name I call the Python interpreter by. The two can differ because, for example, os.execve() takes the path of the executable separately from an argument list whose first element is argv[0].

Here's a demonstration. I have put a breakpoint in both the wrapper and the module:

```
# bin/tool-util-one blah.gz
> /path/to/bin/tool-util-one(116)main()
-> os.execve(python_path, argv, env)
(Pdb) p python_path
'/proj/releases/linux/tool/latest/python/bin/python'
(Pdb) p argv
['bin/tool-util-one', '-m', 'blurble.libexec.util_one', 'blah.gz']
(Pdb) c
> /path/to/blurble/libexec/util_one.py(24)<module>()
-> place_to_break()
(Pdb) p sys.argv[0]
'/path/to/blurble/libexec/util_one.py'
(Pdb) p sys.executable
'/path/to/bin/tool-util-one'
(Pdb)
```

On lines 4-5, I show the path to the tool's Python interpreter that the wrapper is going to exec, and lines 6-7 show the argument list. Here, argv[0] is what the user ran. This is the value I want to see inherited all the way through to the script module.

Instead, on lines 11-12, we see the Python module path, exactly as documented for Python's -m option.

And finally, on lines 13-14, sys.executable provides the original value of sys.argv[0] that I want (made absolute: 'bin/tool-util-one' became '/path/to/bin/tool-util-one'). The value is available, but for argparse and other tools to use it, I am forced to change every tool.

I would like a way to make this change from the calling wrapper instead. It looks like macOS gets this functionality but Linux does not.

Hence my two questions.

I hope this clears things up for you. Thanks for your question.

Best regards,
Keith

**Gribouillis** · (This post was last modified: Nov-16-2021, 09:16 AM by Gribouillis.)

I'm unable to answer the question in the title of this thread; that is one for the Python core dev team. However, you could try the following workaround:

Instead of execve-ing python with the arguments that you showed above, you would execve python wrapper2.py <any args you want>. Then wrapper2.py would do something like:

```python
# wrapper2.py
import importlib.util
import sys

modname = ...  # should extract 'blurble.libexec.util_one' from sys.argv
spec = importlib.util.find_spec(modname)
assert spec.origin.endswith('.py')  # should be '/path/to/blurble/libexec/util_one.py'
with open(spec.origin, 'rb') as ifh:  # read bytes so compile() honours any coding declaration
    code = compile(ifh.read(), spec.origin, 'exec')
sys.argv[:] = ...  # do anything you want with sys.argv
mainmod = sys.modules['__main__']
mainmod.__file__ = spec.origin
exec(code, mainmod.__dict__)  # run the desired program in the __main__ module
```

Of course, you would hide all this in a function's body (or an external module) to avoid cluttering the __main__ module's namespace.
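A possibly simpler variant of the same idea is to let the standard runpy module locate and run the code. runpy.run_module() with alter_sys=False leaves sys.argv alone, so whatever argv[0] the wrapper chooses survives, while run_name='__main__' still triggers the module's "if __name__ == '__main__':" block. This is a sketch, not a drop-in replacement: the module runs in a fresh namespace that is not installed as sys.modules['__main__'], so things like pickling objects defined in it may behave differently from a real -m run. The function names and the ARGV0 MODULE [ARGS...] calling convention are hypothetical.

```python
import runpy
import sys


def run_module_with_argv(modname, argv):
    """Run *modname* as if it were __main__, with sys.argv set to *argv*.

    Unlike ``python -m`` (or run_module(..., alter_sys=True)), this keeps
    sys.argv[0] as given instead of replacing it with the module path.
    Returns the module's globals dictionary.
    """
    sys.argv[:] = argv
    # run_name='__main__' makes "if __name__ == '__main__':" blocks run;
    # alter_sys=False tells runpy not to touch sys.argv or sys.modules.
    return runpy.run_module(modname, run_name='__main__', alter_sys=False)


def main(args):
    """Hypothetical CLI for wrapper2.py: ARGV0 MODULE [ARGS...]."""
    if len(args) < 2:
        sys.exit('usage: wrapper2.py ARGV0 MODULE [ARGS...]')
    argv0, modname, *rest = args
    run_module_with_argv(modname, [argv0, *rest])
```

Under this convention wrapper2.py would end with main(sys.argv[1:]), and the first wrapper would exec something like os.execve(python, [python, wrapper2_path, sys.argv[0], 'blurble.libexec.util_one', *sys.argv[1:]], env), where wrapper2_path is wherever that file is installed.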

keith_hanlan · Nov-17-2021, 04:10 PM

(Nov-16-2021, 09:04 AM)Gribouillis Wrote: I'm unable to answer the title of this thread, that is a question for the Python core dev team, but you could try the following workaround:
...

Thank you very much Gribouillis! I will experiment with that approach as soon as possible.
