diff --git a/pep-0554.rst b/pep-0554.rst
index f6f7992cb11..34ba6a3f4c1 100644
--- a/pep-0554.rst
+++ b/pep-0554.rst
@@ -6,7 +6,7 @@ Type: Standards Track
 Content-Type: text/x-rst
 Created: 2017-09-05
 Python-Version: 3.7
-Post-History:
+Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017
 
 
 Abstract
@@ -29,9 +29,8 @@ Proposal
 
 The ``interpreters`` module will be added to the stdlib.  It will
 provide a high-level interface to subinterpreters and wrap the low-level
-``_interpreters`` module.  The proposed API is inspired by the
-``threading`` module.  See the `Examples`_ section for concrete usage
-and use cases.
+``_interpreters`` module.  See the `Examples`_ section for concrete
+usage and use cases.
 
 API for interpreters
 --------------------
@@ -79,9 +78,10 @@ The module also provides the following class:
 
       Run the provided Python source code in the interpreter.  Any
       keyword arguments are added to the interpreter's execution
-      namespace.  If any of the values are not supported for sharing
-      between interpreters then RuntimeError gets raised.  Currently
-      only channels (see "create_channel()" below) are supported.
+      namespace (the interpreter's "__main__" module).  If any of the
+      values are not supported for sharing between interpreters then
+      ValueError gets raised.  Currently only channels (see
+      "create_channel()" below) are supported.
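+
+      For example (an illustrative sketch only, not part of the
+      proposed API; it assumes the channel API described below):
+
+         interp = interpreters.create()
+         r, s = interpreters.create_channel()
+
+         # "reader" is bound in the subinterpreter's "__main__" module.
+         interp.run("assert reader is not None", reader=r)
+
+         # Anything other than a channel is rejected.
+         try:
+             interp.run("pass", spam=42)
+         except ValueError:
+             pass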
 
       This may not be called on an already running interpreter.  Doing
       so results in a RuntimeError.
@@ -161,43 +161,45 @@ channels have no buffer.
 
    interpreters:
 
-      The list of associated interpreters (those that have called
-      the "recv()" method).
-
-   __next__():
-
-      Return the next object from the channel.  If none have been sent
-      then wait until the next send.
+      The list of associated interpreters: those that have called
+      the "recv()" or "__next__()" methods and haven't called "close()".
 
    recv():
 
       Return the next object from the channel.  If none have been sent
-      then wait until the next send.  If the channel has been closed
-      then EOFError is raised.
+      then wait until the next send.  This associates the current
+      interpreter with the channel.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError.  If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.
 
    recv_nowait(default=None):
 
      Return the next object from the channel.  If none have been sent
-      then return the default.  If the channel has been closed
-      then EOFError is raised.
+      then return the default.  Otherwise, this is the same as the
+      "recv()" method.
 
    close():
 
      No longer associate the current interpreter with the channel (on
-      the receiving end).  This is a noop if the interpreter isn't
-      already associated.  Once an interpreter is no longer associated
-      with the channel, subsequent (or current) send() and recv() calls
-      from that interpreter will raise EOFError.
+      the receiving end) and block future association (via the "recv()"
+      method).  If the interpreter was never associated with the channel
+      then this still blocks future association.  Once an interpreter is
+      no longer associated with the channel, subsequent (or current)
+      send() and recv() calls from that interpreter will raise ValueError
+      (or EOFError if the channel is actually marked as closed).
 
-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed.  The Python runtime
-      will garbage collect all closed channels.  Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
+      Once the number of associated interpreters on both ends drops
+      to 0, the channel is actually marked as closed.  The Python
+      runtime will garbage collect all closed channels, though this
+      may not happen immediately.  Note that "close()" is automatically
+      called on behalf of the current interpreter when the channel is
+      no longer used (i.e. has no references) in that interpreter.
 
-      This operation is idempotent.  Return True if the current
-      interpreter was still associated with the receiving end of the
-      channel and False otherwise.
+      This operation is idempotent.  Return True if "close()" has not
+      been called before by the current interpreter.
 
 ``SendChannel(id)``::
 
@@ -217,36 +219,26 @@ channels have no buffer.
 
    send(obj):
 
-      Send the object to the receiving end of the channel.  Wait until
-      the object is received.  If the channel does not support the
-      object then TypeError is raised.  Currently only bytes are
-      supported.  If the channel has been closed then EOFError is
-      raised.
+      Send the object to the receiving end of the channel.  Wait until
+      the object is received.  If the channel does not support the
+      object then ValueError is raised.  Currently only bytes are
+      supported.
+
+      If the channel is already closed (see the close() method)
+      then raise EOFError.  If the channel isn't closed, but the current
+      interpreter already called the "close()" method (which drops its
+      association with the channel) then raise ValueError.
 
    send_nowait(obj):
 
      Send the object to the receiving end of the channel.  If the
-      object is received then return True.  Otherwise return False.
-      If the channel does not support the object then TypeError is
-      raised.  If the channel has been closed then EOFError is raised.
+      object is received then return True.  If not then return False.
+      In all other respects, this is the same as the "send()" method.
 
    close():
 
-      No longer associate the current interpreter with the channel (on
-      the sending end).  This is a noop if the interpreter isn't already
-      associated.  Once an interpreter is no longer associated with the
-      channel, subsequent (or current) send() and recv() calls from that
-      interpreter will raise EOFError.
-
-      Once number of associated interpreters on both ends drops to 0,
-      the channel is actually marked as closed.  The Python runtime
-      will garbage collect all closed channels.  Note that "close()" is
-      automatically called when it is no longer used in the current
-      interpreter.
-
-      This operation is idempotent.  Return True if the current
-      interpreter was still associated with the sending end of the
-      channel and False otherwise.
+      This is the same as "RecvChannel.close()", but applied to the
+      sending end of the channel.
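+
+To illustrate the non-blocking variants, here is a rough sketch (not
+part of the proposed API) that uses both ends of a channel from the
+same interpreter::
+
+   r, s = interpreters.create_channel()
+
+   # Nothing has been sent yet, so the default comes back immediately.
+   obj = r.recv_nowait(default=None)
+   assert obj is None
+
+   # No interpreter is blocked in recv(), so the object is not
+   # delivered and False is returned.
+   sent = s.send_nowait(b'spam')
+   assert sent is False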
 
 
 Examples
@@ -281,15 +273,15 @@ Pre-populate an interpreter
 ::
 
     interp = interpreters.create()
-    interp.run("""if True:
+    interp.run(tw.dedent("""
         import some_lib
         import an_expensive_module
         some_lib.set_up()
-        """)
+        """))
     wait_for_request()
-    interp.run("""if True:
+    interp.run(tw.dedent("""
         some_lib.handle_request()
-        """)
+        """))
 
 
 Handling an exception
 ---------------------
@@ -298,9 +290,9 @@ Handling an exception
 
     interp = interpreters.create()
     try:
-        interp.run("""if True:
+        interp.run(tw.dedent("""
             raise KeyError
-            """)
+            """))
     except KeyError:
         print("got the error from the subinterpreter")
 
@@ -312,12 +304,12 @@ Synchronize using a channel
     interp = interpreters.create()
     r, s = interpreters.create_channel()
     def run():
-        interp.run("""if True:
+        interp.run(tw.dedent("""
             reader.recv()
             print("during")
             reader.close()
-            """,
-            reader=r)
+            """),
+            reader=r)
     t = threading.Thread(target=run)
     print('before')
     t.start()
@@ -334,13 +326,13 @@ Sharing a file descriptor
     r1, s1 = interpreters.create_channel()
     r2, s2 = interpreters.create_channel()
     def run():
-        interp.run("""if True:
+        interp.run(tw.dedent("""
             fd = int.from_bytes(
                     reader.recv(), 'big')
             for line in os.fdopen(fd):
                 print(line)
             writer.send(b'')
-            """,
+            """),
             reader=r1, writer=s2)
     t = threading.Thread(target=run)
     t.start()
@@ -356,19 +348,19 @@ Passing objects via pickle
 
     interp = interpreters.create()
     r, s = interpreters.create_channel()
-    interp.run("""if True:
+    interp.run(tw.dedent("""
         import pickle
-        """,
+        """),
         reader=r)
     def run():
-        interp.run("""if True:
+        interp.run(tw.dedent("""
             data = reader.recv()
             while data:
                 obj = pickle.loads(data)
                 do_something(obj)
                 data = reader.recv()
             reader.close()
-            """,
+            """),
             reader=r)
     t = threading.Thread(target=run)
     t.start()
@@ -386,6 +378,27 @@ isolation within the same process.  This can be leveraged in number of
 ways.  Furthermore, subinterpreters provide a well-defined framework
 in which such isolation may extended.
 
+Nick Coghlan explained some of the benefits through a comparison with
+multi-processing [benefits]_::
+
+   [I] expect that communicating between subinterpreters is going
+   to end up looking an awful lot like communicating between
+   subprocesses via shared memory.
+
+   The trade-off between the two models will then be that one still
+   just looks like a single process from the point of view of the
+   outside world, and hence doesn't place any extra demands on the
+   underlying OS beyond those required to run CPython with a single
+   interpreter, while the other gives much stricter isolation
+   (including isolating C globals in extension modules), but also
+   demands much more from the OS when it comes to its IPC
+   capabilities.
+
+   The security risk profiles of the two approaches will also be quite
+   different, since using subinterpreters won't require deliberately
+   poking holes in the process isolation that operating systems give
+   you by default.
+
 CPython has supported subinterpreters, with increasing levels of
 support, since version 1.5.  While the feature has the potential
 to be a powerful tool, subinterpreters have suffered from neglect
@@ -442,7 +455,8 @@ Consequently, projects that publish extension modules may face an
 increased maintenance burden as their users start using subinterpreters,
 where their modules may break.  This situation is limited to modules
 that use C globals (or use libraries that use C globals) to store
-internal state.
+internal state.  For numpy, the reported bug rate is one every 6
+months. [bug-rate]_
 
 Ultimately this comes down to a question of how often it will be a
 problem in practice: how many projects would be affected, how often
@@ -545,11 +559,12 @@ Existing Usage
 --------------
 
 Subinterpreters are not a widely used feature.  In fact, the only
-documented case of wide-spread usage is
-`mod_wsgi `_.  On the one
-hand, this case provides confidence that existing subinterpreter support
-is relatively stable.  On the other hand, there isn't much of a sample
-size from which to judge the utility of the feature.
+documented cases of widespread usage are
+`mod_wsgi `_ and
+`JEP `_.  On the one hand, these cases
+provide confidence that existing subinterpreter support is relatively
+stable.  On the other hand, there isn't much of a sample size from which
+to judge the utility of the feature.
 
 
 Provisional Status
@@ -566,8 +581,18 @@ remove it.
 
 Alternate Python Implementations
 ================================
 
+I'll be soliciting feedback from the different Python implementors
+about subinterpreter support.
+
+Multiple-interpreter support in the major Python implementations:
+
 TBD
 
+* jython: yes [jython]_
+* ironpython: yes?
+* pypy: maybe not? [pypy]_
+* micropython: ???
+
 
 Open Questions
 ==============
@@ -585,11 +610,24 @@ interpreters get better isolation relative to memory management (which
 is necessary to stop sharing the GIL between interpreters).  So the
 semantics of how the exceptions propagate needs to be resolved.
 
+Possible solutions:
+
+* convert at the boundary (e.g. ``subprocess.CalledProcessError``)
+* wrap in a proxy at the boundary (including with support for
+  something like ``err.raise()`` to propagate the traceback).
+* return the exception (or its proxy) from ``run()`` instead of
+  raising it
+* return a result object (like ``subprocess`` does) [result-object]_
+* throw the exception away and expect users to deal with unhandled
+  exceptions explicitly in the script they pass to ``run()``
+  (they can pass error info out via channels, as sketched below);
+  with threads you have to do something similar
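+
+For illustration, here is a rough sketch of the last option (not part
+of the proposal; ``do_something()`` is a stand-in for the user's code)::
+
+    import textwrap as tw
+    import threading
+
+    interp = interpreters.create()
+    r, s = interpreters.create_channel()
+    def run():
+        interp.run(tw.dedent("""
+            import traceback
+            try:
+                do_something()
+            except Exception:
+                errors.send(traceback.format_exc().encode('utf-8'))
+            else:
+                errors.send(b'')
+            """),
+            errors=s)
+    t = threading.Thread(target=run)
+    t.start()
+    error = r.recv()
+    t.join()
+    if error:
+        print("subinterpreter failed:", error.decode('utf-8'))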
+
 
 Initial support for buffers in channels
 ---------------------------------------
 
 An alternative to support for bytes in channels in support for
-read-only buffers (the PEP 3119 kind).  Then ``recv()`` would return
+read-only buffers (the PEP 3118 kind).  Then ``recv()`` would return
 a memoryview to expose the buffer in a zero-copy way.  This is
 similar to what ``multiprocessing.Connection`` supports. [mp-conn]
 
@@ -597,6 +635,68 @@ Switching to such an approach would help resolve questions of how
 passing bytes through channels will work once we isolate memory
 management in interpreters.
 
+Does every interpreter think that their thread is the "main" thread?
+--------------------------------------------------------------------
+
+CPython's interpreter implementation identifies the OS thread in which
+it was started as the "main" thread.  The interpreter then has slightly
+different behavior depending on whether the current thread is the main
+one or not.  This presents a problem in cases where "main thread" is
+meant to imply "main thread in the main interpreter" [main-thread]_,
+where the main interpreter is the initial one.
+
+Disallow subinterpreters in the main thread?
+--------------------------------------------
+
+This is a specific case of the above issue.  Currently in CPython,
+"we need a main \*thread\* in order to sensibly manage the way signal
+handling works across different platforms". [main-thread]_
+
+Since signal handlers are part of the interpreter state, running a
+subinterpreter in the main thread means that the main interpreter
+can no longer properly handle signals (since it's effectively paused).
+
+Furthermore, running a subinterpreter in the main thread would
+conceivably allow setting signal handlers on that interpreter, which
+would likewise impact signal handling when that interpreter isn't
+running or is running in a different thread.
+
+Ultimately, running subinterpreters in the main OS thread introduces
+complications to the signal handling implementation.  So it may make
+the most sense to disallow running subinterpreters in the main thread.
+Support for it could be considered later.  The downside is that folks
+wanting to try out subinterpreters would be required to take the extra
+step of using threads.  This could slow adoption and experimentation,
+whereas without the restriction there's less of an obstacle.
+
+Pass channels explicitly to run()?
+----------------------------------
+
+Nick Coghlan suggested [explicit-channels]_ that we may want something
+more explicit than the keyword args of ``run()`` (``**shared``)::
+
+   The subprocess.run() comparison does make me wonder whether this
+   might be a more future-proof signature for Interpreter.run() though:
+
+       def run(source_str, /, *, channels=None):
+           ...
+
+   That way channels can be a namespace *specifically* for passing in
+   channels, and can be reported as such on RunResult.  If we decide
+   to allow arbitrary shared objects in the future, or add flag options
+   like "reraise=True" to reraise exceptions from the subinterpreter
+   in the current interpreter, we'd have that ability, rather than
+   having the entire potential keyword namespace taken up for passing
+   shared objects.
+
+and::
+
+   It does occur to me that if we wanted to align with the way the
+   `runpy` module spells that concept, we'd call the option
+   `init_globals`, but I'm thinking it will be better to only allow
+   channels to be passed through directly, and require that everything
+   else be sent through a channel.
+
 
 Deferred Functionality
 ======================
@@ -619,11 +719,13 @@ This suffers from the same problem as sharing objects between
 interpreters via queues.  The minimal solution (running a source string)
 is sufficient for us to get the feature out where it can be explored.
 
-timeout arg to pop() and push()
--------------------------------
+timeout arg to recv() and send()
+--------------------------------
 
 Typically functions that have a ``block`` argument also have a
-``timeout`` argument.  We can add it later if needed.
+``timeout`` argument.  It sometimes makes sense to do likewise for
+functions that otherwise block, like the channel ``recv()`` and
+``send()`` methods.  We can add it later if needed.
 
 get_main()
 ----------
@@ -732,13 +834,29 @@ desireable and you want to execute in a fresh ``__main__``.  Also,
 you don't necessarily want to leak objects there that you aren't using
 any more.
 
-Solutions include:
+Note that the following won't work right because it will clear too much
+(e.g. ``__name__`` and the other "__dunder__" attributes)::
+
+   interp.run('globals().clear()')
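+
+A more careful manual workaround would skip the "__dunder__" names
+(again, only a sketch, not a proposed API; it assumes ``textwrap`` is
+imported as ``tw``, as in the examples above)::
+
+   interp.run(tw.dedent("""
+       for name in list(vars()):
+           if not name.startswith('__'):
+               del vars()[name]
+       """))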
+
+Possible solutions include:
 
 * a ``create()`` arg to indicate resetting ``__main__`` after each
   ``run`` call
 * an ``Interpreter.reset_main`` flag to support opting in or out
   after the fact
 * an ``Interpreter.reset_main()`` method to opt in when desired
+* ``importlib.util.reset_globals()`` [reset_globals]_
+
+Also note that resetting ``__main__`` does nothing about state stored
+in other modules.  So any solution would have to be clear about the
+scope of what is being reset.  Conceivably we could invent a mechanism
+by which any (or every) module could be reset, unlike ``reload()``
+which does not clear the module before loading into it.  Regardless,
+since ``__main__`` is the execution namespace of the interpreter,
+resetting it has a much more direct correlation to interpreters and
+their dynamic state than does resetting other modules.  So a more
+generic module reset mechanism may prove unnecessary.
 
 This isn't a critical feature initially.  It can wait until later
 if desirable.
@@ -760,6 +878,70 @@ would be a good candidate for the first effort at expanding the types
 that channels support.  They aren't strictly necessary for the initial
 API.
 
+Integration with async
+----------------------
+
+Per Antoine Pitrou [async]_::
+
+   Has any thought been given to how FIFOs could integrate with async
+   code driven by an event loop (e.g. asyncio)?  I think the model of
+   executing several asyncio (or Tornado) applications each in their
+   own subinterpreter may prove quite interesting to reconcile multi-
+   core concurrency with ease of programming.  That would require the
+   FIFOs to be able to synchronize on something an event loop can wait
+   on (probably a file descriptor?).
+
+A possible solution is to provide async implementations of the blocking
+channel methods (``__next__()``, ``recv()``, and ``send()``).  However,
+the basic functionality of subinterpreters does not depend on async and
+can be added later.
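+
+As a stop-gap, async code could also run the blocking methods in a
+worker thread.  A rough sketch (not part of the proposal)::
+
+   import asyncio
+
+   async def async_recv(channel):
+       # Run the blocking recv() in the default executor so that the
+       # event loop itself is never blocked.
+       loop = asyncio.get_event_loop()
+       return await loop.run_in_executor(None, channel.recv)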
+
+Support for iteration
+---------------------
+
+Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
+``__next__()``) may be useful.  A trivial implementation would use the
+``recv()`` method, similar to how files do iteration (see the sketch
+below).  Since this isn't a fundamental capability and has a simple
+analog, adding iteration support can wait until later.
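+
+For example, something along these lines (a sketch only)::
+
+   class RecvChannel:
+       ...
+
+       def __iter__(self):
+           return self
+
+       def __next__(self):
+           try:
+               return self.recv()
+           except EOFError:
+               # The channel has been closed, so iteration is done.
+               raise StopIteration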
+
+Channel context managers
+------------------------
+
+Context manager support on ``RecvChannel`` and ``SendChannel`` may be
+helpful.  The implementation would be simple, wrapping a call to
+``close()`` like files do.  As with iteration, this can wait.
+
+Pipes and Queues
+----------------
+
+With the proposed object passing mechanism of "channels", other similar
+basic types aren't required to achieve the minimal useful functionality
+of subinterpreters.  Such types include pipes (like channels, but
+one-to-one) and queues (like channels, but buffered).  See below in
+`Rejected Ideas`_ for more information.
+
+Even though these types aren't part of this proposal, they may still
+be useful in the context of concurrency.  Adding them later is entirely
+reasonable.  They could be trivially implemented as wrappers around
+channels.  Alternatively they could be implemented for efficiency at
+the same low level as channels.
+
+interpreters.RunFailedError
+---------------------------
+
+As currently proposed, ``Interpreter.run()`` offers you no way to
+distinguish an error coming from a subinterpreter from any other
+error in the current interpreter.  Your only option would be to
+explicitly wrap your ``run()`` call in a ``try: ... except Exception:``.
+
+If this is a problem in practice then we could add something like
+``interpreters.RunFailedError`` and raise that in ``run()``, chaining
+the actual error.
+
+Of course, this depends on how we resolve `Leaking exceptions across
+interpreters`_.
+
 
 Rejected Ideas
 ==============
@@ -846,6 +1028,36 @@ References
 .. [mp-conn]
    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection
 
+.. [bug-rate]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
+
+.. [benefits]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
+
+.. [main-thread]
+   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149566.html
+
+.. [explicit-channels]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149565.html
+
+.. [reset_globals]
+   https://mail.python.org/pipermail/python-dev/2017-September/149545.html
+
+.. [async]
+   https://mail.python.org/pipermail/python-dev/2017-September/149420.html
+   https://mail.python.org/pipermail/python-dev/2017-September/149585.html
+
+.. [result-object]
+   https://mail.python.org/pipermail/python-dev/2017-September/149562.html
+
+.. [jython]
+   https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
+
+.. [pypy]
+   https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
+
 
 Copyright
 =========