diff --git a/pep-0554.rst b/pep-0554.rst index d7394d369e9..95f1d8effb3 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -15,7 +15,7 @@ Abstract CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5 (1997). The feature has been -available via the C-API. [c-api]_ Subinterpreters operate in +available via the C-API. [c-api]_ Subinterpreters operate in `relative isolation from one another `_, which provides the basis for an `alternative concurrency model `_. @@ -51,6 +51,7 @@ At first only the following types will be supported for sharing: * str * int * PEP 3118 buffer objects (via ``send_buffer()``) +* PEP 554 channels Support for other basic types (e.g. bool, float, Ellipsis) will be added later. @@ -152,7 +153,7 @@ For sharing data between interpreters: | | | receiving end of the channel and wait. | | | | Associate the interpreter with the channel. | +---------------------------+-------------------------------------------------+ -| .send_nowait(obj) | | Like send(), but Fail if not received. | +| .send_nowait(obj) | | Like send(), but fail if not received. | +---------------------------+-------------------------------------------------+ | .send_buffer(obj) | | Send the object's (PEP 3118) buffer to the | | | | receiving end of the channel and wait. | @@ -242,6 +243,24 @@ Handling an exception except interpreters.RunFailedError as exc: print(f"got the error from the subinterpreter: {exc}") +Re-raising an exception +----------------------- + +:: + + interp = interpreters.create() + try: + try: + interp.run(tw.dedent(""" + raise KeyError + """)) + except interpreters.RunFailedError as exc: + raise exc.__cause__ + except KeyError: + print("got a KeyError from the subinterpreter") + +Note that this pattern is a candidate for later improvement. + Synchronize using a channel --------------------------- @@ -494,8 +513,8 @@ each with different goals. Most center on correctness and usability. 
One class of concurrency models focuses on isolated threads of execution that interoperate through some message passing scheme. A -notable example is `Communicating Sequential Processes`_ (CSP), upon -which Go's concurrency is based. The isolation inherent to +notable example is `Communicating Sequential Processes`_ (CSP) (upon +which Go's concurrency is roughly based). The isolation inherent to subinterpreters makes them well-suited to this approach. Shared data @@ -521,9 +540,9 @@ There are a number of valid solutions, several of which may be appropriate to support in Python. This proposal provides a single basic solution: "channels". Ultimately, any other solution will look similar to the proposed one, which will set the precedent. Note that the -implementation of ``Interpreter.run()`` can be done in a way that allows -for multiple solutions to coexist, but doing so is not technically -a part of the proposal here. +implementation of ``Interpreter.run()`` will be done in a way that +allows for multiple solutions to coexist, but doing so is not +technically a part of the proposal here. Regarding the proposed solution, "channels", it is a basic, opt-in data sharing mechanism that draws inspiration from pipes, queues, and CSP's @@ -534,7 +553,8 @@ channels have two operations: send and receive. A key characteristic of those operations is that channels transmit data derived from Python objects rather than the objects themselves. When objects are sent, their data is extracted. When the "object" is received in the other -interpreter, the data is converted back into an object. +interpreter, the data is converted back into an object owned by that +interpreter. To make this work, the mutable shared state will be managed by the Python runtime, not by any of the interpreters. 
Initially we will @@ -552,6 +572,7 @@ channels to the following: * str * int * PEP 3118 buffer objects (via ``send_buffer()``) +* channels Limiting the initial shareable types is a practical matter, reducing the potential complexity of the initial implementation. There are a @@ -589,11 +610,11 @@ Finally, some potential isolation is missing due to the current design of CPython. Improvements are currently going on to address gaps in this area: -* interpreters share the GIL -* interpreters share memory management (e.g. allocators, gc) * GC is not run per-interpreter [global-gc]_ * at-exit handlers are not run per-interpreter [global-atexit]_ * extensions using the ``PyGILState_*`` API are incompatible [gilstate]_ +* interpreters share memory management (e.g. allocators, gc) +* interpreters share the GIL Existing Usage -------------- @@ -683,7 +704,7 @@ The module also provides the following class: "channels" keyword argument is provided (and is a mapping of attribute names to channels) then it is added to the interpreter's execution namespace (the interpreter's "__main__" module). If any - of the values are not are not RecvChannel or SendChannel instances + of the values are not RecvChannel or SendChannel instances then ValueError gets raised. This may not be called on an already running interpreter. Doing @@ -737,9 +758,9 @@ interpreters, we create a surrogate of the exception and its traceback (see ``traceback.TracebackException``), set it to ``__cause__`` on a new ``RunFailedError``, and raise that. -Raising (a proxy of) the exception is problematic since it's harder to -distinguish between an error in the ``run()`` call and an uncaught -exception from the subinterpreter. +Raising (a proxy of) the exception directly is problematic since it's +harder to distinguish between an error in the ``run()`` call and an +uncaught exception from the subinterpreter. 
API for sharing data @@ -763,14 +784,15 @@ whether an object is shareable or not: a cross-interpreter way, whether via a proxy, a copy, or some other means. -This proposal provides two ways to do share such objects between +This proposal provides two ways to share such objects between interpreters. -First, shareable objects may be passed to ``run()`` as keyword arguments, -where they are effectively injected into the target interpreter's -``__main__`` module. This is mainly intended for sharing meta-objects -(e.g. channels) between interpreters, as it is less useful to pass other -objects (like ``bytes``) to ``run``. +First, channels may be passed to ``run()`` via the ``channels`` +keyword argument, where they are effectively injected into the target +interpreter's ``__main__`` module. While passing arbitrary shareable +objects this way is possible, doing so is mainly intended for sharing +meta-objects (e.g. channels) between interpreters. It is less useful +to pass other objects (like ``bytes``) to ``run`` directly. Second, the main mechanism for sharing objects (i.e. their data) between interpreters is through channels. A channel is a simplex FIFO similar @@ -778,6 +800,9 @@ to a pipe. The main difference is that channels can be associated with zero or more interpreters on either end. Unlike queues, which are also many-to-many, channels have no buffer. +The ``interpreters`` module provides the following functions and +classes related to channels: + ``create_channel()``:: Create a new channel and return (recv, send), the RecvChannel and @@ -802,24 +827,25 @@ many-to-many, channels have no buffer. ``RecvChannel(id)``:: The receiving end of a channel. An interpreter may use this to - receive objects from another interpreter. At first only bytes will - be supported. + receive objects from another interpreter. At first only a few of + the simple, immutable builtin types will be supported. id: - The channel's unique ID. + The channel's unique ID. 
This is shared with the "send" end. interpreters: The list of associated interpreters: those that have called - the "recv()" or "__next__()" methods and haven't called - "release()" (and the channel hasn't been explicitly closed). + the "recv()" method and haven't called "release()" (and the + channel hasn't been explicitly closed). recv(): Return the next object (i.e. the data from the sent object) from the channel. If none have been sent then wait until the next - send. This associates the current interpreter with the channel. + send. This associates the current interpreter with the "recv" + end of the channel. If the channel is already closed then raise ChannelClosedError. If the channel isn't closed but the current interpreter already @@ -848,7 +874,7 @@ many-to-many, channels have no buffer. to 0, the channel is actually marked as closed. The Python runtime will garbage collect all closed channels, though it may not be immediately. Note that "release()" is automatically called - in behalf of the current interpreter when the channel is no longer + on behalf of the current interpreter when the channel is no longer used (i.e. has no references) in that interpreter. This operation is idempotent. Return True if "release()" has not @@ -857,21 +883,21 @@ many-to-many, channels have no buffer. close(force=False): Close both ends of the channel (in all interpreters). This means - that any further use of the channel raises ChannelClosedError. If - the channel is not empty then raise ChannelNotEmptyError (if - "force" is False) or discard the remaining objects (if "force" - is True) and close it. + that any further use of the channel anywhere raises + ChannelClosedError. If the channel is not empty then raise + ChannelNotEmptyError (if "force" is False) or discard the + remaining objects (if "force" is True) and close it. ``SendChannel(id)``:: The sending end of a channel. An interpreter may use this to send - objects to another interpreter. 
At first only bytes will be - supported. + objects to another interpreter. At first only a few of + the simple, immutable builtin types will be supported. id: - The channel's unique ID. + The channel's unique ID. This is shared with the "recv" end. interpreters: @@ -882,8 +908,9 @@ many-to-many, channels have no buffer. Send the object (i.e. its data) to the receiving end of the channel. Wait until the object is received. If the the object is not shareable then ValueError is raised. Currently - only bytes are supported. + object is not shareable then ValueError is raised. This + associates the current interpreter with the "send" end of the + channel. If the channel is already closed then raise ChannelClosedError. If the channel isn't closed but the current interpreter already @@ -892,9 +919,10 @@ many-to-many, channels have no buffer. send_nowait(obj): - Send the object to the receiving end of the channel. If the other - end is not currently receiving then raise NotReceivedError. - Otherwise this is the same as "send()". + Send the object to the receiving end of the channel. If no + interpreter is currently receiving (waiting on the other end) + then raise NotReceivedError. Otherwise this is the same as + "send()". send_buffer(obj): @@ -918,9 +946,9 @@ many-to-many, channels have no buffer. Close both ends of the channel (in all interpreters). No matter what the "send" end of the channel is immediately closed. If the channel is empty then close the "recv" end immediately too. - Otherwise wait until the channel is empty before closing it (if - "force" is False) or discard the remaining items and close - immediately (if "force" is True). + Otherwise, if "force" is False, close the "recv" end (and hence + the full channel) once the channel becomes empty; or, if "force" + is True, discard the remaining items and close immediately. Note that ``send_buffer()`` is similar to how ``multiprocessing.Connection`` works. 
[mp-conn]_ @@ -929,53 +957,10 @@ Note that ``send_buffer()`` is similar to how Open Questions ============== -* "force" argument to ``ch.release()``? * add a "tp_share" type slot instead of using a global registry for shareable types? -Open Implementation Questions -============================= - -Does every interpreter think that their thread is the "main" thread? --------------------------------------------------------------------- - -(This is more of an implementation detail that an issue for the PEP.) - -CPython's interpreter implementation identifies the OS thread in which -it was started as the "main" thread. The interpreter the has slightly -different behavior depending on if the current thread is the main one -or not. This presents a problem in cases where "main thread" is meant -to imply "main thread in the main interpreter" [main-thread]_, where -the main interpreter is the initial one. - -Disallow subinterpreters in the main thread? --------------------------------------------- - -(This is more of an implementation detail that an issue for the PEP.) - -This is a specific case of the above issue. Currently in CPython, -"we need a main \*thread\* in order to sensibly manage the way signal -handling works across different platforms". [main-thread]_ - -Since signal handlers are part of the interpreter state, running a -subinterpreter in the main thread means that the main interpreter -can no longer properly handle signals (since it's effectively paused). - -Furthermore, running a subinterpreter in the main thread would -conceivably allow setting signal handlers on that interpreter, which -would likewise impact signal handling when that interpreter isn't -running or is running in a different thread. - -Ultimately, running subinterpreters in the main OS thread introduces -complications to the signal handling implementation. So it may make -the most sense to disallow running subinterpreters in the main thread. -Support for it could be considered later. 
The downside is that folks -wanting to try out subinterpreters would be required to take the extra -step of using threads. This could slow adoption and experimentation, -whereas without the restriction there's less of an obstacle. - - Deferred Functionality ====================== @@ -1048,10 +1033,11 @@ Syntactic Support The ``Go`` language provides a concurrency model based on CSP, so it's similar to the concurrency model that subinterpreters support. -``Go`` provides syntactic support, as well several builtin concurrency -primitives, to make concurrency a first-class feature. Conceivably, -similar syntactic (and builtin) support could be added to Python using -subinterpreters. However, that is *way* outside the scope of this PEP! +However, ``Go`` also provides syntactic support, as well as several builtin +concurrency primitives, to make concurrency a first-class feature. +Conceivably, similar syntactic (and builtin) support could be added to +Python using subinterpreters. However, that is *way* outside the scope +of this PEP! Multiprocessing --------------- @@ -1072,19 +1058,21 @@ raise an ImportError if unsupported. Alternately we could support opting in to subinterpreter support. However, that would probably exclude many more modules (unnecessarily) -than the opt-out approach. +than the opt-out approach. Also, note that PEP 489 defined that an +extension's use of the PEP's machinery implies support for +subinterpreters. The scope of adding the ModuleDef slot and fixing up the import machinery is non-trivial, but could be worth it. It all depends on -how many extension modules break under subinterpreters. Given the -relatively few cases we know of through mod_wsgi, we can leave this -for later. +how many extension modules break under subinterpreters. Given that +there are relatively few cases we know of through mod_wsgi, we can +leave this for later. Poisoning channels ------------------ CSP has the concept of poisoning a channel. 
Once a channel has been -poisoned, and ``send()`` or ``recv()`` call on it will raise a special +poisoned, any ``send()`` or ``recv()`` call on it would raise a special exception, effectively ending execution in the interpreter that tried to use the poisoned channel. @@ -1092,15 +1080,6 @@ This could be accomplished by adding a ``poison()`` method to both ends of the channel. The ``close()`` method can be used in this way (mostly), but these semantics are relatively specialized and can wait. -Sending channels over channels ------------------------------- - -Some advanced usage of subinterpreters could take advantage of the -ability to send channels over channels, in addition to bytes. Given -that channels will already be multi-interpreter safe, supporting then -in ``RecvChannel.recv()`` wouldn't be a big change. However, this can -wait until the basic functionality has been ironed out. - Reseting __main__ ----------------- @@ -1161,7 +1140,7 @@ Per Antoine Pitrou [async]_:: on (probably a file descriptor?). A possible solution is to provide async implementations of the blocking -channel methods (``__next__()``, ``recv()``, and ``send()``). However, +channel methods (``recv()`` and ``send()``). However, the basic functionality of subinterpreters does not depend on async and can be added later. @@ -1320,6 +1299,39 @@ Rejected possible solutions: to do something similar +Implementation +============== + +The implementation of the PEP has 4 parts: + +* the high-level module described in this PEP (mostly a light wrapper + around a low-level C extension) +* the low-level C extension module +* additions to the ("private") C-API needed by the low-level module +* secondary fixes/changes in the CPython runtime that facilitate + the low-level module (among other benefits) + +These are at various levels of completion, with more done the lower +you go: + +* the high-level module has been, at best, roughly implemented. + However, fully implementing it will be almost trivial. 
+* the low-level module is mostly complete. The bulk of the + implementation was merged into master in December 2018 as the + "_xxsubinterpreters" module (for the sake of testing subinterpreter + functionality). Only 3 parts of the implementation remain: + "send_wait()", "send_buffer()", and exception propagation. All three + have been mostly finished, but were blocked by work related to ceval. + That blocker is basically resolved now and finishing the low-level + module will not require extensive work. +* all necessary C-API work has been finished +* all anticipated work in the runtime has been finished + +The implementation effort for PEP 554 is being tracked as part of +a larger project aimed at improving multi-core support in CPython. +[multi-core-project]_ + + References ========== @@ -1389,6 +1401,9 @@ References .. [pypy] https://mail.python.org/pipermail/python-ideas/2017-September/046973.html +.. [multi-core-project] + https://github.com/ericsnowcurrently/multi-core-python + Copyright =========
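The channel semantics documented in this patch (no buffer, ``send()`` waits until the object is received, ``send_nowait()`` raises NotReceivedError when no receiver is waiting, and use of a closed channel raises ChannelClosedError) can be modeled with ordinary threads. The following is a minimal sketch under stated assumptions, not the PEP's implementation: ``ToyChannel`` and ``demo`` are hypothetical names, both channel ends are collapsed into one object, threads stand in for interpreters, and ``close()`` always discards an in-flight object (roughly ``close(force=True)``). Only the two exception names come from the PEP text.

```python
import threading


class ChannelClosedError(Exception):
    """Use of a closed channel (exception name taken from the PEP text)."""


class NotReceivedError(Exception):
    """send_nowait() found no waiting receiver (name taken from the PEP text)."""


class ToyChannel:
    """Hypothetical thread-based model of an unbuffered channel.

    Not the PEP's implementation: both ends live on one object and
    threads stand in for interpreters.
    """

    def __init__(self):
        self._cond = threading.Condition()  # wraps a reentrant lock
        self._slot = []      # at most one in-flight object (no buffer)
        self._waiting = 0    # receivers currently blocked in recv()
        self._taken = 0      # number of objects handed to receivers
        self._closed = False

    def recv(self):
        """Return the next sent object, waiting for a sender if needed."""
        with self._cond:
            self._waiting += 1
            try:
                while not self._slot:
                    if self._closed:
                        raise ChannelClosedError
                    self._cond.wait()
                obj = self._slot.pop()
                self._taken += 1
                self._cond.notify_all()  # wake the sender waiting on us
                return obj
            finally:
                self._waiting -= 1

    def send(self, obj):
        """Hand obj to a receiver and wait until it has been received."""
        with self._cond:
            while self._slot and not self._closed:
                self._cond.wait()  # another send is still in flight
            if self._closed:
                raise ChannelClosedError
            self._slot.append(obj)
            ticket = self._taken
            self._cond.notify_all()  # wake any waiting receiver
            while self._taken == ticket:
                if self._closed:
                    raise ChannelClosedError
                self._cond.wait()

    def send_nowait(self, obj):
        """Like send(), but fail if no receiver is currently waiting."""
        with self._cond:
            if self._closed:
                raise ChannelClosedError
            if self._waiting == 0 or self._slot:
                raise NotReceivedError
            self.send(obj)  # safe: the condition's lock is reentrant

    def close(self):
        """Close the channel, discard any in-flight object, wake everyone."""
        with self._cond:
            self._closed = True
            self._slot.clear()
            self._cond.notify_all()


def demo():
    """Exercise the three behaviors; returns (refused, received, closed)."""
    ch = ToyChannel()
    try:
        ch.send_nowait(b"eggs")
        refused = False
    except NotReceivedError:
        refused = True  # nobody was waiting in recv()
    received = []
    worker = threading.Thread(target=lambda: received.append(ch.recv()))
    worker.start()
    ch.send(b"spam")  # returns only after the worker has taken the object
    worker.join()
    ch.close()
    try:
        ch.send(b"late")
        closed = False
    except ChannelClosedError:
        closed = True
    return refused, received, closed


if __name__ == "__main__":
    print(demo())
```

A single ``threading.Condition`` keeps the model small. A real cross-interpreter channel would additionally have to convert each object to interpreter-independent data on ``send()`` and rebuild it on ``recv()``, which this sketch deliberately skips.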