diff --git a/pep-0554.rst b/pep-0554.rst index b8b47cdc0c3..f6f7992cb11 100644 --- a/pep-0554.rst +++ b/pep-0554.rst @@ -33,9 +33,12 @@ provide a high-level interface to subinterpreters and wrap the low-level ``threading`` module. See the `Examples`_ section for concrete usage and use cases. +API for interpreters +-------------------- + The module provides the following functions: -``list()``:: +``list_all()``:: Return a list of all existing interpreters. @@ -51,7 +54,8 @@ The module provides the following functions: in any thread and will run in whichever thread calls ``interp.run()``. -The module also provides the following classes: + +The module also provides the following class: ``Interpreter(id)``:: @@ -71,9 +75,13 @@ The module also provides the following classes: This may not be called on an already running interpreter. Doing so results in a RuntimeError. - run(source_str): + run(source_str, /, **shared): - Run the provided Python source code in the interpreter. + Run the provided Python source code in the interpreter. Any + keyword arguments are added to the interpreter's execution + namespace. If any of the values are not supported for sharing + between interpreters then RuntimeError gets raised. Currently + only channels (see "create_channel()" below) are supported. This may not be called on an already running interpreter. Doing so results in a RuntimeError. @@ -112,116 +120,133 @@ The module also provides the following classes: Supported code: source text. - get_fifo(name): +API for sharing data +-------------------- - Return the FIFO object with the given name that is associated - with this interpreter. If no such FIFO exists then raise - KeyError. The FIFO will be either a "FIFOReader" or a - "FIFOWriter", depending on which "add_*_fifo()" was called. +The mechanism for passing objects between interpreters is through +channels. A channel is a simplex FIFO similar to a pipe. The main +difference is that channels can be associated with zero or more +interpreters on either end. Unlike queues, which are also many-to-many, +channels have no buffer. - list_fifos(): +``create_channel()``:: - Return a list of all fifos associated with the interpreter. + Create a new channel and return (recv, send), the RecvChannel and + SendChannel corresponding to the ends of the channel. The channel + is not closed and destroyed (i.e. garbage-collected) until the number + of associated interpreters returns to 0. - add_recv_fifo(name=None): + An interpreter gets associated with a channel by calling its "send()" + or "recv()" method. That association gets dropped by calling + "close()" on the channel. - Create a new FIFO, associate the two ends with the involved - interpreters, and return the side associated with the interpreter - in which "add_recv_fifo()" was called. A FIFOReader gets tied to - this interpreter. A FIFOWriter gets tied to the interpreter that - called "add_recv_fifo()". + Both ends of the channel are supported "shared" objects (i.e. may be + safely shared by different interpreters. Thus they may be passed as + keyword arguments to "Interpreter.run()". - The FIFO's name is set to the provided value. If no name is - provided then a dynamically generated one is used. If a FIFO - with the given name is already associated with this interpreter - (or with the one in which "add_recv_fifo()" was called) then raise - KeyError. +``list_all_channels()``:: - add_send_fifo(name=None): + Return a list of all open (RecvChannel, SendChannel) pairs. - Create a new FIFO, associate the two ends with the involved - interpreters, and return the side associated with the interpreter - in which "add_recv_fifo()" was called. A FIFOWriter gets tied to - this interpreter. A FIFOReader gets tied to the interpreter that - called "add_recv_fifo()". - The FIFO's name is set to the provided value. If no name is - provided then a dynamically generated one is used. If a FIFO - with the given name is already associated with this interpreter - (or with the one in which "add_send_fifo()" was called) then raise - KeyError. +``RecvChannel(id)``:: - remove_fifo(name): + The receiving end of a channel. An interpreter may use this to + receive objects from another interpreter. At first only bytes will + be supported. - Drop the association between the named FIFO and this interpreter. - If the named FIFO is not found then raise KeyError. + id: + The channel's unique ID. -``FIFOReader(name)``:: + interpreters: - The receiving end of a FIFO. An interpreter may use this to receive - objects from another interpreter. At first only bytes and None will - be supported. + The list of associated interpreters (those that have called + the "recv()" method). - name: + __next__(): - The FIFO's name. + Return the next object from the channel. If none have been sent + then wait until the next send. - __next__(): + recv(): - Return the next bytes object from the pipe. If none have been - pushed on then block. + Return the next object from the channel. If none have been sent + then wait until the next send. If the channel has been closed + then EOFError is raised. - pop(*, block=True): + recv_nowait(default=None): - Return the next bytes object from the pipe. If none have been - pushed on and "block" is True (the default) then block. - Otherwise return None. + Return the next object from the channel. If none have been sent + then return the default. If the channel has been closed + then EOFError is raised. + close(): -``FIFOWriter(name)``:: + No longer associate the current interpreter with the channel (on + the receiving end). This is a noop if the interpreter isn't + already associated. Once an interpreter is no longer associated + with the channel, subsequent (or current) send() and recv() calls + from that interpreter will raise EOFError. - The sending end of a FIFO. An interpreter may use this to send - objects to another interpreter. At first only bytes and None will - be supported. + Once number of associated interpreters on both ends drops to 0, + the channel is actually marked as closed. The Python runtime + will garbage collect all closed channels. Note that "close()" is + automatically called when it is no longer used in the current + interpreter. - name: + This operation is idempotent. Return True if the current + interpreter was still associated with the receiving end of the + channel and False otherwise. - The FIFO's name. - push(object, *, block=True): +``SendChannel(id)``:: - Add the object to the FIFO. If "block" is true then block - until the object is popped off. If the FIFO does not support - the object's type then TypeError is raised. + The sending end of a channel. An interpreter may use this to send + objects to another interpreter. At first only bytes will be + supported. -About FIFOs ------------ + id: -Subinterpreters are inherently isolated (with caveats explained below), -in contrast to threads. This enables `a different concurrency model -`_ than currently exists in Python. -`Communicating Sequential Processes`_ (CSP) is the prime example. + The channel's unique ID. -A key component of this approach to concurrency is message passing. So -providing a message/object passing mechanism alongside ``Interpreter`` -is a fundamental requirement. This proposal includes a basic mechanism -upon which more complex machinery may be built. That basic mechanism -draws inspiration from pipes, queues, and CSP's channels. + interpreters: -The key challenge here is that sharing objects between interpreters -faces complexity due in part to CPython's current memory model. -Furthermore, in this class of concurrency, the ideal is that objects -only exist in one interpreter at a time. However, this is not practical -for Python so we initially constrain supported objects to ``bytes`` and -``None``. There are a number of strategies we may pursue in the future -to expand supported objects and object sharing strategies. + The list of associated interpreters (those that have called + the "send()" method). -Note that the complexity of object sharing increases as subinterpreters -become more isolated, e.g. after GIL removal. So the mechanism for -message passing needs to be carefully considered. Keeping the API -minimal and initially restricting the supported types helps us avoid -further exposing any underlying complexity to Python users. + send(obj): + + Send the object to the receiving end of the channel. Wait until + the object is received. If the channel does not support the + object then TypeError is raised. Currently only bytes are + supported. If the channel has been closed then EOFError is + raised. + + send_nowait(obj): + + Send the object to the receiving end of the channel. If the + object is received then return True. Otherwise return False. + If the channel does not support the object then TypeError is + raised. If the channel has been closed then EOFError is raised. + + close(): + + No longer associate the current interpreter with the channel (on + the sending end). This is a noop if the interpreter isn't already + associated. Once an interpreter is no longer associated with the + channel, subsequent (or current) send() and recv() calls from that + interpreter will raise EOFError. + + Once number of associated interpreters on both ends drops to 0, + the channel is actually marked as closed. The Python runtime + will garbage collect all closed channels. Note that "close()" is + automatically called when it is no longer used in the current + interpreter. + + This operation is idempotent. Return True if the current + interpreter was still associated with the sending end of the + channel and False otherwise. Examples @@ -279,26 +304,26 @@ Handling an exception except KeyError: print("got the error from the subinterpreter") -Synchronize using a FIFO ------------------------- +Synchronize using a channel +--------------------------- :: interp = interpreters.create() - writer = interp.add_recv_fifo('spam') + r, s = interpreters.create_channel() def run(): interp.run("""if True: - import interpreters - interp = interpreters.get_current() - reader = interp.get_fifo('spam') - reader.pop() + reader.recv() print("during") - """) + reader.close() + """, + reader=r) t = threading.Thread(target=run) print('before') t.start() print('after') - writer.push(None) + s.send(b'') + s.close() Sharing a file descriptor ------------------------- @@ -306,24 +331,23 @@ Sharing a file descriptor :: interp = interpreters.create() - writer = interp.add_recv_fifo('spam') - reader = interp.add_send_fifo('done') + r1, s1 = interpreters.create_channel() + r2, s2 = interpreters.create_channel() def run(): interp.run("""if True: - import interpreters - interp = interpreters.get_current() - reader = interp.get_fifo('spam') - writer = interp.get_fifo('done') - fd = reader.pop() + fd = int.from_bytes( + reader.recv(), 'big') for line in os.fdopen(fd): print(line) - writer.push(None) - """) + writer.send(b'') + """, + reader=r1, writer=s2) t = threading.Thread(target=run) t.start() with open('spamspamspam') as infile: - writer.push(infile.fileno()) - reader.pop() + fd = infile.fileno().to_bytes(1, 'big') + s.send(fd) + r.recv() Passing objects via pickle -------------------------- @@ -331,27 +355,27 @@ Passing objects via pickle :: interp = interpreters.create() - writer = interp.add_recv_fifo('spam') + r, s = interpreters.create_channel() interp.run("""if True: import pickle - import interpreters - interp = interpreters.get_current() - reader = interp.get_fifo('spam') - """) + """, + reader=r) def run(): interp.run("""if True: - data = reader.pop() - while data is not None: + data = reader.recv() + while data: obj = pickle.loads(data) do_something(obj) - data = reader.pop() - """) + data = reader.recv() + reader.close() + """, + reader=r) t = threading.Thread(target=run) t.start() for obj in input: data = pickle.dumps(obj) - writer.push(data) - writer.push(None) + s.send(data) + s.send(b'') Rationale @@ -432,6 +456,40 @@ which subinterpreters are worth it. About Subinterpreters ===================== +Shared data +----------- + +Subinterpreters are inherently isolated (with caveats explained below), +in contrast to threads. This enables `a different concurrency model +`_ than is currently readily available in Python. +`Communicating Sequential Processes`_ (CSP) is the prime example. + +A key component of this approach to concurrency is message passing. So +providing a message/object passing mechanism alongside ``Interpreter`` +is a fundamental requirement. This proposal includes a basic mechanism +upon which more complex machinery may be built. That basic mechanism +draws inspiration from pipes, queues, and CSP's channels. [fifo]_ + +The key challenge here is that sharing objects between interpreters +faces complexity due in part to CPython's current memory model. +Furthermore, in this class of concurrency, the ideal is that objects +only exist in one interpreter at a time. However, this is not practical +for Python so we initially constrain supported objects to ``bytes``. +There are a number of strategies we may pursue in the future to expand +supported objects and object sharing strategies. + +Note that the complexity of object sharing increases as subinterpreters +become more isolated, e.g. after GIL removal. So the mechanism for +message passing needs to be carefully considered. Keeping the API +minimal and initially restricting the supported types helps us avoid +further exposing any underlying complexity to Python users. + +To make this work, the mutable shared state will be managed by the +Python runtime, not by any of the interpreters. Initially we will +support only one type of objects for shared state: the channels provided +by ``create_channel()``. Channels, in turn, will carefully manage +passing objects between interpreters. + Interpreter Isolation --------------------- @@ -511,6 +569,35 @@ Alternate Python Implementations TBD +Open Questions +============== + +Leaking exceptions across interpreters +-------------------------------------- + +As currently proposed, uncaught exceptions from ``run()`` propagate +to the frame that called it. However, this means that exception +objects are leaking across the inter-interpreter boundary. Likewise, +the frames in the traceback potentially leak. + +While that might not be a problem currently, it would be a problem once +interpreters get better isolation relative to memory management (which +is necessary to stop sharing the GIL between interpreters). So the +semantics of how the exceptions propagate needs to be resolved. + +Initial support for buffers in channels +--------------------------------------- + +An alternative to support for bytes in channels in support for +read-only buffers (the PEP 3119 kind). Then ``recv()`` would return +a memoryview to expose the buffer in a zero-copy way. This is similar +to what ``multiprocessing.Connection`` supports. [mp-conn] + +Switching to such an approach would help resolve questions of how +passing bytes through channels will work once we isolate memory +management in interpreters. + + Deferred Functionality ====================== @@ -564,7 +651,7 @@ The ``threading`` module provides a number of synchronization primitives for coordinating concurrent operations. This is especially necessary due to the shared-state nature of threading. In contrast, subinterpreters do not share state. Data sharing is restricted to -FIFOs, which do away with the need for explicit synchronization. If +channels, which do away with the need for explicit synchronization. If any sort of opt-in shared state support is added to subinterpreters in the future, that same effort can introduce synchronization primitives to meet that need. @@ -594,6 +681,127 @@ way it supports threads and processes. In fact, the module's maintainer, Davin Potts, has indicated this is a reasonable feature request. However, it is outside the narrow scope of this PEP. +C-extension opt-in/opt-out +-------------------------- + +By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily +add a mechanism by which C-extension modules could opt out of support +for subinterpreters. Then the import machinery, when operating in +a subinterpreter, would need to check the module for support. It would +raise an ImportError if unsupported. + +Alternately we could support opting in to subinterpreter support. +However, that would probably exclude many more modules (unnecessarily) +than the opt-out approach. + +The scope of adding the ModuleDef slot and fixing up the import +machinery is non-trivial, but could be worth it. It all depends on +how many extension modules break under subinterpreters. Given the +relatively few cases we know of through mod_wsgi, we can leave this +for later. + +Poisoning channels +------------------ + +CSP has the concept of poisoning a channel. Once a channel has been +poisoned, and ``send()`` or ``recv()`` call on it will raise a special +exception, effectively ending execution in the interpreter that tried +to use the poisoned channel. + +This could be accomplished by adding a ``poison()`` method to both ends +of the channel. The ``close()`` method could work if it had a ``force`` +option to force the channel closed. Regardless, these semantics are +relatively specialized and can wait. + +Sending channels over channels +------------------------------ + +Some advanced usage of subinterpreters could take advantage of the +ability to send channels over channels, in addition to bytes. Given +that channels will already be multi-interpreter safe, supporting then +in ``RecvChannel.recv()`` wouldn't be a big change. However, this can +wait until the basic functionality has been ironed out. + +Reseting __main__ +----------------- + +As proposed, every call to ``Interpreter.run()`` will execute in the +namespace of the interpreter's existing ``__main__`` module. This means +that data persists there between ``run()`` calls. Sometimes this isn't +desireable and you want to execute in a fresh ``__main__``. Also, +you don't necessarily want to leak objects there that you aren't using +any more. + +Solutions include: + +* a ``create()`` arg to indicate resetting ``__main__`` after each + ``run`` call +* an ``Interpreter.reset_main`` flag to support opting in or out + after the fact +* an ``Interpreter.reset_main()`` method to opt in when desired + +This isn't a critical feature initially. It can wait until later +if desirable. + +Support passing ints in channels +-------------------------------- + +Passing ints around should be fine and ultimately is probably +desirable. However, we can get by with serializing them as bytes +for now. The goal is a minimal API for the sake of basic +functionality at first. + +File descriptors and sockets in channels +---------------------------------------- + +Given that file descriptors and sockets are process-global resources, +support for passing them through channels is a reasonable idea. They +would be a good candidate for the first effort at expanding the types +that channels support. They aren't strictly necessary for the initial +API. + + +Rejected Ideas +============== + +Explicit channel association +---------------------------- + +Interpreters are implicitly associated with channels upon ``recv()`` and +``send()`` calls. They are de-associated with ``close()`` calls. The +alternative would be explicit methods. It would be either +``add_channel()`` and ``remove_channel()`` methods on ``Interpreter`` +objects or something similar on channel objects. + +In practice, this level of management shouldn't be necessary for users. +So adding more explicit support would only add clutter to the API. + +Use pipes instead of channels +----------------------------- + +A pipe would be a simplex FIFO between exactly two interpreters. For +most use cases this would be sufficient. It could potentially simplify +the implementation as well. However, it isn't a big step to supporting +a many-to-many simplex FIFO via channels. Also, with pipes the API +ends up being slightly more complicated, requiring naming the pipes. + +Use queues instead of channels +------------------------------ + +The main difference between queues and channels is that queues support +buffering. This would complicate the blocking semantics of ``recv()`` +and ``send()``. Also, queues can be built on top of channels. + +"enumerate" +----------- + +The ``list_all()`` function provides the list of all interpreters. +In the threading module, which partly inspired the proposed API, the +function is called ``enumerate()``. The name is different here to +avoid confusing Python users that are not already familiar with the +threading API. For them "enumerate" is rather unclear, whereas +"list_all" is clear. + References ========== @@ -607,6 +815,14 @@ References https://en.wikipedia.org/wiki/Communicating_sequential_processes https://github.com/futurecore/python-csp +.. [fifo] + https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe + https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue + https://docs.python.org/3/library/queue.html#module-queue + http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html + https://golang.org/doc/effective_go.html#sharing + http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/ + .. [caveats] https://docs.python.org/3/c-api/init.html#bugs-and-caveats @@ -627,6 +843,9 @@ References .. [global-atexit] https://bugs.python.org/issue6531 +.. [mp-conn] + https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection + Copyright =========