From f78084da15dab1dd6fdbc9b10e2745ac86e55110 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Tue, 12 Dec 2023 13:50:00 +0100
Subject: [PATCH 1/4] [Python][Docs] Document the Arrow PyCapsule protocol in the 'extending pyarrow' section of the Python docs

---
 .../CDataInterface/PyCapsuleInterface.rst | 2 ++
 docs/source/python/extending_types.rst | 32 +++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/docs/source/format/CDataInterface/PyCapsuleInterface.rst b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
index 0c1a01d7c67..03095aa2e93 100644
--- a/docs/source/format/CDataInterface/PyCapsuleInterface.rst
+++ b/docs/source/format/CDataInterface/PyCapsuleInterface.rst
@@ -16,6 +16,8 @@
 .. under the License.
 
 
+.. _arrow-pycapsule-interface:
+
 =============================
 The Arrow PyCapsule Interface
 =============================
diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index ee92cebcb54..c2649685caa 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -21,6 +21,38 @@
 Extending pyarrow
 =================
 
+Controlling conversion to (Py)Arrow with the PyCapsule Interface
+----------------------------------------------------------------
+
+The :ref:`Arrow C data interface ` allows moving Arrow data between
+different implementations of Arrow. This is a generic, cross-language interface not
+specific to Python, but for Python libraries this interface is extended with a Python
+specific layer: :ref:`arrow-pycapsule-interface`.
+
+This Python interface ensures that different libraries that support the C Data interface
+can recognize each other objects and export Arrow data structures in a standard way.
+
+If you have a library providing data structures that hold Arrow-compatible data
+under the hood, you can implement the following dunder methods on those objects:
+
+- ``__arrow_c_schema__`` for schema or type-like objects.
+- ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
+- ``__arrow_c_stream__`` for chunked tables or streams of data.
+
+Those methods return `PyCapsule `__
+objects, and more details on the exact semantics can be found in the
+:ref:`specification `.
+
+When your data structures have those dunder methods defined, the pyarrow constructors
+(such as :func:`pyarrow.array` or :func:`pyarrow.table`) will recognize those objects as
+supporting this protocol, and convert them to PyArrow data structures zero-copy. And the
+same can be true for any other library supporting this protocol on ingesting data.
+
+Similarly, if your library has functions that accept user-provided data, you can add
+support for this protocol by checking for the presence of those dunder methods, and
+therefore accept any Arrow data (instead of harcoding support for a specific
+Arrow producer such as PyArrow).
+
 .. _arrow_array_protocol:
 
 Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol

From ae576697a77eed041c52b7f7cf613dbf9339f943 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Tue, 12 Dec 2023 13:53:57 +0100
Subject: [PATCH 2/4] typo

---
 docs/source/python/extending_types.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index c2649685caa..b05be05a3f1 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -30,7 +30,7 @@ specific to Python, but for Python libraries this interface is extended with a P
 specific layer: :ref:`arrow-pycapsule-interface`.
 
 This Python interface ensures that different libraries that support the C Data interface
-can recognize each other objects and export Arrow data structures in a standard way.
+can export Arrow data structures in a standard way and recognize each other's objects.
 
 If you have a library providing data structures that hold Arrow-compatible data
 under the hood, you can implement the following dunder methods on those objects:

From 403e433bdfdbb5eebd452a9dbd83d61f030bd1db Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Tue, 12 Dec 2023 13:55:28 +0100
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: Antoine Pitrou
---
 docs/source/python/extending_types.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index b05be05a3f1..0d16023dc46 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -33,7 +33,7 @@ This Python interface ensures that different libraries that support the C Data i
 can export Arrow data structures in a standard way and recognize each other's objects.
 
 If you have a library providing data structures that hold Arrow-compatible data
-under the hood, you can implement the following dunder methods on those objects:
+under the hood, you can implement the following methods on those objects:
 
 - ``__arrow_c_schema__`` for schema or type-like objects.
 - ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
@@ -43,13 +43,13 @@ Those methods return `PyCapsule `_
 objects, and more details on the exact semantics can be found in the
 :ref:`specification `.
 
-When your data structures have those dunder methods defined, the pyarrow constructors
+When your data structures have those methods defined, the PyArrow constructors
 (such as :func:`pyarrow.array` or :func:`pyarrow.table`) will recognize those objects as
 supporting this protocol, and convert them to PyArrow data structures zero-copy. And the
 same can be true for any other library supporting this protocol on ingesting data.
 
 Similarly, if your library has functions that accept user-provided data, you can add
-support for this protocol by checking for the presence of those dunder methods, and
+support for this protocol by checking for the presence of those methods, and
 therefore accept any Arrow data (instead of harcoding support for a specific
 Arrow producer such as PyArrow).
 

From ebd189ff9c099dbc5b307be1f0c5a75f37cd7d82 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Wed, 13 Dec 2023 13:23:56 +0100
Subject: [PATCH 4/4] add Python

---
 docs/source/python/extending_types.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index 0d16023dc46..b7261005e66 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -32,7 +32,7 @@ specific layer: :ref:`arrow-pycapsule-interface`.
 This Python interface ensures that different libraries that support the C Data interface
 can export Arrow data structures in a standard way and recognize each other's objects.
 
-If you have a Python library providing data structures that hold Arrow-compatible data
+If you have a Python library providing data structures that hold Arrow-compatible data
 under the hood, you can implement the following methods on those objects:
 
 - ``__arrow_c_schema__`` for schema or type-like objects.
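
Note (not part of the patch series): the consumer-side check that the added docs describe as "checking for the presence of those methods" could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PyArrow code. The helper `arrow_export_kind`, the stand-in class `FakeBatch`, and the order in which the three methods are probed are all hypothetical. A real producer would return PyCapsule objects from these methods instead of raising.

```python
# Method names come from the Arrow PyCapsule Interface; the probing order
# (stream, then array, then schema) is an assumption made for this sketch.
_PROTOCOL_METHODS = (
    ("__arrow_c_stream__", "stream"),
    ("__arrow_c_array__", "array"),
    ("__arrow_c_schema__", "schema"),
)


def arrow_export_kind(obj):
    """Return "stream", "array" or "schema" if `obj` exposes the matching
    PyCapsule export method, else None. Hypothetical helper, not PyArrow API.
    """
    for method_name, kind in _PROTOCOL_METHODS:
        # Only the presence of a callable attribute is checked here;
        # the method itself is never invoked.
        if callable(getattr(obj, method_name, None)):
            return kind
    return None


class FakeBatch:
    """Stand-in producer used only to exercise the check above."""

    def __arrow_c_array__(self, requested_schema=None):
        # A real implementation would return a pair of PyCapsules
        # (schema capsule, array capsule) as the specification describes.
        raise NotImplementedError("illustration only")


print(arrow_export_kind(FakeBatch()))  # array
print(arrow_export_kind([1, 2, 3]))    # None
```

A library function accepting user data could call such a helper first and fall back to its existing conversion path only when it returns `None`, which avoids hardcoding support for one specific Arrow producer.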