2 changes: 2 additions & 0 deletions docs/source/format/CDataInterface/PyCapsuleInterface.rst
@@ -16,6 +16,8 @@
.. under the License.


.. _arrow-pycapsule-interface:

=============================
The Arrow PyCapsule Interface
=============================
32 changes: 32 additions & 0 deletions docs/source/python/extending_types.rst
@@ -21,6 +21,38 @@
Extending pyarrow
=================

Controlling conversion to (Py)Arrow with the PyCapsule Interface
----------------------------------------------------------------

The :ref:`Arrow C data interface <c-data-interface>` allows moving Arrow data between
different implementations of Arrow. This is a generic, cross-language interface not
specific to Python, but for Python libraries this interface is extended with a
Python-specific layer: :ref:`arrow-pycapsule-interface`.

This Python interface ensures that different libraries supporting the C data interface
can export Arrow data structures in a standard way and recognize each other's objects.

If you have a Python library providing data structures that hold Arrow-compatible data
under the hood, you can implement the following methods on those objects:

- ``__arrow_c_schema__`` for schema or type-like objects.
- ``__arrow_c_array__`` for arrays and record batches (contiguous tables).
- ``__arrow_c_stream__`` for chunked tables or streams of data.

These methods return `PyCapsule <https://docs.python.org/3/c-api/capsule.html>`__
objects. The exact semantics are described in the
:ref:`specification <arrow-pycapsule-interface>`.
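
As a minimal sketch of the producer side, a container that keeps its values in an
Arrow-compatible object can forward the protocol call to that object. The class name
``MyColumn`` and its ``storage`` argument are hypothetical and only serve as an
illustration; the sketch assumes ``storage`` already implements ``__arrow_c_array__``
(as PyArrow arrays do in recent releases).

```python
class MyColumn:
    """Hypothetical container that keeps its values in an Arrow-compatible object."""

    def __init__(self, storage):
        # Assumption: ``storage`` already implements ``__arrow_c_array__``,
        # for example a PyArrow array from a recent PyArrow release.
        self._storage = storage

    def __arrow_c_array__(self, requested_schema=None):
        # Forward the call to the wrapped object. Per the specification, the
        # result is a pair of PyCapsules: (schema capsule, array capsule).
        return self._storage.__arrow_c_array__(requested_schema)
```

If ``storage`` is a PyArrow array, passing a ``MyColumn`` instance to
:func:`pyarrow.array` should then produce an equivalent PyArrow array.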

When your data structures define those methods, the PyArrow constructors
(such as :func:`pyarrow.array` or :func:`pyarrow.table`) recognize those objects as
supporting this protocol and convert them to PyArrow data structures zero-copy. Any
other library that supports this protocol when ingesting data can do the same.

Similarly, if your library has functions that accept user-provided data, you can add
support for this protocol by checking for the presence of those methods, thereby
accepting any Arrow data (instead of hardcoding support for a specific
Arrow producer such as PyArrow).
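
The check on the consumer side can be sketched as follows. The helper name
``arrow_protocol_kind`` is hypothetical, and the order in which the methods are
tested is a choice made for this illustration, not something the specification
prescribes.

```python
def arrow_protocol_kind(obj):
    """Return which PyCapsule protocol method ``obj`` exposes, or None.

    Hypothetical helper: it only detects the methods by name and does not
    call them or import any data.
    """
    for kind in ("array", "stream", "schema"):
        if hasattr(obj, f"__arrow_c_{kind}__"):
            return kind
    return None
```

A function that accepts user-provided data could call such a helper first and fall
back to its existing conversion path when the result is ``None``.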

.. _arrow_array_protocol:

Controlling conversion to pyarrow.Array with the ``__arrow_array__`` protocol