diff --git a/pep-0670.rst b/pep-0670.rst
index 414d0247ba3..f7a3a5715d7 100644
--- a/pep-0670.rst
+++ b/pep-0670.rst
@@ -60,13 +60,13 @@ The `GCC documentation
 `_ lists several
 common macro pitfalls:
 
-- Misnesting
-- Operator precedence problems
-- Swallowing the semicolon
-- Duplication of side effects
-- Self-referential macros
-- Argument prescan
-- Newlines in arguments
+- Misnesting;
+- Operator precedence problems;
+- Swallowing the semicolon;
+- Duplication of side effects;
+- Self-referential macros;
+- Argument prescan;
+- Newlines in arguments.
 
 
 Performance and inlining
@@ -77,19 +77,39 @@ compilers have efficient heuristics to decide if a function should be
 inlined or not.
 
 When a C compiler decides to not inline, there is likely a good reason.
-For example, inlining would reuse a register which require to
-save/restore the register value on the stack and so increase the stack
-memory usage or be less efficient.
+For example, inlining would reuse a register, which requires saving and
+restoring the register value on the stack, and so increases the stack
+memory usage or is less efficient.
 
 
 Debug build
 -----------
 
-When Python is built in debug mode, most compiler optimizations are
-disabled. For example, Visual Studio disables inlining. Benchmarks must
-not be run on a Python debug build, only on release build: using LTO and
-PGO is recommended for reliable benchmarks. PGO helps the compiler to
-decide if function should be inlined or not.
+Benchmarks must not be run on a Python debug build, only on a release
+build. Moreover, using LTO and PGO optimizations is recommended for best
+performance and reliable benchmarks. PGO helps the compiler to decide
+if a function should be inlined or not.
+
+``./configure --with-pydebug`` uses the ``-Og`` compiler option if it's
+supported by the compiler (GCC and LLVM clang support it): optimize
+debugging experience. Otherwise, the ``-O0`` compiler option is used:
+disable most optimizations.
+
+With GCC 11, ``gcc -Og`` can inline static inline functions, whereas
+``gcc -O0`` does not inline static inline functions. Examples:
+
+* Call ``Py_INCREF()`` in ``PyBool_FromLong()``:
+
+  * ``gcc -Og``: inlined
+  * ``gcc -O0``: not inlined, call ``Py_INCREF()`` function
+
+* Call ``_PyErr_Occurred()`` in ``_Py_CheckFunctionResult()``:
+
+  * ``gcc -Og``: inlined
+  * ``gcc -O0``: not inlined, call ``_PyErr_Occurred()`` function
+
+On Windows, when Python is built in debug mode by Visual Studio, static
+inline functions are not inlined.
 
 
 Force inlining
@@ -154,6 +174,11 @@ functions should be measured with benchmarks. If there is a significant
 slowdown, there should be a good reason to do the conversion. One reason
 can be hiding implementation details.
 
+To avoid any risk of performance slowdown on Python built without LTO,
+it is possible to keep a private static inline function in the internal
+C API and use it in Python, but expose a regular function in the public
+C API.
+
 Using static inline functions in the internal C API is fine: the internal
 C API exposes implementation details by design and should not be used
 outside Python.
@@ -164,8 +189,8 @@ Cast to PyObject*
 When a macro is converted to a function and the macro casts its
 arguments to ``PyObject*``, the new function comes with a new macro
 which cast arguments to ``PyObject*`` to prevent emitting new compiler
-warnings. So the converted functions still accept pointers to structures
-inheriting from ``PyObject`` (ex: ``PyTupleObject``).
+warnings. So the converted functions still accept pointers to other
+structures inheriting from ``PyObject`` (ex: ``PyTupleObject``).
 
 For example, the ``Py_TYPE(obj)`` macro casts its ``obj`` argument to
 ``PyObject*``::
@@ -224,9 +249,47 @@ the macro.
 People using macros should be considered "consenting adults". People who
 feel unsafe with macros should simply not use them.
 
+The idea was rejected because macros are error prone and it is too easy
+to miss a macro pitfall when writing a macro. Moreover, macros are
+harder to read and to maintain than functions.
+
+
 Examples of hard to read macros
 ===============================
 
+PyObject_INIT()
+---------------
+
+Example showing the usage of commas in a macro which has a return value.
+
+Python 3.7 macro::
+
+    #define PyObject_INIT(op, typeobj) \
+        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
+
+Python 3.8 function (simplified code)::
+
+    static inline PyObject*
+    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
+    {
+        Py_TYPE(op) = typeobj;
+        _Py_NewReference(op);
+        return op;
+    }
+
+    #define PyObject_INIT(op, typeobj) \
+        _PyObject_INIT(_PyObject_CAST(op), (typeobj))
+
+* The function doesn't need the line continuation character ``"\"``.
+* It has an explicit ``"return op;"`` rather than the surprising
+  ``", (op)"`` syntax at the end of the macro.
+* It uses short statements on multiple lines, rather than being written
+  as a single long line.
+* Inside the function, the *op* argument has the well defined type
+  ``PyObject*`` and so doesn't need casts like ``(PyObject *)(op)``.
+* Arguments don't need to be put inside parentheses: use ``typeobj``,
+  rather than ``(typeobj)``.
+
 _Py_NewReference()
 ------------------
 
@@ -254,35 +317,6 @@ Python 3.8 function (simplified code)::
         Py_REFCNT(op) = 1;
     }
 
-PyObject_INIT()
----------------
-
-Example showing the usage of commas in a macro.
-
-Python 3.7 macro::
-
-    #define PyObject_INIT(op, typeobj) \
-        ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
-
-Python 3.8 function (simplified code)::
-
-    static inline PyObject*
-    _PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
-    {
-        Py_TYPE(op) = typeobj;
-        _Py_NewReference(op);
-        return op;
-    }
-
-    #define PyObject_INIT(op, typeobj) \
-        _PyObject_INIT(_PyObject_CAST(op), (typeobj))
-
-The function doesn't need the line continuation character. It has an
-explicit ``"return op;"`` rather than a surprising ``", (op)"`` at the
-end of the macro. It uses one short statement per line, rather than a
-single long line. Inside the function, the *op* argument has a well
-defined type: ``PyObject*``.
-
 
 Macros converted to functions since Python 3.8
 ==============================================
@@ -346,6 +380,52 @@ private static inline function has been added to the internal C API:
 * ``_PyVectorcall_FunctionInline()``
 
 
+Benchmarks
+==========
+
+Benchmarks run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
+logical CPUs (4 physical CPU cores).
+
+
+gcc -O0 versus gcc -Og
+----------------------
+
+Benchmark of the ``./python -m test -j10`` command on a Python debug
+build:
+
+* ``gcc -Og``: 220 sec ± 3 sec
+* ``gcc -O0``: 360 sec ± 6 sec
+
+Python built with ``gcc -O0`` is **1.6x slower** than Python built with
+``gcc -Og``.
+
+Replace macros with static inline functions
+-------------------------------------------
+
+The `PR 29728 `_ replaces
+the following existing static inline functions with macros:
+
+* ``PyObject_TypeCheck()``
+* ``PyType_Check()``, ``PyType_CheckExact()``
+* ``PyType_HasFeature()``
+* ``PyVectorcall_NARGS()``
+* ``Py_DECREF()``, ``Py_XDECREF()``
+* ``Py_INCREF()``, ``Py_XINCREF()``
+* ``Py_IS_TYPE()``
+* ``Py_NewRef()``
+* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``
+
+Benchmark of the ``./python -m test -j10`` command on a Python debug
+build:
+
+* Macros (PR 29728), ``gcc -O0``: 345 sec ± 5 sec
+* Static inline functions (reference), ``gcc -O0``: 360 sec ± 6 sec
+
+Replacing macros with static inline functions makes Python
+**1.04x slower** when the compiler **does not inline** static inline
+functions.
+
+
 References
 ==========
 