diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 78447c04099..4e0c8c1142e 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -520,6 +520,7 @@ pep-0662.rst @brettcannon pep-0662/pep-0662-editables.json @brettcannon pep-0663.txt @ethanfurman pep-0664.rst @pablogsal +pep-0665.rst @brettcannon # pep-0666.txt # ... # pep-0754.txt diff --git a/pep-0665.rst b/pep-0665.rst new file mode 100644 index 00000000000..3d13ee1ef56 --- /dev/null +++ b/pep-0665.rst @@ -0,0 +1,781 @@ +PEP: 665 +Title: Specifying Installation Requirements for Python Projects +Author: Brett Cannon , + Pradyun Gedam , + Tzu-ping Chung +PEP-Delegate: +Discussions-To: https://discuss.python.org/c/packaging/14 +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 29-Jul-2021 +Post-History: 29-Jul-2021 +Resolution: + +======== +Abstract +======== + +This PEP specifies a file format to list the Python package +installation requirements for a project. The list of projects is +considered exhaustive for the installation target and thus +*locked down*, not requiring any information beyond the platform being +installed for and the *lock file* listing the required dependencies +to perform a successful installation of dependencies. + + +========== +Motivation +========== + +Thanks to PEP 621, projects have a way to list their direct/top-level +dependencies which they need to have installed. But PEP 621 also +(purposefully) omits two key details that often become important for +projects: + +#. A listing of all indirect/transitive dependencies +#. Specifying (at least) specific versions of dependencies for + reproducible installations + +Both needs are important for various reasons. One is that without a +complete listing of all dependencies and the specific versions to use, +there can be a skew between developers of the same project, or +developer and user, based on what versions of a project's dependencies +happen to be available at the time of installation. 
For instance, +a dependency may have v1 as the newest version on Monday when one +developer installs the dependency, while v2 comes out on Wednesday +when another developer installs the same dependency. Now the two +developers are working against two different versions of the same +dependency, which can lead to different outcomes. + +Another important reason for reproducible installations is +security. Guaranteeing that the same binary data is +downloaded and installed for all installations makes sure that no bad +actor has somehow changed a dependency's binary data in a malicious +way. A lock file can assist in this guarantee by recording the exact +details of what should be installed and how to verify that those +dependencies have not changed any bytes unexpectedly. + +The community itself has also shown a need for lock files based on the +fact that multiple tools have independently created their own lock +file formats: + +#. PDM_ +#. `pip-tools`_ +#. Pipenv_ +#. Poetry_ +#. Pyflow_ + +Other programming language communities have also shown the usefulness +of lock files by developing their own solutions to this problem. Some +of those communities include: + +#. Dart_ +#. npm_/Node +#. Rust_ + + +========= +Rationale +========= + +To begin, two key terms should be defined. A **locker** is a tool +which *produces* a lock file. An **installer** is a tool which +*consumes* a lock file to install the appropriate dependencies. + + +----- +Goals +----- + +The file format should be *machine-readable*, *machine-writable*, and +*human-readable*. Since the assumption is the vast majority of lock +files will be generated by a locker tool, the format should be easy +to write by a locker. As install tools will be consuming the lock +file, the format also needs to be easily read by an installer. But the +format should also be readable by a person as people will inevitably +be performing audits on lock files. 
Having a format that does not lend +itself towards being read by people would hinder that. This includes +changes to a lock file being readable in a diff format for auditing +changes. It also means that understanding *why* something is in +the lock file should be comprehensible in a diff to assist in auditing +changes. + +The lock file format needs to be general enough to support +*cross-platform and cross-environment* specifications of dependencies. +This allows having a single lock file which can work on a myriad of +platforms and environments when that makes sense. This has been shown +as a necessary feature by the various tools in the Python packaging +ecosystem which already have a lock file format (e.g. Pipenv_, +Poetry_, PDM_). + +The lock file also needs to support *reproducible installations*. If +one wants to restrict what the lock file covers to a single platform +to guarantee the exact dependencies and files which will be installed, +that should be doable. This can be critical in security contexts for +projects like SecureDrop_. + +When a computation could be performed either in the locker or +installer, the preference is to *perform the computation in the +locker*. This is because the assumption is a locker will be executed +less frequently than an installer. + +The installer should be able to resolve what to install based entirely +on platform/environment information and what is contained within the +lock file. There should be +*no need to use network or other file system I/O* in order to resolve +what to install. + +The lock file should provide enough flexibility to allow lockers and +installers to innovate. While the lock file specification provides a +*common denominator of functionality*, it should not act as a ceiling +for functionality. + + +--------- +Non-Goals +--------- + +Because of the expected size of lock files, no effort was put into +making lock files *human-writable*. 
+ + +============= +Specification +============= + +------- +Details +------- + +Lock files MUST use the TOML_ file format thanks to its adoption by +PEP 518 for ``pyproject.toml``. This not only prevents the need to +have another file format in the Python packaging ecosystem, but it +also assists in making lock files human-readable. + +Lock files MUST be kept in a directory named ``pyproject-lock.d``. +Lock files MUST end with a ``.toml`` file extension. Projects may have +as many lock files as they want using whatever file name stems they +choose. This PEP prescribes no specific way to automatically select +between multiple lock files and installers SHOULD avoid guessing which +lock file is "best-fitting" (this does not preclude situations where +only a single lock file with a certain name is expected to exist and +will be used by default, e.g. a documentation hosting site always +using a lock file named ``pyproject-lock.d/rftd.toml`` when provided). + +The following are the top-level keys of the TOML file data format. + + +``version`` +=========== + +The version of the lock file being used. The key MUST be specified and +it MUST be set to ``1``. The number MUST always be an integer and it +MUST only increment in future updates to the specification. What +constitutes a version number increase is left to future PEPs or +standards changes. + + +``[tool]`` +========== + +Tools may create their own sub-tables under the ``tool`` table. The +rules for this table match those for ``pyproject.toml`` and its +``[tool]`` table from the `build system declaration spec`_. + + +``[metadata]`` +============== + +A table containing data applying to the overall lock file. + + +``metadata.marker`` +------------------- + +An optional key storing a string containing an environment marker as +specified in the `dependency specifier spec`_. + + +The locker MAY specify an environment marker which specifies any +restrictions the lock file was generated under (e.g. 
specific Python +versions supported). + +If the installer is installing for an environment which does not +satisfy the specified environment marker, the installer MUST raise an +error as the lock file does not support the environment. + + +``metadata.tags`` +----------------- + +An optional array of inline tables representing +`platform compatibility tags`_ that the lock file supports. The locker +MAY specify tables in the array which represent the compatibility the +lock file was generated for. + +The tables have the possible keys of: + +- ``interpreter`` +- ``abi`` +- ``platform`` + +representing the parts of the platform compatibility tags. Each key is +optional in a table. These keys MUST represent a single value, i.e. +the values are exploded and not compressed in wheel tag parlance. + +If the environment an installer is installing for does not match +**any** table in the array (missing keys in the table means implicit +support for that part of the compatibility), the installer MUST raise +an error as the lock file does not support the environment. + + +``metadata.needs`` +------------------ + +An array of strings representing the package specifiers for the +top-level/direct dependencies of the lock file as defined by the +`dependency specifier spec`_ (i.e. the root of the dependency graph +for the lock file). + +Lockers MUST only allow specifiers which may be satisfiable by the +lock file and the dependency graph the lock file encodes. Lockers MUST +normalize project names according to the `simple repository API`_. + + +``[package]`` +=============== + +A table containing arrays of tables for each dependency recorded +in the lock file. + +Each key of the table is the name of a package which MUST be +normalized according to the `simple repository API`_. If extras are +specified as part of the project to install, the extras are to be +included in the key name and are to be sorted in lexicographic order. 
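
As a rough illustration of the key rules above, the sketch below normalizes a project name the way the `simple repository API`_ describes (runs of ``-``, ``_`` and ``.`` collapse to a single ``-``, then lowercase) and sorts extras lexicographically. The helper names ``normalize_name`` and ``package_key`` are hypothetical, and the bracketed ``name[extra1,extra2]`` key layout is an assumption: this PEP does not spell out how extras are joined into the key.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name as described by the simple repository API."""
    # Collapse runs of "-", "_" and "." into a single "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def package_key(name: str, extras=()) -> str:
    """Build a key for the ``package`` table (hypothetical helper).

    ASSUMPTION: extras are appended as ``name[extra1,extra2]``; the PEP only
    says they are included in the key and sorted lexicographically.
    """
    normalized = normalize_name(name)
    if not extras:
        return normalized
    sorted_extras = sorted(normalize_name(extra) for extra in extras)
    return f"{normalized}[{','.join(sorted_extras)}]"


if __name__ == "__main__":
    print(package_key("Friendly.Bard_Name"))
    print(package_key("Requests", ["socks", "security"]))
```

A locker could apply such a helper to every project name before writing the ``package`` table so that keys compare and sort consistently.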
+ +Within the file, the tables for the projects MUST be +sorted by: + +#. Project/key name in lexicographic order +#. Package version, newest/highest to oldest/lowest according to the + `version specifiers spec`_ +#. Extras via lexicographic order + + +``package.<name>.version`` +-------------------------- + +A required string of the version of the package as specified by the +`version specifiers spec`_. + + +``package.<name>.needs`` +------------------------ + +An optional key containing an array of strings following the +`dependency specifier spec`_ which specify what other packages this +package depends on. See ``metadata.needs`` for full details. + + +``package.<name>.required-by`` +------------------------------ + +A key containing an array of package names which depend on this +package. The package names MUST match the package name as used in the +``package`` table. + +The lack of a ``required-by`` key implies that the package is a +top-level package listed in ``metadata.needs``. + + +``package.<name>.code`` +----------------------- + +An array of tables listing files that are available to satisfy +the installation of the package for the specified version in the +``version`` key. + +Each table has a ``type`` key which specifies how the code is stored. +All other keys in the table are dependent on the value set for +``type``. The acceptable values for ``type`` are listed below; all +other possible values are reserved for future use. + +Tables in the array MUST be sorted in lexicographic order of the value +of ``type``, then lexicographic order for the value of ``url``. + +When recording a table, the fields SHOULD be listed in the order +the fields are listed in this specification for consistency to make +diffs of a lock file easier to read. + +For all types other than "wheel", an installer MAY refuse to install +code to avoid arbitrary code execution during installation. + +An installer MUST verify the hash of any specified file. 
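
The hash check required above might look roughly like the following sketch. It assumes the file has already been read into memory and that ``hash-value`` holds a hexadecimal digest, as in this PEP's example; the specification itself does not fix the encoding. The function name ``hash_matches`` is hypothetical.

```python
import hashlib
import hmac


def hash_matches(data: bytes, hash_algorithm: str, hash_value: str) -> bool:
    """Return True if ``data`` hashes to ``hash_value`` (hypothetical helper).

    ASSUMPTION: ``hash_value`` is a hex digest and ``hash_algorithm`` is a
    fixed-length algorithm name that hashlib understands (e.g. "sha256").
    """
    if hash_algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {hash_algorithm!r}")
    actual = hashlib.new(hash_algorithm, data).hexdigest()
    # Constant-time comparison; hex digests are compared case-insensitively.
    return hmac.compare_digest(actual, hash_value.lower())


if __name__ == "__main__":
    payload = b"example wheel bytes"
    recorded = hashlib.sha256(payload).hexdigest()
    print(hash_matches(payload, "sha256", recorded))
    print(hash_matches(payload + b"!", "sha256", recorded))
```

An installer following this approach would refuse to install any file for which the check returns ``False``.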
+ + +``type="wheel"`` +'''''''''''''''' + +A `wheel file`_ for the package version. + +Supported keys in the table are: + +- ``url``: a string of location of the wheel file (use the + ``file://`` protocol for the local file system) +- ``hash-algorithm``: a string of the algorithm used to generate the + hash value stored in ``hash-value`` +- ``hash-value``: a string of the hash of the file contents +- ``interpreter-tag``: (optional) a string of the interpreter portion + of the wheel tag as specified by the `platform compatibility tags`_ + spec +- ``abi-tag``: (optional) a string of the ABI portion of the wheel tag + as specified by the `platform compatibility tags`_ spec +- ``platform-tag``: (optional) a string of the platform portion of the + wheel tag as specified by the `platform compatibility tags`_ spec + +If the keys related to `platform compatibility tags`_ are absent then +the installer MUST infer the tags from the URL's file name. If any of +the `platform compatibility tags`_ are specified by a key in the table +then a locker MUST provide all three related keys. The values of the +keys may be compressed tags. + + +``type="sdist"`` +'''''''''''''''' + +A `source distribution file`_ (sdist) for the package version. + +- ``url``: a string of location of the sdist file (use the + ``file://`` protocol for the local file system) +- ``hash-algorithm``: a string of the algorithm used to generate the + hash value stored in ``hash-value`` +- ``hash-value``: a string of the hash of the file contents + + +``type="git"`` +'''''''''''''' + +A Git_ version control repository for the package. + +- ``url``: a string of location of the repository (use the + ``file://`` protocol for the local file system) +- ``commit``: a string of the commit of the repository which + represents the version of the package + +The repository MUST follow the `source distribution file`_ spec +for source trees, otherwise an error is to be raised by the locker. 
+ +As the commit ID for a Git repository is a hash of the repository's +contents, there is no hash to verify. + + +``type="source tree"`` +'''''''''''''''''''''' + +A source tree which can be used to build a wheel. + +- ``url``: a string of location of the source tree (use the + ``file://`` protocol for the local file system) +- ``mime-type``: (optional) a string representing the MIME type of the + URL +- ``hash-algorithm``: (optional for a local directory) a string of the + algorithm used to generate the hash value stored in ``hash-value`` +- ``hash-value``: (optional for a local directory) a string of the + hash of the file contents + +The collection of files MUST follow the `source distribution file`_ +spec for source trees, otherwise an error is to be raised by the +locker. + +Installers MAY use the file extension, MIME type from HTTP headers, +etc. to infer whether they support the storage mechanism used for the +source tree. If the MIME type cannot be inferred and it is not +specified via ``mime-type`` then an error MUST be raised. + +If the source tree is NOT a local directory, then an installer MUST +verify the hash value. Otherwise if the source tree is a local +directory then the ``hash-algorithm`` and ``hash-value`` keys MUST be +left out. The installer MAY warn the user of the use of a local +directory due to the potential change in code since the lock file +was created. + + +------- +Example +------- + +:: + + version = 1 + + [tool] + # Tool-specific table ala PEP 518's `[tool]` table. 
+ + [metadata] + marker = "python_version>='3.6'" + + needs = ["mousebender"] + + [[package.attrs]] + version = "21.2.0" + required-by = ["mousebender"] + + [[package.attrs.code]] + type = "wheel" + url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl" + hash-algorithm="sha256" + hash-value = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1" + + [[package.mousebender]] + version = "2.0.0" + needs = ["attrs>=19.3", "packaging>=20.3"] + + [[package.mousebender.code]] + type = "sdist" + url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz" + hash-algorithm = "sha256" + hash-value = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760" + + [[package.mousebender.code]] + type = "wheel" + url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl" + hash-algorithm = "sha256" + hash-value = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c" + + [[package.packaging]] + version = "20.9" + needs = ["pyparsing>=2.0.2"] + required-by = ["mousebender"] + + [[package.packaging.code]] + type = "git" + url = "https://github.com/pypa/packaging.git" + commit = "53fd698b1620aca027324001bf53c8ffda0c17d1" + + [[package.pyparsing]] + version = "2.4.7" + required-by = ["packaging"] + + [[package.pyparsing.code]] + type="wheel" + url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl" + hash-algorithm="sha256" + hash-value="ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b" + interpreter-tag = "py2.py3" + abi-tag = "none" + platform-tag = "any" + + +---------------------- +Installer Expectations +---------------------- + +Installers MUST implement the +`direct URL 
origin of installed distributions spec`_ as all packages +installed from a lock file inherently originate from a URL and not a +search of an index by package name and version. + + +Example Flow +============ + +#. Have the user specify which lock file they would like to use in + ``pyproject-lock.d`` (e.g. ``dev``, ``prod``) + +#. Check if the environment supports what is specified in + ``metadata.tags``; error out if it doesn't + +#. Check if the environment supports what is specified in + ``metadata.marker``; error out if it doesn't + +#. Gather the list of package names from ``metadata.needs``, and for + each listed package ... + + #. Resolve any markers to find the appropriate package to install + #. Find the most appropriate code to install for the package + #. Repeat the above steps for packages listed in the ``needs`` key + for each package found to install + +#. For each project collected to install ... + + #. Gather the specified code for the package + #. Verify hashes of code + #. Install the packages (if necessary) + + +======================= +Backwards Compatibility +======================= + +As there is no pre-existing specification regarding lock files, there +are no explicit backwards compatibility concerns. + +As for pre-existing tools that have their own lock file, some updating +will be required. Most document the lock file name, but not its +contents, in which case the file name of the lock file(s) is the +important part. Projects which do not commit their lock file to +version control will need to update the equivalent of their +``.gitignore`` file. For projects that do commit their lock file to +version control, what file(s) get committed will need an update. + +Projects which do document their lock file format, like Pipenv_, +will very likely need a new major version release. 
+ +Specifically for Poetry_, it has an +`export command `_ which +should allow Poetry to support this lock file format even if the +project chose not to adopt this PEP as Poetry's primary lock file +format. + + +===================== +Security Implications +===================== + +A lock file should not introduce security issues but instead help +solve them. By requiring the recording of hashes of code, a lock file +is able to help prevent tampering with code since the hash details +were recorded. A lock file also helps prevent unexpected package +updates being installed which may be malicious. + + +================= +How to Teach This +================= + +Teaching of this PEP will very much be dependent on the lockers and +installers being used for day-to-day use. Conceptually, though, users +could be taught that the ``pyproject-lock.d`` directory contains files +which specify what should be installed for a project to work. The +benefits of consistency and security should be emphasized to help +users realize why they should care about lock files. + + +======================== +Reference Implementation +======================== + +No proof-of-concept or reference implementation currently exists. + + +============== +Rejected Ideas +============== + +---------------------------- +File Formats Other Than TOML +---------------------------- + +JSON_ was briefly considered, but due to: + +#. TOML already being used for ``pyproject.toml`` +#. TOML being more human-readable +#. TOML leading to better diffs + +the decision was made to go with TOML. There was some concern over +Python's standard library lacking a TOML parser, but most packaging +tools already use a TOML parser thanks to ``pyproject.toml`` so this +issue did not seem to be a showstopper. 
Some have also argued against +this concern in the past by the fact that if packaging tools abhor +installing dependencies and feel they can't vendor a package then the +packaging ecosystem has much bigger issues to rectify than needing to +depend on a third-party TOML parser. + + +---------------------------------------- +Alternative Name to ``pyproject-lock.d`` +---------------------------------------- + +The name ``__lockfile__`` was briefly considered, but the directory +would not sort next to ``pyproject.toml`` in instances where files +and directories were sorted together in lexicographic order. The +current naming is also more obvious in terms of its relationship +to ``pyproject.toml``. + + +----------------------------- +Supporting a Single Lock File +----------------------------- + +At one point the idea of not using a directory of lock files but a +single lock file which contained all possible lock information was +considered. But it quickly became apparent that trying to devise a +data format which could encompass both a lock file format which could +support multiple environments as well as strict lock outcomes for +reproducible builds would become quite complex and cumbersome. + +The idea of supporting a directory of lock files as well as a single +lock file named ``pyproject-lock.toml`` was also considered. But any +possible simplicity from skipping the directory in the case of a +single lock file seemed unnecessary. Trying to define appropriate +logic for what should be the ``pyproject-lock.toml`` file and what +should go into ``pyproject-lock.d`` seemed unnecessarily complicated. + + +----------------------------------------------- +Using a Flat List Instead of a Dependency Graph +----------------------------------------------- + +The first version of this PEP proposed that the lock file have no +concept of a dependency graph. 
Instead, the lock file would list +exactly what should be installed for a specific platform such that +installers did not have to make any decisions about *what* to install, +only validating that the lock file would work for the target platform. + +This idea was eventually rejected due to the number of combinations +of potential PEP 508 environment markers. The decision was made that +trying to have lockers generate all possible combinations when a +project wants to be cross-platform would be too much. + + +------------------------------------------------------------------------- +Being Concerned About Different Dependencies Per Wheel File For a Project +------------------------------------------------------------------------- + +It is technically possible for a project to specify different +dependencies between its various wheel files. Taking that into +consideration would then require the lock file to operate not +per-project but per-file. Luckily, specifying different dependencies +in this way is very rare and frowned upon and so it was deemed not +worth supporting. + + +------------------------------- +Use Wheel Tags in the File Name +------------------------------- + +Instead of having the ``metadata.tags`` field there was a suggestion +of encoding the tags into the file name. But due to the addition of +the ``metadata.marker`` field and what to do when no tags were needed, +the idea was dropped. + + +----------------------------------------- +Using Semantic Versioning for ``version`` +----------------------------------------- + +Instead of a monotonically increasing integer, using a float was +considered to attempt to convey semantic versioning. 
In the end, +though, it was deemed more hassle than it was worth as adding a new +key would likely constitute a "major" version change (only if the +key was entirely optional would it be considered "minor"), and +experience with the `core metadata spec`_ suggests there's a bigger +chance parsing will be relaxed or made more strict, which is also a +"major" change. As such, the simplicity of using an integer made +sense. + + +------------------------------- +Alternative Names for ``needs`` +------------------------------- + +Some other names for what became ``needs`` were ``installs`` and +``dependencies``. In the end a Python beginner was asked which term +they preferred and they found ``needs`` clearer. Since there wasn't +any reason to disagree with that, the decision was to go with +``needs``. + + +------------------------------------- +Alternative Names for ``required-by`` +------------------------------------- + +Other names that were considered were ``dependents``, ``depended-by``, +and ``supports``. In the end, ``required-by`` simply seemed like the +best fit. + + +------------------------------------- +Support for Branches and Tags for Git +------------------------------------- + +Due to the `direct URL origin of installed distributions spec`_ +supporting the specification of branches and tags, it was suggested +that lock files support the same thing. But because branches and tags +can change what commit they point to between locking and installation, +that was viewed as a security concern (Git commit IDs are hashes of +metadata and thus are viewed as immutable). + + +=========== +Open Issues +=========== + +--------------------------------------- +Allow for Tool-Specific ``type`` Values +--------------------------------------- + +It has been suggested to allow for custom ``type`` values in the +``code`` table. They would be prefixed with ``x-`` and followed by +the tool's name and then the type, i.e. ``x-<tool>-<type>``. 
This +would provide enough flexibility for things such as other version +control systems, innovative container formats, etc. to be officially +usable in a lock file. + +----------------------------------------------- +Support Variable Expansion in the ``url`` field +----------------------------------------------- + +This could include predefined variables like ``PROJECT_ROOT`` for the +directory containing ``pyproject-lock.d`` so URLs to local directories +and files could be relative to the project itself. + +Environment variables could be supported to avoid hardcoding things +such as user credentials for Git. + + +=============== +Acknowledgments +=============== + +Thanks to Frost Ming of PDM_ and Sébastien Eustace of Poetry_ for +providing input around dynamic install-time resolution of PEP 508 +requirements. + +Thanks to Kushal Das for making sure reproducible builds stayed a +concern for this PEP. + +Thanks to Andrea McInnes for settling the bikeshedding and choosing +the paint colour of ``needs``. + + +========= +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. _build system declaration spec: https://packaging.python.org/specifications/declaring-build-dependencies/ +.. _core metadata spec: https://packaging.python.org/specifications/core-metadata/ +.. _Dart: https://dart.dev/ +.. _dependency specifier spec: https://packaging.python.org/specifications/dependency-specifiers/ +.. _Git: https://git-scm.com/ +.. _JSON: https://www.json.org/ +.. _npm: https://www.npmjs.com/ +.. _PDM: https://pypi.org/project/pdm/ +.. _pip-tools: https://pypi.org/project/pip-tools/ +.. _Pipenv: https://pypi.org/project/pipenv/ +.. _platform compatibility tags: https://packaging.python.org/specifications/platform-compatibility-tags/ +.. _Poetry: https://pypi.org/project/poetry/ +.. _Pyflow: https://pypi.org/project/pyflow/ +.. 
_direct URL origin of installed distributions spec: https://packaging.python.org/specifications/direct-url/ +.. _Rust: https://www.rust-lang.org/ +.. _SecureDrop: https://securedrop.org/ +.. _simple repository API: https://packaging.python.org/specifications/simple-repository-api/ +.. _source distribution file: https://packaging.python.org/specifications/source-distribution-format/ +.. _TOML: https://toml.io +.. _version specifiers spec: https://packaging.python.org/specifications/version-specifiers/ +.. _wheel file: https://packaging.python.org/specifications/binary-distribution-format/ + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: