Curated PyPI Sources
Enhanced Advanced
This section provides an overview of what curated-pypi sources are, why they are useful, and how to use them. If you'd just like to get started, see the Quick Start section to create your first curated subset of PyPI.
Overview
Curated PyPI sources are based on a current mirror of PyPI. To learn how the Posit Package Manager PyPI source works, see the PyPI Mirror documentation. Since PyPI has over 400,000 packages, it can be useful to include only certain packages and versions within a source. This is especially helpful in secure, regulated environments where only verified sets of packages are allowed.
Creating a Curated PyPI Source
$ rspm create source --name=pypi-subset --type=curated-pypi
<< Source 'pypi-subset':
<< Type: Curated PyPI
Curated PyPI sources don't need to be pinned to a specific snapshot date at the time of creation; any date can be picked when adding packages with rspm update (described below). Once the source has been created, be sure to subscribe a repository to the source to make the packages available to users:
# Create a repository:
$ rspm create repo --name=pypi --type=python --description='Access Curated PyPI packages'
# Subscribe a repository to the curated-pypi source:
$ rspm subscribe --repo=pypi --source=pypi-subset
Including Packages in a Curated PyPI Source
Packages are included in a curated-pypi source by uploading a requirements.txt definition with rspm update. This section explains how a requirements file is defined and how to use one to include packages in a curated-pypi source.
Requirements Files
A requirements.txt file can be created from scratch, or you can reuse an existing file that your organization or team already uses to define local environments. Package Manager reads a subset of the requirements.txt format: package names, optional version constraints, and references to other requirements files.
As an example, a requirements.txt file could look like:
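A minimal sketch of such a file, reconstructed from the description in this section and wrapped in a shell heredoc so it can be pasted into a terminal. The -r syntax for the file reference is pip's; the exact spacing and line order are assumptions:

```shell
# Write an example requirements.txt (a sketch; contents reconstructed
# from the prose in this section, not copied from an official example).
cat > requirements.txt <<'EOF'
shiny
tensorflow>=2.4.0,<2.5,!=2.4.2
numpy==1.24.2
-r requirements2.txt
EOF
```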
This makes the following available in the source:
- All available versions of shiny.
- All versions of tensorflow greater than or equal to 2.4.0 and less than 2.5, explicitly excluding 2.4.2.
- Only numpy version 1.24.2.
- All packages from requirements2.txt.
As shown in the example above, a package doesn't need to have any version constraints defined. It can also have as many version constraints as needed. The versions made available to Package Manager will depend on what is available at the snapshot date specified when updating the source.
All Python version parsing and matching is based on PEP 440. Refer to the PEP 440 documentation for information on version formatting and constraints. For more information on the Requirements File Format, refer to pip's documentation.
Note
Not everything defined in the Requirements File Format specification is supported in Package Manager. The curated-pypi source only parses package names, version ranges, and recursive file references. Any other definitions (e.g., extras, option flags, environment markers) within an uploaded requirements.txt file are ignored.
The requirements.txt file also supports declaring multiple references to the same package with different version constraints:
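For instance, the two tensorflow constraints discussed in this section could be written on separate lines. This is a sketch; the heredoc is only a convenient way to create the file from a terminal:

```shell
# Two separate lines for the same package (sketch).
cat > requirements.txt <<'EOF'
tensorflow==2.4.2
tensorflow==2.4.3
EOF
```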
This will be treated as an OR operator, leading the curated-pypi source to evaluate the defined version constraints as "tensorflow == 2.4.2 or tensorflow == 2.4.3".
In this example, Package Manager will pull in both version 2.4.2 and version 2.4.3. This can be helpful when combining requirements.txt files from multiple sources, ensuring that all the versions you expect are included.
Note
Be careful when referencing a package multiple times when using a != constraint. As an example:
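A sketch of the problematic pattern, using the constraints quoted in this note (the heredoc is only a convenient way to create the file):

```shell
# The range and the exclusion sit on separate lines,
# so they are evaluated as two independent alternatives.
cat > requirements.txt <<'EOF'
tensorflow>=2.0.0,<3.0.0
tensorflow!=2.4.2
EOF
```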
This will still include version 2.4.2 because it is evaluated as "tensorflow >= 2.0.0, < 3.0.0 OR tensorflow != 2.4.2".
To guarantee that version 2.4.2 is excluded, include all version constraints on a single line so Package Manager evaluates all constraints together:
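With the same constraints combined on one line, the file could look like this (again a sketch built from the constraints quoted in this note):

```shell
# All constraints on a single line, so they are evaluated together.
cat > requirements.txt <<'EOF'
tensorflow>=2.0.0,<3.0.0,!=2.4.2
EOF
```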
Generating requirements.txt files
Need to generate a requirements.txt file? See Generating requirements.txt.
Using Pipfiles Instead
If you already have a Pipfile or Pipfile.lock defined, then you may prefer to use that. Although Package Manager doesn't support uploading a Pipfile directly, there are a few methods to convert it to the requirements.txt format.
One method is to run pip freeze from within the defined pipenv environment:
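A minimal sketch of that approach, assuming pipenv is installed and the command runs from the project directory that contains the Pipfile. The helper name freeze_pipenv_env is hypothetical and is used here only to keep the output path configurable:

```shell
# Sketch: write the packages installed in the project's pipenv
# environment to a requirements file.
# freeze_pipenv_env is a hypothetical helper name; the optional
# argument is the output path (defaults to requirements.txt).
freeze_pipenv_env() {
  pipenv run pip freeze > "${1:-requirements.txt}"
}
```

Calling freeze_pipenv_env requirements.txt writes one name==version line per installed package, so the resulting curated source would contain exactly those pinned versions.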
Another alternative is to use jq to parse the Pipfile.lock file and turn it into a requirements.txt file:
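One way this could be sketched, assuming jq is installed and the lock file follows the usual Pipfile.lock layout, where each entry under "default" has a "version" field that already includes the == prefix. The helper name pipfile_lock_to_requirements is hypothetical:

```shell
# Sketch: print name==version lines for every package in the
# "default" section of a Pipfile.lock. Packages under "develop"
# are left out. The optional argument is the lock file path.
pipfile_lock_to_requirements() {
  jq -r '.default | to_entries[] | .key + .value.version' "${1:-Pipfile.lock}"
}
```

For example, pipfile_lock_to_requirements Pipfile.lock > requirements.txt writes the converted list to a file that can then be passed to rspm update.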
These methods should be useful to get your package specifications into a format that Package Manager can handle.
Updating a Curated PyPI Source
To make packages available in a Curated PyPI source, run rspm update with a requirements file and a specific PyPI snapshot date. Package Manager supports a dry run before the changes are committed to the source:
# Do a dry-run to visualize the changes to the source before doing them
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24
A preview of the changes is presented:
Packages from 'requirements.txt' to update source 'pypi-subset' at PyPI snapshot date '2023-03-24':
Name Version
numpy 1.24.2
shiny 0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4
If the output above looks correct, execute this command again with the --commit flag to update the source with the new set of packages.
Note
If your requirements.txt file includes more than 1,000 packages, the output of the update command is simplified for performance purposes.
To commit the changes, repeat the command, adding the --commit flag:
# Now commit the changes to the source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24 --commit
The finalized contents of the source are then printed:
Successfully updated source 'pypi-subset' at PyPI snapshot date '2023-03-24' with the following packages from 'requirements.txt':
Name Version
numpy 1.24.2
shiny 0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4
Note
Running rspm update on a Curated PyPI source overwrites the source with only the packages defined in your requirements.txt file. However, previous snapshots of the source are still available with a pinned repo URL.
To update the source to a different snapshot date, use the update command again:
# Update packages in a curated-pypi source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2021-02-03 --commit
Curated PyPI sources can be pinned to any date for which Posit has a PyPI snapshot (typically, once per weekday). The snapshot date does not have to move forward: any available date can be used, regardless of the previously used snapshot dates. If the source was initially set to 2021-02-03, it can then be set to a later date with --snapshot=2022-06-01. If you later want to pin it back to the original date, run rspm update again with --snapshot=2021-02-03.
Tip
This allows you to set the Curated PyPI source to any date where a PyPI snapshot has been taken on our servers. If you are trying to pin to a version of a package that doesn't exist on PyPI anymore, try pinning to a date when it existed.