Air-Gapped Package Manager#
Advanced
Package Manager communicates with the Posit Package Service to access CRAN, Bioconductor and PyPI packages and metadata. In offline (air-gapped) environments, it is possible to directly download the necessary data from the online Posit Package Service and then copy it to an offline Package Manager server.
This guide walks through the steps of setting up the offline environment, performing regular updates, and upgrading Package Manager in offline environments.
Storage Requirements#
The amount of disk storage required to run Package Manager in an offline environment depends on the types of packages you have configured. A typical installation will require at least 200 GB of additional disk storage, but this can vary based on other factors specific to CRAN, Bioconductor, and PyPI.
CRAN#
CRAN requires at least 120 GB of storage space.
If CRAN binary package serving is enabled, the required storage will increase depending on the number of R versions and distributions in use.
Each R version and distribution combination requires an additional 200 GB of storage on average. The size ranges from 100 GB to more than 300 GB, with newer R versions and distributions tending to be smaller.
Info
For example, at least 800 GB of storage space is required to support R package binaries for R 4.2 and 4.3 on both RHEL 8 and RHEL 9.
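The arithmetic behind this figure can be sketched in shell. The 200 GB value is the average quoted above, so the result is a planning estimate rather than an exact size, and the variable names are purely illustrative.

```shell
# Rough storage estimate for CRAN binaries, using the ~200 GB average per
# R version and distribution combination quoted above.
r_versions=2        # R 4.2 and 4.3
distributions=2     # RHEL 8 and RHEL 9
avg_gb_per_combo=200

binaries_gb=$(( r_versions * distributions * avg_gb_per_combo ))
echo "Estimated binary storage: ${binaries_gb} GB"
```

Adjusting the two counts gives a starting point for other combinations of R versions and distributions.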
Bioconductor#
Bioconductor requires up to 220 GB of storage per Bioconductor version. The total size of Bioconductor will be over 2.6 TB, so Package Manager allows a subset of Bioconductor versions to be used in offline environments.
PyPI#
PyPI requires more than 20 TB of storage for the entirety of PyPI. This is too large to download in most cases, so Package Manager allows a subset of PyPI packages to be used in offline environments.
The total size of the offline PyPI data will depend on your use of packages. Deep learning packages, such as TensorFlow and PyTorch, are notoriously large, with hundreds of gigabytes needed for each project's collection of files. If you do not anticipate using deep learning packages, a starting storage size of 50 GB is likely adequate. If you do intend to use deep learning packages, you should plan for 500 GB or more.
Initial Setup#
First, install the offline downloader on a system with outbound internet access to the Posit Package Service, https://rspm-sync.rstudio.com. The version of the offline downloader must match the version of your Package Manager server.
Review the commands to download offline package data for CRAN, Bioconductor, or PyPI:
rspm-offline-downloader get cran --help
rspm-offline-downloader get bioconductor --help
rspm-offline-downloader get pypi --help
Downloading the Data#
When ready, run the download commands with the appropriate flags to perform the full download. Each command supports a --dryrun flag to describe what will be downloaded without saving any files.
If running more than one command, use the same --destination path for each command.
The commands will download metadata and package files, and may take some time to complete. The --concurrency flag (which defaults to 500) may be adjusted to speed up the downloads, depending on network conditions. Faster connections may benefit from a higher download concurrency, while slower connections may benefit from a lower concurrency.
If a proxy is required to access the Posit Package Service, use the --outbound-proxy flag to specify an outbound proxy server for downloading.
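As a sketch, the flags described above can be combined in a single preview run. The helper name, the DOWNLOADER variable, the proxy address, and the concurrency value are all hypothetical, and the exact argument syntax accepted by --outbound-proxy is an assumption; check the --help output for your version before relying on it.

```shell
# Path to the downloader binary; override DOWNLOADER if it lives elsewhere.
DOWNLOADER="${DOWNLOADER:-./rspm-offline-downloader}"

# Hypothetical helper: preview a CRAN download without saving any files.
# $1 = Package Manager version, $2 = destination directory
preview_cran_download() {
  "$DOWNLOADER" get cran \
    --rspm-version="$1" \
    --destination="$2" \
    --concurrency=200 \
    --outbound-proxy="http://proxy.example.com:3128" \
    --dryrun
}

# Example: preview_cran_download 2024.11.0 /path/to/destination/
```

Dropping the --dryrun line from the helper turns the preview into the real download.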
CRAN#
To download the minimum required offline CRAN data:
./rspm-offline-downloader get cran --rspm-version=2024.11.0 --destination=[/path/to/destination/]
If CRAN binary package serving is enabled, download the binary packages for just the R versions and distributions you need using the --include-binaries, --r-versions, and --distributions flags.
For example, to download the binary packages for R 4.2 and 4.3, for RHEL 9, Ubuntu 22 (Jammy), and Windows:
./rspm-offline-downloader get cran --rspm-version=2024.11.0 --destination=[/path/to/destination/] \
--include-binaries --r-versions=4.2,4.3 --distributions=rhel9,jammy,windows
For macOS binary packages, you can optionally specify the architectures to download (x86_64 or arm64):
# Download macOS binary packages for arm64 only
./rspm-offline-downloader get cran --rspm-version=2024.11.0 --destination=[/path/to/destination/] \
--include-binaries --r-versions=4.2,4.3 --distributions=macos --architectures=arm64
# Download macOS binary packages for x86_64 only
./rspm-offline-downloader get cran --rspm-version=2024.11.0 --destination=[/path/to/destination/] \
--include-binaries --r-versions=4.2,4.3 --distributions=macos --architectures=x86_64
After running the download command, you can validate that the files were downloaded correctly using the rspm-offline-downloader validate cran command:
./rspm-offline-downloader validate cran --rspm-version=2024.11.0 --destination=[/path/to/destination] --packages
Bioconductor#
By default, the get bioconductor command downloads offline Bioconductor data for all versions of Bioconductor:
./rspm-offline-downloader get bioconductor --rspm-version=2024.11.0 --destination=[/path/to/destination/]
Since this is very large, we recommend downloading just the subset of Bioconductor versions in use with the --versions flag. For example, to download offline data for just the current release and devel versions:
./rspm-offline-downloader get bioconductor --rspm-version=2024.11.0 --destination=[/path/to/destination/] \
--versions=release,devel
Or to download offline data for just Bioconductor 3.17 and 3.18:
./rspm-offline-downloader get bioconductor --rspm-version=2024.11.0 --destination=[/path/to/destination/] \
--versions=3.17,3.18
After running the download command, you can validate that the files were downloaded correctly using the rspm-offline-downloader validate bioconductor command:
./rspm-offline-downloader validate bioconductor --rspm-version=2024.11.0 --destination=[/path/to/destination] --packages
Updating Bioconductor data
When updating downloaded Bioconductor data, you may choose to download only versions for which you need new data. When copying the new data to the air-gapped server, be sure to keep the data for previous versions. If you remove any data for versions that are in use, errors will occur when attempting to access packages or metadata from those versions.
PyPI#
Since the entirety of PyPI is too large to download in most cases, Package Manager mirrors a subset of PyPI when running in offline environments. You must specify a subset of PyPI packages to download using a requirements.txt file.
This is similar to Curated PyPI Sources, and requirements.txt files from curated PyPI sources may be reused for the offline downloader.
Otherwise, create a requirements.txt file containing each PyPI package necessary for your installation, including all dependencies. The requirements.txt file is a text file with one package name per line.
For example, a requirements.txt file for a mirror of the Django and numpy packages could look like:
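A minimal sketch of such a file, created from the shell, is shown here. The two package names come from the sentence above; no version constraints are assumed, and any dependencies of these packages would also need to be listed.

```shell
# Write a minimal requirements.txt listing one package name per line.
cat > requirements.txt <<'EOF'
Django
numpy
EOF
```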
See Generating requirements.txt for more details on how to generate a requirements.txt file.
Note
Currently, the requirements.txt file format for the offline downloader only supports package names with optional version constraints. Recursive file references and other definitions (e.g., extras, option flags, environment markers) will be ignored.
Once you have the requirements.txt file, specify the path to it using the --file-in flag:
./rspm-offline-downloader get pypi --rspm-version=2024.11.0 --destination=[/path/to/destination] \
--file-in=requirements.txt
After running the download command, you can validate that the files were downloaded correctly using the rspm-offline-downloader validate pypi command:
./rspm-offline-downloader validate pypi --rspm-version=2024.11.0 --destination=[/path/to/destination] --packages
Limitations and recommendations
The mirrored PyPI subset will only include snapshots where the specified packages have changed. The PyPI snapshot calendar will only display these snapshots, so there may not be as many available dates as a full PyPI repository.
Changing the included subset of packages will alter the historical snapshots in the PyPI source, potentially affecting users installing packages from frozen snapshot URLs. Curated PyPI sources will not change, however, and will continue to include their originally added packages.
We recommend only adding packages to the offline subset in most cases, and only removing packages when it is certain that no users or curated sources are using that removed package. If removing a package causes package installation failures or prevents a curated PyPI source from being updated, this may be resolved by restoring the removed package in the offline data, or removing the package from the curated PyPI source.
Copying the Data#
After the offline data has been downloaded, copy it over to the offline Package Manager server.
First, create a directory to store the data in the offline Package Manager server, such as /var/lib/rspm-offline-data. If you have a cluster of nodes, use shared storage for this directory.
Copy the data downloaded earlier from the online system to this directory on the offline Package Manager server. For completely isolated servers, you may need to copy the data to a physical drive in order to move it to the offline environment.
For example, if the downloaded data was located at /path/to/data:
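One way to perform and check the copy is sketched here. The helper name is hypothetical, and the sketch assumes both paths are reachable from the same shell (for example, with the external drive mounted on the offline server).

```shell
# Hypothetical helper: copy downloaded data into the offline data directory
# and compare the two trees afterwards.
# $1 = downloaded data (e.g. /path/to/data)
# $2 = offline data directory (e.g. /var/lib/rspm-offline-data)
copy_offline_data() {
  mkdir -p "$2" || return 1
  # "src/." copies the directory contents, including hidden files
  cp -R "$1/." "$2/" || return 1
  # diff -r exits non-zero if the two directory trees differ
  diff -r "$1" "$2"
}

# Example: copy_offline_data /path/to/data /var/lib/rspm-offline-data
```

On an initial copy into an empty directory, a zero exit status from the helper indicates that every file arrived intact.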
Confirm that the offline data directory has all the files from the original data directory.
Finally, modify the permissions on the directory in the offline Package Manager server, changing ownership to the Unix account running Package Manager, rstudio-pm by default:
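The ownership change can be sketched as a small helper. The function name is hypothetical; the account defaults to rstudio-pm as stated above, and the command typically needs to be run as root.

```shell
# Hypothetical helper: give the Package Manager service account ownership of
# the offline data directory and everything beneath it.
# $1 = offline data directory, $2 = account name (defaults to rstudio-pm)
set_offline_data_owner() {
  chown -R "${2:-rstudio-pm}" "$1"
}

# Example: set_offline_data_owner /var/lib/rspm-offline-data
```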
Configuring Package Manager#
Next, configure the offline Package Manager server to use the downloaded data. Set the Manifest.URL configuration setting to the file path of the offline data directory.
[Manifest]
URL = A URL in the form, `file:///<the directory you created in the previous section>`
For example, if your offline data directory is at /var/lib/rspm-offline-data, the file /etc/rstudio-pm/rstudio-pm.gcfg should contain:
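As a sketch, the expected section can be generated from the directory path. The helper name is hypothetical; the output follows the file:///<directory> form described above and assumes an absolute path.

```shell
# Hypothetical helper: print the [Manifest] section for a data directory.
# $1 = absolute path of the offline data directory
manifest_section() {
  printf '[Manifest]\nURL = file://%s\n' "$1"
}

manifest_section /var/lib/rspm-offline-data
```

For this directory, the helper prints a [Manifest] section whose URL is file:///var/lib/rspm-offline-data.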
Once the file is updated, restart the Package Manager server, for example with sudo systemctl restart rstudio-pm on systemd-based systems.
If the configuration was successful, you should see messages like this in the server log at /var/log/rstudio/rstudio-pm/rstudio-pm.log:
Configured to serve CRAN data from a directory. Checking path '/var/lib/rspm-offline-data'.
Configured to serve Bioconductor data from a directory. Checking path '/var/lib/rspm-offline-data'.
Configured to serve PyPI data from a directory. Checking path '/var/lib/rspm-offline-data'.
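A quick way to look for these messages is sketched here. The helper name is hypothetical, and the search pattern simply mirrors the wording of the log lines shown above.

```shell
# Hypothetical helper: show which offline sources the server reported.
# $1 = path to the Package Manager log file
show_offline_sources() {
  grep -E 'Configured to serve (CRAN|Bioconductor|PyPI) data from a directory' "$1"
}

# Example: show_offline_sources /var/log/rstudio/rstudio-pm/rstudio-pm.log
```

If the helper prints nothing, the server has not reported any source as being served from a directory.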
Follow the Quick Start guide to make CRAN, Bioconductor, or PyPI packages available in the offline Package Manager server. Package Manager will now update package data from the offline data directory (e.g., /var/lib/rspm-offline-data) rather than the online Posit Package Service.
Regular Updates#
It is important to regularly update data available on the offline server. The Posit Package Service is typically updated with new packages each business day.
We recommend using the following steps to keep your offline server up to date:
- If you have maintained the originally downloaded files, you can perform a relatively fast update by re-running the rspm-offline-downloader commands. Subsequent command executions will simply add or update files as necessary without re-downloading the entire set.
- Copy the directory from the online machine to the directory created in the offline Package Manager during the initial setup, e.g., /var/lib/rspm-offline-data. Ensure that the directory is still owned by the Unix account running Package Manager, rstudio-pm by default.
- Once the offline data directory has been updated, the Package Manager server will automatically synchronize the new data during the scheduled syncs. You may also manually synchronize the data by running the rspm sync command.
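The online half of this routine can be sketched as a helper that re-runs the download and then validates it. The helper name and the DOWNLOADER variable are hypothetical; the flags are the ones used in the CRAN examples earlier in this guide.

```shell
# Path to the downloader binary; override DOWNLOADER if it lives elsewhere.
DOWNLOADER="${DOWNLOADER:-./rspm-offline-downloader}"

# Hypothetical helper: refresh and re-validate CRAN data on the online machine.
# $1 = Package Manager version, $2 = existing destination directory
refresh_cran_data() {
  # Re-running "get" against the same destination adds or updates files
  "$DOWNLOADER" get cran --rspm-version="$1" --destination="$2" || return 1
  # Validate the refreshed data before copying it to the offline server
  "$DOWNLOADER" validate cran --rspm-version="$1" --destination="$2" --packages
}

# Example: refresh_cran_data 2024.11.0 /path/to/destination/
```

The same pattern applies to the bioconductor and pypi subcommands with their own flags. After the refreshed directory has been copied across, the offline server picks it up at the next scheduled sync or when rspm sync is run.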
Note
If you manually update the offline data using an external drive, you can use the --starting-snapshot flag to only download new files since your last synchronization. Use the validate cran, validate bioconductor, or validate pypi command in the rspm-offline-downloader tool to ensure that the destination directory is valid.
Updating Vulnerability Data#
Package vulnerability data changes often, typically daily, so you may want to update the vulnerability data without updating any package data. This can be done using the rspm-offline-downloader get vulns command.
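A minimal sketch of a vulnerability-only refresh follows. The helper name and DOWNLOADER variable are hypothetical, and the flags are assumed to mirror the other get commands in this guide; confirm them with rspm-offline-downloader get vulns --help.

```shell
# Path to the downloader binary; override DOWNLOADER if it lives elsewhere.
DOWNLOADER="${DOWNLOADER:-./rspm-offline-downloader}"

# Hypothetical helper: refresh only the vulnerability data.
# Assumption: "get vulns" accepts the same flags as the other "get" commands.
# $1 = Package Manager version, $2 = existing destination directory
refresh_vulns() {
  "$DOWNLOADER" get vulns --rspm-version="$1" --destination="$2"
}

# Example: refresh_vulns 2024.11.0 /path/to/destination/
```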
Upgrading Package Manager#
A new version of Package Manager may require data from a new version of the Posit Package Service. To ensure a smooth upgrade with limited downtime, we recommend the following steps:
- You will need a staging environment that mirrors your offline production server. After creating this environment, begin by upgrading the offline staging server to the latest Package Manager release.
- Follow the instructions in the Initial Setup section, using the offline staging server. Always install the version of the offline downloader utility that matches your Package Manager server.
- After you have validated that everything works as expected, copy the offline data, e.g., /var/lib/rspm-offline-data, from the offline staging server to the offline production server.
- Upgrade the offline production server to the new version of Package Manager.
- (Optional) After an upgrade, clean up any unused files from the previous version of Package Manager. Navigate to the directory storing offline data, e.g., /var/lib/rspm-offline-data. This directory will contain versioned directories, e.g., v3 and v4. The output from rspm-offline-downloader get [ cran | bioconductor | pypi ] will have indicated the version of the Posit Package Service required by the current version of Package Manager, e.g., Performing full download of schema version v4.
  - In this example, only the following directories are necessary:
    - /var/lib/rspm-offline-data/v4 (CRAN)
    - /var/lib/rspm-offline-data/manifest/v4 (CRAN)
    - /var/lib/rspm-offline-data/bioc/manifest/v5 (Bioconductor)
    - /var/lib/rspm-offline-data/pypi (PyPI)
    - /var/lib/rspm-offline-data/sysreqs (System Requirements)
    - /var/lib/rspm-offline-data/distros (Supported platforms)
    - /var/lib/rspm-offline-data/bindex (CRAN Package Binary Index)
    - /var/lib/rspm-offline-data/bin (CRAN Package Binaries)
    - /var/lib/rspm-offline-data/vulns (Vulnerabilities)
  - The following directories can be removed if present:
    - /var/lib/rspm-offline-data/v3 (old CRAN data)
    - /var/lib/rspm-offline-data/v2 (old CRAN data)
    - /var/lib/rspm-offline-data/bioc/v3 (old Bioconductor data)
    - /var/lib/rspm-offline-data/bioc/v4 (old Bioconductor data)
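Before deleting anything, it can help to list which of the old directories actually exist. The helper below is a hypothetical sketch that only prints paths; the directory names are the ones from the schema v4 example above and will differ for other schema versions.

```shell
# Hypothetical helper: print old schema directories that exist under the
# offline data directory, without deleting anything.
# $1 = offline data directory
list_removable_dirs() {
  for d in v2 v3 bioc/v3 bioc/v4; do
    if [ -d "$1/$d" ]; then
      echo "$1/$d"
    fi
  done
}

# Example: list_removable_dirs /var/lib/rspm-offline-data
```

Reviewing the printed paths against the downloader output first avoids removing data that the current version still needs.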