Load Balanced
Enhanced Advanced
Posit Package Manager can be configured to run on AWS in a Load Balanced (LB) cluster configuration for a non-air-gapped environment. In this architecture, Posit Package Manager can serve a large number of users with the reliability that comes from running behind a load balancer.
This configuration is suitable for teams of hundreds of data scientists who want or require high availability within their organization.
Most companies don’t need this configuration unless they have many concurrent package uploads and downloads or are required to have high availability (HA) for compliance reasons. For small teams that don’t require HA, the single server architecture of Package Manager is more suitable.
Architecture Overview
This implementation deploys Posit Package Manager in a load balanced configuration. It uses:
- An AWS Application Load Balancer (ALB) for ingress.
- Two EC2 instances running in HA.
- An S3 bucket for Posit Package Manager’s object storage.
- An RDS instance that includes a Postgres database for Posit Package Manager metadata.
Architecture Diagram
Nodes
Posit Package Manager can be run across two EC2 instances in a clustered configuration. We have tested with c5.4xlarge instances (16 vCPUs, 32 GiB memory), and this setup can serve 30 million package installs per month, or 1 million package installs per day. This configuration can also handle 100 Git builders concurrently building packages from Git repositories.
Note
Each Posit Package Manager user could be downloading dozens or hundreds of packages a day. There are also other usage patterns such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good idea of what load and throughput this configuration can handle. This configuration has been run on Posit Public Package Manager, so we don’t anticipate any individual customer needing to scale beyond this configuration.
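As a rough sanity check on the figures above, the sketch below converts the tested daily install volume into an average request rate and an approximate user count. The per-user figure of 100 installs a day is an assumption picked from the "dozens or hundreds" range in the note, not a measured value, and real traffic is bursty rather than evenly spread.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Tested throughput from the text: 1 million package installs per day.
DAILY_INSTALLS = 1_000_000


def installs_per_second(installs_per_day: int) -> float:
    """Average install rate, assuming load is spread evenly over the day."""
    return installs_per_day / SECONDS_PER_DAY


def supported_users(installs_per_day: int, installs_per_user_per_day: int) -> int:
    """Approximate number of users the daily volume covers."""
    return installs_per_day // installs_per_user_per_day


if __name__ == "__main__":
    # 30 days at the tested daily volume gives the monthly figure.
    print("installs per month:", DAILY_INSTALLS * 30)
    print("average installs per second:", round(installs_per_second(DAILY_INSTALLS), 2))
    # Assumption: each user installs 100 packages a day.
    print("users at 100 installs/day:", supported_users(DAILY_INSTALLS, 100))
```

Under that assumption the tested volume corresponds to roughly ten thousand active users, well above the "hundreds of data scientists" this architecture targets.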
The EC2 instances in a load balanced configuration require the following:
- Matching versions of Posit Package Manager
- Shared encryption keys for every node
- Shared configuration file for every node
- All the necessary versions of R and Python (if using Git building functionality)
The Package Manager Admin Guide offers an HA Checklist to follow when setting up Package Manager behind a load balancer.
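The node requirements above lend themselves to an automated pre-flight check. The following is a minimal sketch, not part of Package Manager: `NodeState` and `find_mismatches` are hypothetical names, and it assumes you have already collected each node's version string, configuration file contents, and encryption key bytes by some other means. Keys are compared by SHA-256 fingerprint so the key material itself never needs to appear in a report.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of one node, gathered out of band."""
    name: str
    version: str
    config_text: str
    key_bytes: bytes


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used to compare files without printing them."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(nodes: List[NodeState]) -> List[str]:
    """Compare every node against the first one and list the differences."""
    problems: List[str] = []
    if not nodes:
        return problems
    ref = nodes[0]
    for node in nodes[1:]:
        if node.version != ref.version:
            problems.append(f"{node.name}: version {node.version} != {ref.version}")
        if fingerprint(node.config_text.encode()) != fingerprint(ref.config_text.encode()):
            problems.append(f"{node.name}: configuration file differs from {ref.name}")
        if fingerprint(node.key_bytes) != fingerprint(ref.key_bytes):
            problems.append(f"{node.name}: encryption key differs from {ref.name}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    a = NodeState("node-a", "1.0.0", "example config", b"example-key")
    b = NodeState("node-b", "1.0.1", "example config", b"example-key")
    for problem in find_mismatches([a, b]):
        print(problem)
```

An empty result means the nodes agree on version, configuration, and key. It does not cover the R and Python versions needed for Git building, which would need a separate check.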
Database
This configuration uses RDS with Postgres on a db.t3.xlarge instance with 100 GB of storage. This is a very generous configuration. In our testing, the Postgres database handled 1,000,000+ package installs per day without exceeding 10-20% CPU utilization.
The RDS instance should be configured with an empty Postgres database for the Posit Package Manager metadata.
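Pointing Package Manager at the empty database typically means supplying a Postgres connection URL. The sketch below only shows how to assemble such a URL safely when the password contains reserved characters. The helper name, the user, the host, and the database name are all placeholders, and how the URL is supplied to Package Manager should be taken from the Admin Guide rather than from this example.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Build a postgres:// URL, percent-encoding the user and password.

    Characters such as '@' or '/' in a password would otherwise be
    misread as URL delimiters.
    """
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(postgres_url("ppm", "p@ss/word", "example.rds.amazonaws.com", "ppm"))
```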
Storage
The S3 bucket is used to store data about packages and sources, as well as cached metadata to decrease response times for requests. S3 can also be used with KMS for client-side encryption.
Networking
This reference architecture deploys Posit Package Manager inside a single private subnet and availability zone with ingress using an Application Load Balancer. Multiple availability zones can be used if required.
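To illustrate why placing two nodes behind a load balancer improves availability, here is a toy model of round-robin routing that skips nodes marked unhealthy. It is a conceptual sketch only: the class and node names are hypothetical, and it does not reproduce how an Application Load Balancer performs health checks or routing.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Toy balancer: rotate through nodes, skipping unhealthy ones."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: List[str] = list(nodes)
        self.healthy: Set[str] = set(self.nodes)
        self._index = 0

    def mark_unhealthy(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_healthy(self, node: str) -> None:
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self) -> str:
        """Return the next healthy node, or raise if none are left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._index % len(self.nodes)]
            self._index += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["node-a", "node-b"])
    print([lb.next_node() for _ in range(4)])  # alternates between both nodes
    lb.mark_unhealthy("node-a")
    print([lb.next_node() for _ in range(3)])  # only node-b serves requests
```

With one node out of service, requests continue to be served by the remaining node, which is the property the two-instance layout is meant to provide.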
Configuration Details
No additional configuration is required for this architecture beyond the initial setup outlined in the load balanced installation steps.
Resiliency and Availability
This configuration of Posit Package Manager has been deployed on the Posit Public Package Manager service. As a publicly available service, the architecture is tested by the R and Python communities that use it. Public Package Manager is used by many more users than any private Posit Package Manager instance. The current uptime for the Posit Public Package Manager service can be found on the status page.
FAQ
See the Architecture Frequently Asked Questions page for the general FAQ.