Load Balanced
Enhanced Advanced
Posit Package Manager can be configured to run on Azure in a Load Balanced (LB) cluster configuration for a non-air-gapped environment. In this architecture, Posit Package Manager can handle a large number of users and have the reliability that comes with running behind a load balancer.
This configuration is suitable for teams of hundreds of data scientists who want or require high availability within their organization.
Most companies don’t need to run in this configuration unless they have many concurrent package uploads/downloads or are required to have high availability for compliance reasons. Instead, the single server architecture of Package Manager would be more suitable for small teams that don’t require HA.
Architecture Overview#
This Posit Package Manager implementation is deployed in a Load Balanced configuration. It also uses:
- Two Azure Virtual Machine (VM) instances running in HA.
- Azure Files for Posit Package Manager's shared file storage.
- Azure Database for PostgreSQL, serving as the application database for Posit Package Manager.
- Azure Application Gateway to route requests to the Posit Package Manager service.
Architecture Diagram#
Nodes#
We recommend running Posit Package Manager on two VMs across different availability zones. We have tested with Standard D8 v5 instances (8 vCPUs, 32 GiB memory), which can serve 30 million package installs per month, or about one million package installs per day. This configuration can also handle 100 Git builders concurrently building packages from Git repositories.
Note
Each Posit Package Manager user could be downloading dozens or hundreds of packages a day. There are also other usage patterns such as an admin uploading local packages or the server building packages for Git builders, but package installations give a good idea of what load and throughput this configuration can handle.
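To put the tested figures in perspective, the arithmetic can be sketched as follows. The 30-day month, the team size, and the per-user install count are illustrative assumptions, not measurements from the testing described above.

```python
# Figure taken from the text above.
INSTALLS_PER_MONTH = 30_000_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month

installs_per_day = INSTALLS_PER_MONTH / DAYS_PER_MONTH
installs_per_second = installs_per_day / (24 * 60 * 60)


def installs_for_team(users: int, installs_per_user_per_day: int) -> int:
    """Daily installs generated by a team (hypothetical helper)."""
    return users * installs_per_user_per_day


# Hypothetical team: 500 data scientists, 200 installs each per day.
team_daily = installs_for_team(500, 200)
share_of_tested_load = team_daily / installs_per_day

print(f"{installs_per_day:,.0f} installs/day")
print(f"{installs_per_second:.1f} installs/second on average")
print(f"team load: {team_daily:,} installs/day "
      f"({share_of_tested_load:.0%} of the tested load)")
```

Averages hide bursts (for example, many CI jobs starting at once), so treat the per-second figure as a rough guide rather than a peak capacity.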
The VMs in a load balanced configuration require the following configuration:
- Matching versions of Posit Package Manager
- Shared encryption keys for every node
- Shared configuration file for every node
- All the necessary versions of R and Python (if using Git building functionality)
The Package Manager Admin Guide offers an HA Checklist to follow when setting up Package Manager behind a load balancer.
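The matching requirements above lend themselves to an automated check. The following is a minimal sketch, assuming you have already collected each node's version string, configuration file contents, and encryption key bytes by some means of your own; the `NodeInfo` structure, the node names, and the sample values are all hypothetical placeholders.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeInfo:
    """Facts gathered from one node (hypothetical structure)."""
    name: str
    version: str       # installed Package Manager version
    config_text: str   # contents of the node's configuration file
    key_bytes: bytes   # contents of the node's encryption key file


def fingerprint(data: bytes) -> str:
    """SHA-256 digest, so secrets are compared without being printed."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(nodes: List[NodeInfo]) -> List[str]:
    """Return the names of the properties that differ between nodes."""
    checks = {
        "version": {n.version for n in nodes},
        "configuration file": {fingerprint(n.config_text.encode()) for n in nodes},
        "encryption key": {fingerprint(n.key_bytes) for n in nodes},
    }
    return [name for name, values in checks.items() if len(values) > 1]


# Placeholder values for illustration only.
nodes = [
    NodeInfo("node-a", "2024.08.0", "placeholder config\n", b"example-key"),
    NodeInfo("node-b", "2024.04.4", "placeholder config\n", b"example-key"),
]
print(find_mismatches(nodes))
```

An empty list means the checked properties agree across nodes. The sketch does not cover the installed R and Python versions, which would need a similar comparison if Git building is in use.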
Database#
This configuration uses Azure Database for PostgreSQL - Flexible Server on a Standard D4ds v4 instance (4 vCPUs, 16 GiB memory) with 128 GiB of storage and zone-redundant high availability.
Zone-redundant high availability allows for the Azure Database instance to run in an active/passive configuration across 2 availability zones, with auto-failover when the primary instance goes down.
The Azure Database instance should be configured with an empty Postgres database for the Posit Package Manager metadata.
This is a very generous configuration. In our testing, the Postgres database handled one million package installs per day without exceeding 10-20% CPU utilization.
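Pointing Package Manager at the empty database typically involves a PostgreSQL connection URL. The sketch below assembles one and percent-encodes the credentials so that special characters in a password do not break the URL. The server name, user, password, and database name are hypothetical, and `sslmode=require` is an assumption about how the server is set up; check the Admin Guide for the exact configuration setting that takes this URL.

```python
from urllib.parse import quote


def postgres_url(host: str, user: str, password: str, database: str,
                 port: int = 5432, sslmode: str = "require") -> str:
    """Build a PostgreSQL connection URL with percent-encoded credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


# Hypothetical values for illustration only.
url = postgres_url(
    host="example-ppm.postgres.database.azure.com",
    user="ppm_admin",
    password="p@ss/word",
    database="packagemanager",
)
print(url)
```

Encoding matters here because characters such as `@` and `/` in a password would otherwise be read as URL delimiters.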
Storage#
An Azure Files NFS file share is used to store data about packages and sources, as well as cached metadata to decrease response times for requests.
We have provisioned a 1000 GiB NFS file share using Azure Premium SSDs with zone-redundant storage (ZRS). ZRS allows for data to be copied across 3 availability zones within a single region.
The NFS file share should also be configured with the recommended mount options for NFS Azure file shares: nconnect=4, noresvport, actimeo=30, and lookupcache=pos.
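As an illustration of how these options fit together, the sketch below composes an `/etc/fstab` entry for the share. The storage account name, share name, and mount point are hypothetical, and the base options (`vers=4`, `minorversion=1`, `sec=sys`) are an assumption based on the NFS 4.1 protocol that Azure Files NFS shares use; only the four recommended options come from the text above.

```python
from typing import List

# Options named in the text above.
RECOMMENDED_OPTIONS: List[str] = [
    "nconnect=4", "noresvport", "actimeo=30", "lookupcache=pos",
]
# Assumption: protocol options for an NFS 4.1 share.
BASE_OPTIONS: List[str] = ["vers=4", "minorversion=1", "sec=sys"]


def fstab_entry(storage_account: str, share: str, mount_point: str) -> str:
    """Compose one /etc/fstab line for an Azure Files NFS share."""
    source = f"{storage_account}.file.core.windows.net:/{storage_account}/{share}"
    options = ",".join(BASE_OPTIONS + RECOMMENDED_OPTIONS)
    return f"{source} {mount_point} nfs {options} 0 0"


# Hypothetical names for illustration only.
print(fstab_entry("exampleppmstorage", "ppm-data", "/mnt/ppm"))
```

The same entry must be present on every node so that both VMs mount the share with identical options.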
Networking#
This configuration uses Azure Application Gateway for load balancing requests to the Posit Package Manager cluster.
The Application Gateway is configured with the Standard V2 tier, with autoscaling and zone redundancy across 3 availability zones.
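A load balancer only routes to nodes that pass its health probe, so it can help to check each node the same way before placing it behind the gateway. The sketch below is one way to do that, assuming each node exposes an HTTP health path; the `/__ping__` path, the node URLs, and the 200 to 399 "healthy" range are assumptions to verify against the Admin Guide and your probe settings.

```python
from typing import Callable, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the node is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:   # server answered with an error status
        return err.code
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return 0


def healthy_nodes(base_urls: List[str],
                  fetch: Callable[[str], int] = http_status,
                  path: str = "/__ping__") -> List[str]:
    """Return the nodes whose health path answers with a 2xx or 3xx status."""
    return [u for u in base_urls if 200 <= fetch(u.rstrip("/") + path) < 400]


# Demonstration with canned statuses instead of real network calls.
# The node addresses are hypothetical.
canned = {
    "http://10.0.1.4:4242/__ping__": 200,
    "http://10.0.2.4:4242/__ping__": 503,
}
print(healthy_nodes(["http://10.0.1.4:4242", "http://10.0.2.4:4242/"],
                    fetch=lambda url: canned.get(url, 0)))
```

Passing the fetch function as a parameter keeps the check testable without network access; omit it to probe real nodes with `http_status`.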
Configuration Details#
No additional configuration is required for this architecture beyond the initial setup outlined in the load balanced installation steps.
FAQ#
See the Architecture Frequently Asked Questions page for the general FAQ.