High Availability and Load Balancing#
Enhanced | Advanced
Multiple instances of Posit Package Manager can share the same data in a load balanced configuration to help achieve high availability (HA).
HA Checklist#
Follow the checklist below to configure multiple Package Manager instances for HA:

- Ensure that all node clocks are synchronized; see the HA Time Synchronization Requirements section.
- Ensure that all server configurations (i.e., the contents of the `/etc/rstudio-pm` directory) are identical.
- Install and configure the same version of Package Manager on each node.
- Migrate to a PostgreSQL database (if running SQLite); refer to the Changing the Database Provider section for steps. All nodes in the cluster must use the same PostgreSQL database.
- When using NFS for shared storage, configure each server's `Server.DataDir` to point to the same shared location (see the example configuration after this checklist). Be sure to read the Shared Data Directory Requirements section for additional information on the recommended settings for the shared directory. For more granular control of data directories, refer to the Storage Classes appendix for information on customizing the location of each storage class.
- When using S3 for shared storage, each server's `Server.EncryptionKeyPath` must point to a file that contains the same encryption key.
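As an example, here is a minimal sketch of the shared-storage portion of each node's configuration, assuming the default configuration file at `/etc/rstudio-pm/rstudio-pm.gcfg` and a hypothetical NFS mount at `/mnt/rstudio-pm`; adjust the paths to match your environment:

```
; /etc/rstudio-pm/rstudio-pm.gcfg -- identical on every node
[Server]
; Shared data directory on the NFS mount (hypothetical path)
DataDir = /mnt/rstudio-pm/data
; Every node must read the same encryption key (required when using S3 storage)
EncryptionKeyPath = /var/lib/rstudio-pm/rstudio-pm.key
```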
HA Time Synchronization Requirements#
The clocks on all nodes in an HA configuration must be synchronized. We recommend configuring NTP for clock synchronization.
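For example, on systemd-based Linux distributions you can check whether a node's clock is synchronized and enable an NTP client such as chrony. This is a general sketch (Ubuntu/Debian package names shown), not a Package Manager-specific command:

```
# Check whether the system clock is currently synchronized via NTP
$ timedatectl status

# Install and enable chrony as the NTP client (Ubuntu/Debian)
$ sudo apt install chrony
$ sudo systemctl enable --now chrony
```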
HA Limitations#
Node Management#
Package Manager nodes in an HA configuration are not themselves aware of the cluster. Load balancing is handled entirely by your load balancer, which is responsible for directing requests to specific nodes and for checking whether nodes are available to accept requests.
The CLI provides limited node management capabilities. Refer to Managing Cluster Nodes for more details.
Database Requirements#
Package Manager supports HA only when using a PostgreSQL database. If you are using SQLite, switch to PostgreSQL; refer to the Changing the Database Provider section for more information.
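A minimal sketch of the database portion of the configuration, assuming the default configuration file location and a hypothetical connection URL; the Changing the Database Provider section is the authoritative reference for these settings and for the migration steps:

```
; /etc/rstudio-pm/rstudio-pm.gcfg -- identical on every node
[Database]
Provider = postgres

[Postgres]
; Hypothetical connection URL; all nodes must use the same database
URL = "postgres://rspm@postgres.example.com:5432/rstudio-pm"
```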
Shared Data Directory Requirements#
Package Manager manages repository content within the server's data and variable data directories. These directories must be at shared locations, and each node must be configured to point to the same shared locations. Refer to the File Storage section for more information on the server's data directories.
Package Manager supports using either NFS or AWS S3 storage for shared data directories. You can also use a combination of both NFS and AWS S3 for different storage classes.
NFS#
We recommend and support NFS version 3 or 4 for file sharing.
Package Manager relies on being able to efficiently detect new files inside the NFS-shared `DataDir`. By default, NFS clients are configured to cache responses for up to 60 seconds, which means that it can take up to a minute before a Package Manager service is able to respond to certain requests. For most deployments, this is an unacceptably long delay.

Therefore, we strongly recommend that you modify your NFS client settings for the mount on which you'll be hosting your `DataDir`. Typically, the best way to accomplish this is to set `lookupcache=pos` for your NFS mount, which allows existing files to be cached while contacting the NFS server directly to check for the existence of new files. If this setting is not acceptable for your mount, you could alternatively consider shortening `acdirmax` or `actimeo` so that your client becomes aware of new files within, for instance, 5 seconds instead of the default of 60.
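For example, a hypothetical `/etc/fstab` entry that mounts the shared directory with `lookupcache=pos`; the server name, export path, and mount point are placeholders:

```
# /etc/fstab -- NFS mount for the shared Package Manager data directory
nfs.example.com:/export/rstudio-pm  /mnt/rstudio-pm  nfs4  rw,hard,lookupcache=pos  0  0
```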
Note
If you are using Amazon Elastic File System (Amazon EFS), file attribute caching is enabled by default with the Amazon EFS recommended mount options. We recommend adding the `lookupcache=pos` option when mounting Amazon EFS file systems.
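A sketch of an equivalent Amazon EFS mount, assuming a hypothetical file system DNS name and using the EFS-recommended NFS options with `lookupcache=pos` added:

```
# Mount an EFS file system with the recommended NFS options plus lookupcache=pos
$ sudo mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,lookupcache=pos \
    fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/rstudio-pm
```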
S3#
When using S3 for shared storage, each server's `Server.EncryptionKeyPath` must point to a file that contains the same encryption key. See also the Server Configuration section in the appendix. The easiest way to ensure a consistent encryption key on all nodes is to start Package Manager on one of the nodes and then copy the key file created at `/var/lib/rstudio-pm/rstudio-pm.key` to the same location on the other nodes. Set each key file's mode to `0600`.
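For example, a sketch of distributing the key from the first node, assuming SSH access and hypothetical hostnames; the key file should be readable only by the account Package Manager runs as (typically `rstudio-pm`):

```
# On node1: copy the key to a second node (run via sudo so the key is readable)
$ sudo scp /var/lib/rstudio-pm/rstudio-pm.key node2:/tmp/rstudio-pm.key

# On node2: move the key into place and restrict its permissions
$ sudo mv /tmp/rstudio-pm.key /var/lib/rstudio-pm/rstudio-pm.key
$ sudo chown rstudio-pm:rstudio-pm /var/lib/rstudio-pm/rstudio-pm.key
$ sudo chmod 0600 /var/lib/rstudio-pm/rstudio-pm.key
```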
Please refer to the File Storage section for information on configuring Package Manager to store variable data on S3. For help configuring your server with the credentials and settings you need to interact with S3, refer to the S3 Configuration section.
Managing Cluster Nodes#
The admin CLI provides limited node management capabilities. You can list nodes, take nodes offline, and bring offline nodes back online.
Listing Nodes#
To enumerate the nodes in your cluster, use the admin CLI's node listing command.
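A minimal sketch, assuming the listing subcommand is `rspm cluster nodes`; check `rspm cluster --help` for the exact name in your version:

```
$ rspm cluster nodes
```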
Each line of the response includes:
- The node hostname, which corresponds to the `Server.HostName` property. If `Server.HostName` is not set, the server's hostname is used.
- The Package Manager version used by the node.
- The mode (offline/online) of the node.
Changing Offline/Online Mode#
Changing Offline/Online Mode#
To take nodes offline or bring them back online, use the `rspm cluster offline` and `rspm cluster online` commands. You must specify the nodes for the operation, and you can optionally specify a timeout in seconds.
Note
The admin CLI also supports the commands `rspm offline` and `rspm online`, which can be used to take a single Package Manager instance offline or to bring it back online. These commands only affect the instance on which the command is issued. Refer to the Online and Offline Modes section for more details.
```
# Take Node1 and Node2 offline.
$ rspm cluster offline --nodes=Node1,Node2

# Bring Node1 and Node2 back online.
$ rspm cluster online --nodes=Node1,Node2

# Bring Node3 online with a 5 minute timeout.
$ rspm cluster online --nodes=Node3 --timeout=300
```
When the `rspm cluster offline` and `rspm cluster online` commands complete, the affected nodes are listed.
Upgrading a Cluster#
Upgrade Steps#
To reduce downtime during cluster upgrades, specific nodes can be taken offline and upgraded while the remaining nodes continue to serve requests, maintaining high availability.
Note
Take all nodes offline before bringing any upgraded nodes back online to avoid version mismatches. If you forget to take any non-upgraded nodes offline when bringing an upgraded node back online, the non-upgraded nodes will be using a binary that expects an earlier schema version and will be subject to unexpected and potentially serious errors. These nodes will detect an out-of-date database schema within 30 seconds and shut down automatically.
To upgrade a cluster with minimal downtime, follow these steps:
- Take one or more nodes offline using the `rspm cluster offline` command. See Managing Cluster Nodes for more details.
- Upgrade the offline nodes. See the Upgrading section for details on upgrading each node.
- Take the remaining nodes offline.
- Bring the upgraded nodes back online.
- Upgrade the remaining nodes and bring them back online.
Example Upgrade#
Below is an example procedure for upgrading a 4-node cluster. We assume the nodes are named `Node1`, `Node2`, etc.
```
# Take nodes 1 and 2 offline
$ rspm cluster offline --nodes=Node1,Node2

# Upgrade nodes 1 and 2
$ ssh node1
$ sudo apt install ./rstudio-pm_2024.08.3-3487_amd64.deb
$ exit
$ ssh node2
$ sudo apt install ./rstudio-pm_2024.08.3-3487_amd64.deb
$ exit

# Take nodes 3 and 4 offline
$ rspm cluster offline --nodes=Node3,Node4

# Bring nodes 1 and 2 back online. Use a long timeout after
# upgrades to allow for database migrations.
$ rspm cluster online --nodes=Node1,Node2 --timeout=300

# Upgrade nodes 3 and 4
$ ssh node4
$ sudo apt install ./rstudio-pm_2024.08.3-3487_amd64.deb
$ exit
$ ssh node3
$ sudo apt install ./rstudio-pm_2024.08.3-3487_amd64.deb
$ exit

# Bring nodes 3 and 4 back online
$ rspm cluster online --nodes=Node3,Node4
```
Downgrading#
If you wish to move from an HA environment to a single-node environment, please follow these steps:
- Stop all Package Manager services on all nodes.
- Reconfigure your network to route traffic directly to one of the nodes, unless you wish to continue using a load balancer.
- If you wish to move all shared file data to the node, configure the server's `Server.DataDir` to point to a local directory on the node, and copy all the data from the NFS share to this location (see the sketch after these steps). Refer to the File Storage section for more information.
- If you wish to move the databases to this node, install PostgreSQL on the node and copy the data. Moving the PostgreSQL databases from one server to another is beyond the scope of this guide. Please note that we do not support migrating from PostgreSQL to SQLite.
- Start the Package Manager process.
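For example, a sketch of moving the shared file data onto the single remaining node, assuming a systemd service named `rstudio-pm`, an NFS share mounted at `/mnt/rstudio-pm`, and a local data directory at `/var/lib/rstudio-pm`; adjust the paths and configuration file to match your environment:

```
# Stop the service before moving data
$ sudo systemctl stop rstudio-pm

# Copy the shared data from the NFS mount to the local data directory (hypothetical paths)
$ sudo rsync -a /mnt/rstudio-pm/data/ /var/lib/rstudio-pm/

# Point Server.DataDir in /etc/rstudio-pm/rstudio-pm.gcfg at the local directory,
# then start the service again
$ sudo systemctl start rstudio-pm
```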