Package Ecosystem#
The R package ecosystem has three key components: packages, repositories, and libraries.
Packages#
Packages are the primary extension mechanism for R and Python. They are used to share functions, datasets, and documentation. A package can exist in several states, each described in the sections below.
Source#
A package is composed of a series of directories and files. The source of a package is just a top-level directory containing the components of the package. Package authors work with source packages during development. Git(Hub) repositories store source packages.
Bundle#
A bundled package is a package that's been compressed into a single file. By convention, package bundles in R use the extension .tar.gz while package bundles in Python use the .whl extension.
Binary#
A binary package is the result of building a source package for a specific operating system. Binary packages are single files that are ready for installation on their specific operating systems.
Installed#
An installed package is a binary package that has been decompressed into a package library and is ready for use by R.
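The states above can be walked through with a throwaway package. This is a minimal sketch, not a recommended workflow: the package name `demopkg`, its `hello()` function, and all paths are hypothetical, and it assumes `R CMD build` works on the machine (the package has no compiled code, so no compiler is needed).

```r
# Hypothetical working directory and package name, for illustration only.
work <- file.path(tempdir(), "pkg-states")
src  <- file.path(work, "demopkg")
dir.create(file.path(src, "R"), recursive = TRUE, showWarnings = FALSE)

# Source: a top-level directory holding the package's components.
writeLines(c(
  "Package: demopkg",
  "Version: 0.1.0",
  "Title: Demonstration Package",
  "Description: A minimal package used to illustrate package states.",
  "Author: Example Author",
  "Maintainer: Example Author <author@example.com>",
  "License: GPL-3"
), file.path(src, "DESCRIPTION"))
writeLines("export(hello)", file.path(src, "NAMESPACE"))
writeLines('hello <- function() "hello from demopkg"',
           file.path(src, "R", "hello.R"))

# Bundle: R CMD build compresses the source directory into a .tar.gz file
# written to the current working directory.
old_wd <- setwd(work)
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "demopkg"))
setwd(old_wd)
bundle <- file.path(work, "demopkg_0.1.0.tar.gz")

# Installed: the package is unpacked into a library directory.
lib <- file.path(work, "library")
dir.create(lib, showWarnings = FALSE)
install.packages(bundle, repos = NULL, type = "source", lib = lib)

# Load the installed package from that library and call its function.
library(demopkg, lib.loc = lib)
print(hello())
```

The sketch installs directly from the bundle, so no separate binary file is produced along the way.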
Repositories#
Repositories organize packages for distribution to end users. Repositories contain package bundles and binaries that are organized in a specific way so that users can install packages from the repository using R's install.packages() function. CRAN and Bioconductor are examples of R repositories.
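As a small illustration of that fixed layout, `contrib.url()` from the utils package reports where R looks for packages inside a repository. The sketch below uses the CRAN cloud mirror as an example repository URL, and the commented-out package name is only an example; the install call is left commented out because it needs network access.

```r
# Example repository URL (the CRAN cloud mirror).
cran <- "https://cloud.r-project.org"

# Source bundles live under src/contrib within a repository.
src_url <- contrib.url(cran, type = "source")
print(src_url)

# Installing from the repository (requires network access, so not run here):
# install.packages("jsonlite", repos = cran)
```

The binary location varies by platform and R version, so it is not shown here.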
Git(Hub)#
Many R package sources are stored in version-controlled directories. A popular version control tool is Git, and GitHub, a hosting service for Git repositories, houses many package sources. The devtools R package includes convenience functions for installing packages from the package source contained in a Git repository, including repositories hosted on GitHub. Used in this manner, Git repositories and GitHub are one way to distribute R packages, but they are not R package repositories.
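A hedged sketch of what installing from a Git source might look like with devtools is shown below. The wrapper names `install_from_github` and `install_from_git` are hypothetical helpers introduced for illustration; they are only defined, not called, because the calls need the devtools package and network access.

```r
# Hypothetical helper: install a package from a GitHub repository
# given in "owner/repo" form.
install_from_github <- function(repo) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("The devtools package is required for this sketch.")
  }
  devtools::install_github(repo)
}

# Hypothetical helper: install a package from any Git repository URL.
install_from_git <- function(url) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("The devtools package is required for this sketch.")
  }
  devtools::install_git(url)
}

# Example calls (not run):
# install_from_github("r-lib/devtools")
# install_from_git("https://github.com/r-lib/devtools.git")
```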
Libraries#
End users of R typically interact with installed packages that live in libraries. Package libraries are just directories containing installed packages. When a package is loaded, for example with library(), R searches the library directories reported by .libPaths() to find the installed package.
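The library search can be inspected from an R session. The short sketch below uses only base R functions and looks up the stats and utils packages, which ship with R.

```r
# Library directories R searches for installed packages.
libs <- .libPaths()
print(libs)

# Directory of an installed package found on that search path.
stats_dir <- find.package("stats")
print(stats_dir)

# The library that contains it is the parent directory.
print(dirname(find.package("utils")))
```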
R libraries are very flexible. In the past, R users have set up libraries for specific projects or set up a system-wide library used across multiple projects. On multi-tenant servers it has been common to have both a system library shared by all users and user-specific libraries.
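A best practice is to set up per-project libraries alongside a package cache, so that each project's package versions stay isolated while the cache avoids installing the same package more than once.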
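A per-project library can be approximated in base R by putting a project-specific directory first on the library search path. This is a minimal sketch under stated assumptions: the project directory and the `library` folder name are hypothetical, and it does not implement a package cache. Tools such as renv automate both the project library and a shared cache.

```r
# Hypothetical project directory with its own library folder.
project     <- file.path(tempdir(), "my-project")
project_lib <- file.path(project, "library")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Remember the current search path so it can be restored later.
old_libs <- .libPaths()

# Put the project library first; packages installed without an explicit
# lib argument now go here, and it is searched before other libraries.
.libPaths(c(project_lib, old_libs))
print(.libPaths())
```

Calling `.libPaths(old_libs)` afterwards restores the previous search path.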