As the name suggests, this repository implements functions to perform a PCA on the gene-by-cell expression matrix, returning low-dimensional coordinates for each cell that can be used for efficient downstream analyses, e.g., clustering, visualization. The code itself was originally derived from the scran and batchelor R packages, factored out into a separate C++ library for easier re-use.
Given a tatami::Matrix, the scran_pca::simple_pca() function will compute the PCA to obtain a low-dimensional representation of the cells:
#include "scran_pca/scran_pca.hpp"
const tatami::Matrix<double, int>& mat = some_data_source();
// Take the top 20 PCs:
scran_pca::SimplePcaOptions opt;
opt.number = 20;
auto res = scran_pca::simple_pca(mat, opt);
res.components; // rows are PCs, columns are cells.
res.rotation; // rows are genes, columns correspond to PCs.
res.variance_explained; // one per PC, in decreasing order.
res.total_variance; // total variance in the dataset.
Advanced users can fiddle with more of the options:
opt.scale = true;
opt.num_threads = 4;
opt.realize_matrix = false;
auto res2 = scran_pca::simple_pca(mat, opt);
In the presence of multiple blocks, we can perform the PCA on the residuals after regressing out the blocking factor. This ensures that the inter-block differences do not contribute to the first few PCs, instead favoring the representation of intra-block variation.
std::vector<int> blocks = some_blocks();
scran_pca::BlockedPcaOptions bopt;
bopt.number = 10; // taking the top 10 PCs this time.
auto bres = scran_pca::blocked_pca(mat, blocks.data(), bopt);
bres.components; // rows are PCs, columns are cells.
bres.center; // rows are blocks, columns are genes.
The components derived from the residuals will only be free of inter-block differences under certain conditions (equal population composition with a consistent shift between blocks).
If this is not the case, more sophisticated batch correction methods are required such as MNN correction.
If those methods accept a low-dimensional representation for the cells as input,
we can use scran_pca::blocked_pca() to obtain an appropriate matrix that focuses on intra-block variation without making assumptions about the inter-block differences:
bopt.components_from_residuals = false;
auto bres2 = scran_pca::blocked_pca(mat, blocks.data(), bopt);
Check out the reference documentation for more details.
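To make the idea of "residuals after regressing out the blocking factor" more concrete, here is a minimal standalone sketch for a single gene. It assumes that regressing out a categorical blocking factor amounts to subtracting each block's mean from the cells in that block. Note that block_residuals() is a hypothetical helper written for illustration only; it is not part of the scran_pca API, and the library's actual implementation may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT part of scran_pca): computes the residuals of one
// gene's expression values after subtracting the mean of each cell's block.
// Assumes every entry of 'blocks' lies in [0, num_blocks).
std::vector<double> block_residuals(
    const std::vector<double>& values, // expression of one gene across cells
    const std::vector<int>& blocks,    // block assignment for each cell
    int num_blocks)
{
    // Accumulate the sum and the number of cells for each block.
    std::vector<double> sums(num_blocks, 0.0);
    std::vector<int> counts(num_blocks, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        sums[blocks[i]] += values[i];
        ++counts[blocks[i]];
    }

    // Subtract the block mean from each cell. A block that is looked up here
    // always contains at least one cell, so the division is safe.
    std::vector<double> residuals(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        int b = blocks[i];
        residuals[i] = values[i] - sums[b] / counts[b];
    }
    return residuals;
}
```

For example, the values {1, 3, 10, 14} with block assignments {0, 0, 1, 1} have block means of 2 and 12, giving residuals of {-1, 1, -2, 2}. The large shift between the two blocks is removed while the variation within each block is preserved.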
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
scran_pca
GIT_REPOSITORY https://github.com/libscran/scran_pca
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(scran_pca)
Then you can link to scran_pca to make the headers available during compilation:
# For executables:
target_link_libraries(myexe libscran::scran_pca)
# For libraries:
target_link_libraries(mylib INTERFACE libscran::scran_pca)
If the library is already installed, find_package() can be used instead:
find_package(libscran_scran_pca CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE libscran::scran_pca)
To install the library, use:
mkdir build && cd build
cmake .. -DSCRAN_PCA_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies.
If you want to install them manually, use -DSCRAN_PCA_FETCH_EXTERN=OFF.
See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to copy the files in include/, either directly or with Git submodules, and add their path to the include search path during compilation with, e.g., GCC's -I.
This also requires the external dependencies listed in extern/CMakeLists.txt.