scran
C++ library for basic single-cell RNA-seq analyses
Compute PCA after regressing out an uninteresting factor.
#include <ResidualPca.hpp>
Classes
struct | Defaults | Default parameter settings.
struct | Results | Container for the PCA results.
Public Member Functions
ResidualPca & | set_rank (int r=Defaults::rank)
ResidualPca & | set_scale (bool s=Defaults::scale)
ResidualPca & | set_transpose (bool t=Defaults::transpose)
ResidualPca & | set_return_rotation (bool r=Defaults::return_rotation)
ResidualPca & | set_return_center (bool r=Defaults::return_center)
ResidualPca & | set_return_scale (bool r=Defaults::return_scale)
ResidualPca & | set_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)
ResidualPca & | set_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)
ResidualPca & | set_num_threads (int n=Defaults::num_threads)
template<typename Data_, typename Index_, typename Block_>
Results | run (const tatami::Matrix<Data_, Index_> *mat, const Block_ *block) const
template<typename Data_, typename Index_, typename Block_, typename Subset_>
Results | run (const tatami::Matrix<Data_, Index_> *mat, const Block_ *block, const Subset_ *features) const
Detailed Description
A simple batch correction method involves centering the expression of each gene in each batch to remove systematic differences between batches. The residuals are then used in PCA to obtain a batch-corrected low-dimensional representation of the dataset. Unfortunately, naively centering the expression values discards sparsity and reduces the computational efficiency of the PCA. To avoid these drawbacks, ResidualPca defers the residual calculation until the matrix multiplication of the IRLBA step. This yields the same results as the naive approach but is much faster, as it can take advantage of efficient sparse operations.
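The deferral described above can be illustrated with a small standalone sketch. It is not the library's implementation: `SparseRow`, `deferred_residual_multiply` and `naive_residual_multiply` are hypothetical names, and the per-gene, per-batch means are assumed to be precomputed in `centers`. The point is that the product of the residual matrix with a vector equals the sparse product minus a small per-batch correction, so the dense residuals never need to be formed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse row: (cell index, value) pairs for the non-zero entries of one gene.
using SparseRow = std::vector<std::pair<std::size_t, double>>;

// Computes R * v, where R[g][c] = X[g][c] - centers[g][block[c]],
// without ever materializing the dense residual matrix R.
std::vector<double> deferred_residual_multiply(
    const std::vector<SparseRow>& x,
    const std::vector<std::vector<double>>& centers,
    const std::vector<int>& block,
    int nblocks,
    const std::vector<double>& v)
{
    assert(block.size() == v.size());

    // Sum the entries of v within each batch; this is shared by all genes.
    std::vector<double> block_sums(nblocks, 0.0);
    for (std::size_t c = 0; c < v.size(); ++c) {
        block_sums[block[c]] += v[c];
    }

    std::vector<double> out(x.size(), 0.0);
    for (std::size_t g = 0; g < x.size(); ++g) {
        // Sparse part: only the non-zero entries of X are visited.
        for (const auto& nz : x[g]) {
            out[g] += nz.second * v[nz.first];
        }
        // Deferred centering: subtract each batch mean times the batch sum of v.
        for (int b = 0; b < nblocks; ++b) {
            out[g] -= centers[g][b] * block_sums[b];
        }
    }
    return out;
}

// Naive reference: expand each gene to dense residuals, then multiply.
std::vector<double> naive_residual_multiply(
    const std::vector<SparseRow>& x,
    const std::vector<std::vector<double>>& centers,
    const std::vector<int>& block,
    const std::vector<double>& v)
{
    assert(block.size() == v.size());
    std::vector<double> out(x.size(), 0.0);
    for (std::size_t g = 0; g < x.size(); ++g) {
        std::vector<double> dense(v.size(), 0.0);
        for (const auto& nz : x[g]) {
            dense[nz.first] = nz.second;
        }
        for (std::size_t c = 0; c < v.size(); ++c) {
            out[g] += (dense[c] - centers[g][block[c]]) * v[c];
        }
    }
    return out;
}
```

The deferred version does work proportional to the number of non-zero entries plus the number of batches per gene, whereas the naive version touches every cell for every gene.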
Each batch can optionally be scaled so that all batches contribute equally to the rotation vectors, regardless of their size. This is achieved using the same approach described in MultiBatchPca, whereby batches with more cells are downscaled during calculation of the rotation vectors. The final PCs are then obtained by projecting the residuals onto the space defined by the rotation vectors. This ensures that larger batches do not mask interesting variation in other batches.
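As a rough illustration of downscaling larger batches, the sketch below assigns each cell a weight of one over the size of its batch, so that every batch has the same total weight. This is only one possible scheme and is an assumption made for illustration; the weighting actually applied depends on the chosen WeightPolicy, and `equal_block_weights` is a hypothetical helper rather than a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-cell weights such that each batch sums to 1.
// Batch identifiers are assumed to be integers in [0, nblocks).
std::vector<double> equal_block_weights(const std::vector<int>& block, int nblocks) {
    std::vector<std::size_t> sizes(nblocks, 0);
    for (int b : block) {
        assert(b >= 0 && b < nblocks);
        ++sizes[b];
    }

    std::vector<double> weights(block.size());
    for (std::size_t c = 0; c < block.size(); ++c) {
        // Cells in larger batches receive proportionally smaller weights.
        weights[c] = 1.0 / static_cast<double>(sizes[block[c]]);
    }
    return weights;
}
```

With three cells in batch 0 and one cell in batch 1, the cells in batch 0 each get a weight of one third while the lone cell in batch 1 gets a weight of one.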
Note that the use of residuals for batch correction makes some strong assumptions about the batches, e.g., they have the same composition and the batch effect is a consistent shift for all populations. Non-linear correction algorithms are usually more effective, e.g., MNN correction.
ResidualPca & set_rank (int r=Defaults::rank) | inline
Parameters
r | Number of PCs to compute. This should be no greater than the maximum number of PCs, i.e., the smaller dimension of the input matrix; otherwise, only the maximum number of PCs will be reported in the Results.
Returns
A reference to this ResidualPca instance.
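The cap on the number of PCs can be sketched as follows. `effective_rank` is a hypothetical helper written only to illustrate the rule stated above; it is not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: the number of PCs that would be reported
// for a requested rank and a given gene-by-cell matrix shape.
std::size_t effective_rank(int requested, std::size_t ngenes, std::size_t ncells) {
    assert(requested >= 0);
    // The maximum number of PCs is the smaller dimension of the input matrix.
    const std::size_t max_rank = std::min(ngenes, ncells);
    return std::min(static_cast<std::size_t>(requested), max_rank);
}
```

For example, requesting 50 PCs from a matrix with 2000 genes and 30 cells would yield at most 30 PCs under this rule.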
ResidualPca & set_scale (bool s=Defaults::scale) | inline
Parameters
s | Should genes be scaled to unit variance?
Returns
A reference to this ResidualPca instance.
ResidualPca & set_transpose (bool t=Defaults::transpose) | inline
Parameters
t | Should the PC matrix be transposed on output? If true, the output PC matrix is column-major with cells in the columns, which is compatible with downstream libscran steps.
Returns
A reference to this ResidualPca instance.
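To make the transposed layout concrete, the sketch below reads the coordinates of one cell from a column-major buffer with PCs in the rows and cells in the columns. The flat `std::vector<double>` is a stand-in assumed for illustration (the matrix type actually held by Results is not described here), and `cell_coordinates` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: extract the PC coordinates of one cell from a
// column-major (rank x ncells) buffer, where each column is one cell.
std::vector<double> cell_coordinates(const std::vector<double>& pcs, std::size_t rank, std::size_t cell) {
    assert((cell + 1) * rank <= pcs.size());
    // In column-major storage, the entries of a column are contiguous.
    auto start = pcs.begin() + static_cast<std::ptrdiff_t>(cell * rank);
    return std::vector<double>(start, start + static_cast<std::ptrdiff_t>(rank));
}
```

Because each cell's coordinates are contiguous in this layout, per-cell operations such as distance calculations can read them without striding.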
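ResidualPca & set_return_rotation (bool r=Defaults::return_rotation) | inline
Parameters
r | Should the rotation matrix be returned in the output?
Returns
A reference to this ResidualPca instance.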
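ResidualPca & set_return_center (bool r=Defaults::return_center) | inline
Parameters
r | Should the center vector be returned in the output?
Returns
A reference to this ResidualPca instance.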
ResidualPca & set_return_scale (bool r=Defaults::return_scale) | inline
Parameters
r | Should the scale vector be returned in the output?
Returns
A reference to this ResidualPca instance.
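ResidualPca & set_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy) | inline
Parameters
w | Policy to use for weighting batches of different size.
Returns
A reference to this ResidualPca instance.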
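ResidualPca & set_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters) | inline
Parameters
v | Parameters for the variable block weights; see variable_block_weight() for more details. Only used when the block weight policy is set to WeightPolicy::VARIABLE.
Returns
A reference to this ResidualPca instance.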
ResidualPca & set_num_threads (int n=Defaults::num_threads) | inline
Parameters
n | Number of threads to use.
Returns
A reference to this ResidualPca instance.
template<typename Data_, typename Index_, typename Block_>
Results run (const tatami::Matrix<Data_, Index_> *mat, const Block_ *block) const | inline
Run the blocked PCA on an input gene-by-cell matrix.
Template Parameters
Data_ | Floating point type for the data.
Index_ | Integer type for the indices.
Block_ | Integer type for the blocking factor.
Parameters
[in] | mat | Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in] | block | Pointer to an array of length equal to the number of cells, containing the block assignment for each cell; see count_blocks() for details.
Returns
A Results object containing the PCs and the variance explained.
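Batch labels often arrive as strings, whereas `block` expects an integer array. A minimal sketch of the conversion is shown below, assuming that block identifiers should be consecutive integers starting from zero; that requirement is an assumption here, as the details are deferred to count_blocks(). `encode_blocks` is a hypothetical helper, not a library function.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: map arbitrary batch labels to integer identifiers,
// numbered from zero in order of first appearance.
std::vector<int> encode_blocks(const std::vector<std::string>& labels) {
    std::unordered_map<std::string, int> ids;
    std::vector<int> encoded;
    encoded.reserve(labels.size());

    for (const auto& label : labels) {
        auto it = ids.find(label);
        if (it == ids.end()) {
            // First time this label is seen: assign the next unused identifier.
            it = ids.emplace(label, static_cast<int>(ids.size())).first;
        }
        encoded.push_back(it->second);
    }
    return encoded;
}
```

The `data()` pointer of the resulting vector can then serve as the `block` argument, provided that its length matches the number of columns in the input matrix.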
template<typename Data_, typename Index_, typename Block_, typename Subset_>
Results run (const tatami::Matrix<Data_, Index_> *mat, const Block_ *block, const Subset_ *features) const | inline
Run the blocked PCA on an input gene-by-cell matrix after filtering for genes of interest. We typically use the set of highly variable genes from ChooseHVGs, which improves computational efficiency and reduces random noise by removing lowly variable genes.
Template Parameters
Data_ | Floating point type for the data.
Index_ | Integer type for the indices.
Block_ | Integer type for the blocking factor.
Subset_ | Integer type for the feature filter.
Parameters
[in] | mat | Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in] | block | Pointer to an array of length equal to the number of cells, containing the block assignment for each cell; see count_blocks() for details.
[in] | features | Pointer to an array of length equal to the number of genes. Each entry is a boolean specifying whether the corresponding gene should be used in the PCA.
Returns
A Results object containing the PCs and the variance explained.
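The `features` array can be built from a list of selected gene indices, as in the sketch below. `make_feature_filter` is a hypothetical helper, and the use of `unsigned char` is a choice made for illustration: `std::vector<bool>` is bit-packed and does not expose a pointer to an array of per-gene values, so an integer element type is used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build a per-gene 0/1 filter from the row indices
// of the genes to retain (e.g., highly variable genes).
std::vector<unsigned char> make_feature_filter(std::size_t ngenes, const std::vector<std::size_t>& selected) {
    std::vector<unsigned char> keep(ngenes, 0);
    for (std::size_t g : selected) {
        assert(g < ngenes);
        keep[g] = 1; // non-zero entries mark genes to use in the PCA
    }
    return keep;
}
```

The filter must have one entry per row of the input matrix, including the genes that are excluded.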