scran
C++ library for basic single-cell RNA-seq analyses
Loading...
Searching...
No Matches
Classes | Public Member Functions | List of all members
scran::ResidualPca Class Reference

Compute PCA after regressing out an uninteresting factor. More...

#include <ResidualPca.hpp>

Classes

struct  Defaults
 Default parameter settings. More...
 
struct  Results
 Container for the PCA results. More...
 

Public Member Functions

ResidualPcaset_rank (int r=Defaults::rank)
 
ResidualPcaset_scale (bool s=Defaults::scale)
 
ResidualPcaset_transpose (bool t=Defaults::transpose)
 
ResidualPcaset_return_rotation (bool r=Defaults::return_rotation)
 
ResidualPcaset_return_center (bool r=Defaults::return_center)
 
ResidualPcaset_return_scale (bool r=Defaults::return_scale)
 
ResidualPcaset_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)
 
ResidualPcaset_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)
 
ResidualPcaset_num_threads (int n=Defaults::num_threads)
 
template<typename Data_ , typename Index_ , typename Block_ >
Results run (const tatami::Matrix< Data_, Index_ > *mat, const Block_ *block) const
 
template<typename Data_ , typename Index_ , typename Block_ , typename Subset_ >
Results run (const tatami::Matrix< Data_, Index_ > *mat, const Block_ *block, const Subset_ *features) const
 

Detailed Description

Compute PCA after regressing out an uninteresting factor.

A simple batch correction method involves centering the expression of each gene in each batch to remove systematic differences between batches. The residuals are then used in PCA to obtain a batch-corrected low-dimensional representation of the dataset. Unfortunately, naively centering the expression values will discard sparsity and reduce the computational efficiency of the PCA. To avoid these drawbacks, ResidualPca defers the residual calculation until the matrix multiplication of the IRLBA step. This yields the same results as the naive approach but is much faster as it can take advantage of efficient sparse operations.

We can optionally scale each batch so that they contribute equally to the rotation vectors, regardless of their size. This is achieved using the same approach described in MultiBatchPca, whereby batches with more cells are downscaled during calculation of the rotation vectors. The final PCs are then obtained by projecting the residuals onto the space defined by the rotation vectors. This ensures that larger batches do not mask interesting variation in other batches.

Note that the use of residuals for batch correction makes some strong assumptions about the batches, e.g., they have the same composition and the batch effect is a consistent shift for all populations. Non-linear correction algorithms are usually more effective, e.g., MNN correction.

Member Function Documentation

◆ set_rank()

ResidualPca & scran::ResidualPca::set_rank ( int  r = Defaults::rank)
inline
Parameters
rNumber of PCs to compute. This should be no greater than the maximum number of PCs, i.e., the smaller dimension of the input matrix; otherwise, only the maximum number of PCs will be reported in the Results.
Returns
A reference to this ResidualPca instance.

◆ set_scale()

ResidualPca & scran::ResidualPca::set_scale ( bool  s = Defaults::scale)
inline
Parameters
sShould genes be scaled to unit variance?
Returns
A reference to this ResidualPca instance.

◆ set_transpose()

ResidualPca & scran::ResidualPca::set_transpose ( bool  t = Defaults::transpose)
inline
Parameters
tShould the PC matrix be transposed on output? If true, the output PC matrix is column-major with cells in the columns, which is compatible with downstream libscran steps.
Returns
A reference to this ResidualPca instance.

◆ set_return_rotation()

ResidualPca & scran::ResidualPca::set_return_rotation ( bool  r = Defaults::return_rotation)
inline
Parameters
rShould the rotation matrix be returned in the output?
Returns
A reference to this ResidualPca instance.

◆ set_return_center()

ResidualPca & scran::ResidualPca::set_return_center ( bool  r = Defaults::return_center)
inline
Parameters
rShould the center vector be returned in the output?
Returns
A reference to this ResidualPca instance.

◆ set_return_scale()

ResidualPca & scran::ResidualPca::set_return_scale ( bool  r = Defaults::return_scale)
inline
Parameters
rShould the scale vector be returned in the output?
Returns
A reference to this ResidualPca instance.

◆ set_block_weight_policy()

ResidualPca & scran::ResidualPca::set_block_weight_policy ( WeightPolicy  w = Defaults::block_weight_policy)
inline
Parameters
wPolicy to use for weighting batches of different size.
Returns
A reference to this ResidualPca instance.

◆ set_variable_block_weight_parameters()

ResidualPca & scran::ResidualPca::set_variable_block_weight_parameters ( VariableBlockWeightParameters  v = Defaults::variable_block_weight_parameters)
inline
Parameters
vParameters for the variable block weights, see variable_block_weight() for more details. Only used when the block weight policy is set to WeightPolicy::VARIABLE.
Returns
A reference to this ResidualPca instance.

◆ set_num_threads()

ResidualPca & scran::ResidualPca::set_num_threads ( int  n = Defaults::num_threads)
inline
Parameters
nNumber of threads to use.
Returns
A reference to this ResidualPca instance.

◆ run() [1/2]

template<typename Data_ , typename Index_ , typename Block_ >
Results scran::ResidualPca::run ( const tatami::Matrix< Data_, Index_ > *  mat,
const Block_ *  block 
) const
inline

Run the blocked PCA on an input gene-by-cell matrix.

Template Parameters
Data_Floating point type for the data.
Index_Integer type for the indices.
Block_Integer type for the blocking factor.
Parameters
[in]matPointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell - see count_blocks() for details.
Returns
A Results object containing the PCs and the variance explained.

◆ run() [2/2]

template<typename Data_ , typename Index_ , typename Block_ , typename Subset_ >
Results scran::ResidualPca::run ( const tatami::Matrix< Data_, Index_ > *  mat,
const Block_ *  block,
const Subset_ *  features 
) const
inline

Run the blocked PCA on an input gene-by-cell matrix after filtering for genes of interest. We typically use the set of highly variable genes from ChooseHVGs, with the aim being to improve computational efficiency and avoid random noise by removing lowly variable genes.

Template Parameters
Data_Floating point type for the data.
Index_Integer type for the indices.
Block_Integer type for the blocking factor.
Subset_Integer type for the feature filter.
Parameters
[in]matPointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]blockPointer to an array of length equal to the number of cells, containing the block assignment for each cell - see count_blocks() for details.
[in]featuresPointer to an array of length equal to the number of genes. Each entry is a boolean specifying whether the corresponding gene should be used in the PCA.
Returns
A Results object containing the PCs and the variance explained.

The documentation for this class was generated from the following file: