scran
C++ library for basic single-cell RNA-seq analyses
Loading...
Searching...
No Matches
Classes | Public Member Functions | List of all members
scran::MultiBatchPca Class Reference

Compute PCA after adjusting for differences between batch sizes. More...

#include <MultiBatchPca.hpp>

Classes

struct  Defaults
 Default parameter settings. More...
 
struct  Results
 Container for the PCA results. More...
 

Public Member Functions

MultiBatchPcaset_rank (int r=Defaults::rank)
 
MultiBatchPcaset_scale (bool s=Defaults::scale)
 
MultiBatchPcaset_transpose (bool t=Defaults::transpose)
 
MultiBatchPcaset_use_residuals (bool u=Defaults::use_residuals)
 
MultiBatchPcaset_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)
 
MultiBatchPcaset_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)
 
MultiBatchPcaset_return_rotation (bool r=Defaults::return_rotation)
 
MultiBatchPcaset_return_center (bool r=Defaults::return_center)
 
MultiBatchPcaset_return_scale (bool r=Defaults::return_scale)
 
MultiBatchPcaset_num_threads (int n=Defaults::num_threads)
 
template<typename T , typename IDX , typename Batch >
Results run (const tatami::Matrix< T, IDX > *mat, const Batch *batch) const
 
template<typename T , typename IDX , typename Batch , typename X >
Results run (const tatami::Matrix< T, IDX > *mat, const Batch *batch, const X *features) const
 

Detailed Description

Compute PCA after adjusting for differences between batch sizes.

In multi-batch scenarios, we may wish to compute a PCA involving data from multiple batches. However, if one batch has many more cells, it will dominate the PCA by driving the definition of the rotation vectors. This may mask interesting aspects of variation in the smaller batches. To overcome this problem, we scale each batch in inverse proportion to its size. This ensures that each batch contributes equally to the (conceptual) gene-gene covariance matrix, the eigenvectors of which are used as the rotation vectors. Cells are then projected to the subspace defined by these rotation vectors to obtain PC coordinates.

Alternatively, we can compute rotation vectors from the residuals, i.e., after centering each batch. The gene-gene covariance matrix will thus focus on variation within each batch, ensuring that the top PCs capture biological heterogeneity instead of batch effects. (This is particularly important in applications with many batches, where batch effects might otherwise displace biology from the top PCs.) However, unlike ResidualPca, it is important to note that the residuals are only used here for calculating the rotation vectors. We still project the input matrix to obtain the PCs, so batch effects will likely still be present (though hopefully less pronounced) and must be removed with methods like MNN correction.

Finally, we can combine these mechanisms to compute rotation vectors from residuals with equal weighting. This gives us the benefits of both approaches as described above.

Member Function Documentation

◆ set_rank()

MultiBatchPca & scran::MultiBatchPca::set_rank ( int  r = Defaults::rank)
inline
Parameters
rNumber of PCs to compute. This should be no greater than the maximum number of PCs, i.e., the smaller dimension of the input matrix; otherwise, only the maximum number of PCs will be reported in the Results.
Returns
A reference to this MultiBatchPca instance.

◆ set_scale()

MultiBatchPca & scran::MultiBatchPca::set_scale ( bool  s = Defaults::scale)
inline
Parameters
sShould genes be scaled to unit variance?
Returns
A reference to this MultiBatchPca instance.

◆ set_transpose()

MultiBatchPca & scran::MultiBatchPca::set_transpose ( bool  t = Defaults::transpose)
inline
Parameters
tShould the PC matrix be transposed on output? If true, the output matrix is column-major with cells in the columns, which is compatible with downstream libscran steps.
Returns
A reference to this MultiBatchPca instance.

◆ set_use_residuals()

MultiBatchPca & scran::MultiBatchPca::set_use_residuals ( bool  u = Defaults::use_residuals)
inline
Parameters
uWhether to compute the rotation vectors from the residuals after centering each batch.
Returns
A reference to this MultiBatchPca instance.

◆ set_block_weight_policy()

MultiBatchPca & scran::MultiBatchPca::set_block_weight_policy ( WeightPolicy  w = Defaults::block_weight_policy)
inline
Parameters
wPolicy to use for weighting batches of different size.
Returns
A reference to this MultiBatchPca instance.

◆ set_variable_block_weight_parameters()

MultiBatchPca & scran::MultiBatchPca::set_variable_block_weight_parameters ( VariableBlockWeightParameters  v = Defaults::variable_block_weight_parameters)
inline
Parameters
vParameters for the variable block weights, see variable_block_weight() for more details. Only used when the block weight policy is set to WeightPolicy::VARIABLE.
Returns
A reference to this MultiBatchPca instance.

◆ set_return_rotation()

MultiBatchPca & scran::MultiBatchPca::set_return_rotation ( bool  r = Defaults::return_rotation)
inline
Parameters
rShould the rotation matrix be returned in the output?
Returns
A reference to this MultiBatchPca instance.

◆ set_return_center()

MultiBatchPca & scran::MultiBatchPca::set_return_center ( bool  r = Defaults::return_center)
inline
Parameters
rShould the center vector be returned in the output?
Returns
A reference to this MultiBatchPca instance.

◆ set_return_scale()

MultiBatchPca & scran::MultiBatchPca::set_return_scale ( bool  r = Defaults::return_scale)
inline
Parameters
rShould the scale vector be returned in the output?
Returns
A reference to this MultiBatchPca instance.

◆ set_num_threads()

MultiBatchPca & scran::MultiBatchPca::set_num_threads ( int  n = Defaults::num_threads)
inline
Parameters
nNumber of threads to use.
Returns
A reference to this MultiBatchPca instance.

◆ run() [1/2]

template<typename T , typename IDX , typename Batch >
Results scran::MultiBatchPca::run ( const tatami::Matrix< T, IDX > *  mat,
const Batch *  batch 
) const
inline

Run the multi-batch PCA on an input gene-by-cell matrix.

Template Parameters
TFloating point type for the data.
IDXInteger type for the indices.
BatchInteger type for the batch assignments.
Parameters
[in]matPointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]batchPointer to an array of length equal to the number of cells. This should contain a 0-based batch assignment for each cell (i.e., for n batches, batch identities should run from 0 to n-1 with at least one entry for each batch.)
Returns
A Results object containing the PCs and the variance explained.

◆ run() [2/2]

template<typename T , typename IDX , typename Batch , typename X >
Results scran::MultiBatchPca::run ( const tatami::Matrix< T, IDX > *  mat,
const Batch *  batch,
const X *  features 
) const
inline

Run the multi-batch PCA on an input gene-by-cell matrix after filtering for genes of interest. We typically use the set of highly variable genes from ChooseHVGs, with the aim being to improve computational efficiency and avoid random noise by removing lowly variable genes.

Template Parameters
TFloating point type for the data.
IDXInteger type for the indices.
BatchInteger type for the batch assignments
XInteger type for the feature filter.
Parameters
[in]matPointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]batchPointer to an array of length equal to the number of cells. This should contain a 0-based batch assignment for each cell (i.e., for n batches, batch identities should run from 0 to n-1 with at least one entry for each batch.)
[in]featuresPointer to an array of length equal to the number of genes. Each entry treated as a boolean specifying whether the corresponding genes should be used in the PCA.
Returns
A Results object containing the PCs and the variance explained.

The documentation for this class was generated from the following file: