Compute PCA after regressing out an uninteresting factor. More...

#include <ResidualPca.hpp>

Classes
struct	Defaults
	Default parameter settings. More...

struct	Results
	Container for the PCA results. More...

Public Member Functions
ResidualPca &	set_rank (int r=Defaults::rank)

ResidualPca &	set_scale (bool s=Defaults::scale)

ResidualPca &	set_transpose (bool t=Defaults::transpose)

ResidualPca &	set_return_rotation (bool r=Defaults::return_rotation)

ResidualPca &	set_return_center (bool r=Defaults::return_center)

ResidualPca &	set_return_scale (bool r=Defaults::return_scale)

ResidualPca &	set_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)

ResidualPca &	set_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)

ResidualPca &	set_num_threads (int n=Defaults::num_threads)

template<typename Data_ , typename Index_ , typename Block_ >
Results	run (const tatami::Matrix< Data_, Index_ > mat, const Block_ block) const

template<typename Data_ , typename Index_ , typename Block_ , typename Subset_ >
Results	run (const tatami::Matrix< Data_, Index_ > mat, const Block_ block, const Subset_ *features) const

Detailed Description

Compute PCA after regressing out an uninteresting factor.

A simple batch correction method involves centering the expression of each gene in each batch to remove systematic differences between batches. The residuals are then used in PCA to obtain a batch-corrected low-dimensional representation of the dataset. Unfortunately, naively centering the expression values will discard sparsity and reduce the computational efficiency of the PCA. To avoid these drawbacks, ResidualPca defers the residual calculation until the matrix multiplication of the IRLBA step. This yields the same results as the naive approach but is much faster as it can take advantage of efficient sparse operations.

We can optionally scale each batch so that they contribute equally to the rotation vectors, regardless of their size. This is achieved using the same approach described in MultiBatchPca, whereby batches with more cells are downscaled during calculation of the rotation vectors. The final PCs are then obtained by projecting the residuals onto the space defined by the rotation vectors. This ensures that larger batches do not mask interesting variation in other batches.

Note that the use of residuals for batch correction makes some strong assumptions about the batches, e.g., they have the same composition and the batch effect is a consistent shift for all populations. Non-linear correction algorithms are usually more effective, e.g., MNN correction.

Member Function Documentation

◆ set_rank()

ResidualPca & scran::ResidualPca::set_rank ( int r = Defaults::rank )

inline

Parameters

r	Number of PCs to compute. This should be no greater than the maximum number of PCs, i.e., the smaller dimension of the input matrix; otherwise, only the maximum number of PCs will be reported in the `Results`.

Returns: A reference to this ResidualPca instance.

◆ set_scale()

ResidualPca & scran::ResidualPca::set_scale ( bool s = Defaults::scale )

inline

Parameters

s	Should genes be scaled to unit variance?

Returns: A reference to this ResidualPca instance.

◆ set_transpose()

ResidualPca & scran::ResidualPca::set_transpose ( bool t = Defaults::transpose )

inline

Parameters

t	Should the PC matrix be transposed on output? If `true`, the output PC matrix is column-major with cells in the columns, which is compatible with downstream libscran steps.

Returns: A reference to this ResidualPca instance.

◆ set_return_rotation()

ResidualPca & scran::ResidualPca::set_return_rotation ( bool r = Defaults::return_rotation )

inline

Parameters

r	Should the rotation matrix be returned in the output?

Returns: A reference to this ResidualPca instance.

◆ set_return_center()

ResidualPca & scran::ResidualPca::set_return_center ( bool r = Defaults::return_center )

inline

Parameters

r	Should the center vector be returned in the output?

Returns: A reference to this ResidualPca instance.

◆ set_return_scale()

ResidualPca & scran::ResidualPca::set_return_scale ( bool r = Defaults::return_scale )

inline

Parameters

r	Should the scale vector be returned in the output?

Returns: A reference to this ResidualPca instance.

◆ set_block_weight_policy()

ResidualPca & scran::ResidualPca::set_block_weight_policy ( WeightPolicy w = Defaults::block_weight_policy )

inline

Parameters

w	Policy to use for weighting batches of different size.

Returns: A reference to this ResidualPca instance.

◆ set_variable_block_weight_parameters()

ResidualPca & scran::ResidualPca::set_variable_block_weight_parameters ( VariableBlockWeightParameters v = Defaults::variable_block_weight_parameters )

inline

Parameters

v	Parameters for the variable block weights, see `variable_block_weight()` for more details. Only used when the block weight policy is set to `WeightPolicy::VARIABLE`.

Returns: A reference to this ResidualPca instance.

◆ set_num_threads()

ResidualPca & scran::ResidualPca::set_num_threads ( int n = Defaults::num_threads )

inline

Parameters

n	Number of threads to use.

Returns: A reference to this ResidualPca instance.

◆ run() [1/2]

template<typename Data_ , typename Index_ , typename Block_ >

Results scran::ResidualPca::run	(	const tatami::Matrix< Data_, Index_ > *	mat,
		const Block_ *	block
	)		const

inline

Run the blocked PCA on an input gene-by-cell matrix.

Template Parameters

Data_	Floating point type for the data.
Index_	Integer type for the indices.
Block_	Integer type for the blocking factor.

Parameters

[in]	mat	Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]	block	Pointer to an array of length equal to the number of cells, containing the block assignment for each cell - see `count_blocks()` for details.

Returns: A Results object containing the PCs and the variance explained.

◆ run() [2/2]

template<typename Data_ , typename Index_ , typename Block_ , typename Subset_ >

Results scran::ResidualPca::run	(	const tatami::Matrix< Data_, Index_ > *	mat,
		const Block_ *	block,
		const Subset_ *	features
	)		const

inline

Run the blocked PCA on an input gene-by-cell matrix after filtering for genes of interest. We typically use the set of highly variable genes from ChooseHVGs, with the aim being to improve computational efficiency and avoid random noise by removing lowly variable genes.

Template Parameters

Data_	Floating point type for the data.
Index_	Integer type for the indices.
Block_	Integer type for the blocking factor.
Subset_	Integer type for the feature filter.

Parameters

[in]	mat	Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]	block	Pointer to an array of length equal to the number of cells, containing the block assignment for each cell - see `count_blocks()` for details.
[in]	features	Pointer to an array of length equal to the number of genes. Each entry is a boolean specifying whether the corresponding gene should be used in the PCA.

Returns: A Results object containing the PCs and the variance explained.

The documentation for this class was generated from the following file:

scran/dimensionality_reduction/ResidualPca.hpp

Classes

Public Member Functions

Detailed Description

Member Function Documentation

◆ set_rank()

◆ set_scale()

◆ set_transpose()

◆ set_return_rotation()

◆ set_return_center()

◆ set_return_scale()

◆ set_block_weight_policy()

◆ set_variable_block_weight_parameters()

◆ set_num_threads()

◆ run() [1/2]

◆ run() [2/2]