Compute PCA after adjusting for differences between batch sizes. More...

#include <MultiBatchPca.hpp>

Classes
struct	Defaults
	Default parameter settings. More...

struct	Results
	Container for the PCA results. More...

Public Member Functions
MultiBatchPca &	set_rank (int r=Defaults::rank)

MultiBatchPca &	set_scale (bool s=Defaults::scale)

MultiBatchPca &	set_transpose (bool t=Defaults::transpose)

MultiBatchPca &	set_use_residuals (bool u=Defaults::use_residuals)

MultiBatchPca &	set_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)

MultiBatchPca &	set_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)

MultiBatchPca &	set_return_rotation (bool r=Defaults::return_rotation)

MultiBatchPca &	set_return_center (bool r=Defaults::return_center)

MultiBatchPca &	set_return_scale (bool r=Defaults::return_scale)

MultiBatchPca &	set_num_threads (int n=Defaults::num_threads)

template<typename T , typename IDX , typename Batch >
Results	run (const tatami::Matrix< T, IDX > mat, const Batch batch) const

template<typename T , typename IDX , typename Batch , typename X >
Results	run (const tatami::Matrix< T, IDX > mat, const Batch batch, const X *features) const

Detailed Description

Compute PCA after adjusting for differences between batch sizes.

In multi-batch scenarios, we may wish to compute a PCA involving data from multiple batches. However, if one batch has many more cells, it will dominate the PCA by driving the definition of the rotation vectors. This may mask interesting aspects of variation in the smaller batches. To overcome this problem, we scale each batch in inverse proportion to its size. This ensures that each batch contributes equally to the (conceptual) gene-gene covariance matrix, the eigenvectors of which are used as the rotation vectors. Cells are then projected to the subspace defined by these rotation vectors to obtain PC coordinates.

Alternatively, we can compute rotation vectors from the residuals, i.e., after centering each batch. The gene-gene covariance matrix will thus focus on variation within each batch, ensuring that the top PCs capture biological heterogeneity instead of batch effects. (This is particularly important in applications with many batches, where batch effects might otherwise displace biology from the top PCs.) However, unlike ResidualPca, it is important to note that the residuals are only used here for calculating the rotation vectors. We still project the input matrix to obtain the PCs, so batch effects will likely still be present (though hopefully less pronounced) and must be removed with methods like MNN correction.

Finally, we can combine these mechanisms to compute rotation vectors from residuals with equal weighting. This gives us the benefits of both approaches as described above.

Member Function Documentation

◆ set_rank()

MultiBatchPca & scran::MultiBatchPca::set_rank ( int r = Defaults::rank )

inline

Parameters

r	Number of PCs to compute. This should be no greater than the maximum number of PCs, i.e., the smaller dimension of the input matrix; otherwise, only the maximum number of PCs will be reported in the `Results`.

Returns: A reference to this MultiBatchPca instance.

◆ set_scale()

MultiBatchPca & scran::MultiBatchPca::set_scale ( bool s = Defaults::scale )

inline

Parameters

s	Should genes be scaled to unit variance?

Returns: A reference to this MultiBatchPca instance.

◆ set_transpose()

MultiBatchPca & scran::MultiBatchPca::set_transpose ( bool t = Defaults::transpose )

inline

Parameters

t	Should the PC matrix be transposed on output? If `true`, the output matrix is column-major with cells in the columns, which is compatible with downstream libscran steps.

Returns: A reference to this MultiBatchPca instance.

◆ set_use_residuals()

MultiBatchPca & scran::MultiBatchPca::set_use_residuals ( bool u = Defaults::use_residuals )

inline

Parameters

u	Whether to compute the rotation vectors from the residuals after centering each batch.

Returns: A reference to this MultiBatchPca instance.

◆ set_block_weight_policy()

MultiBatchPca & scran::MultiBatchPca::set_block_weight_policy ( WeightPolicy w = Defaults::block_weight_policy )

inline

Parameters

w	Policy to use for weighting batches of different size.

Returns: A reference to this MultiBatchPca instance.

◆ set_variable_block_weight_parameters()

MultiBatchPca & scran::MultiBatchPca::set_variable_block_weight_parameters ( VariableBlockWeightParameters v = Defaults::variable_block_weight_parameters )

inline

Parameters

v	Parameters for the variable block weights, see `variable_block_weight()` for more details. Only used when the block weight policy is set to `WeightPolicy::VARIABLE`.

Returns: A reference to this MultiBatchPca instance.

◆ set_return_rotation()

MultiBatchPca & scran::MultiBatchPca::set_return_rotation ( bool r = Defaults::return_rotation )

inline

Parameters

r	Should the rotation matrix be returned in the output?

Returns: A reference to this MultiBatchPca instance.

◆ set_return_center()

MultiBatchPca & scran::MultiBatchPca::set_return_center ( bool r = Defaults::return_center )

inline

Parameters

r	Should the center vector be returned in the output?

Returns: A reference to this MultiBatchPca instance.

◆ set_return_scale()

MultiBatchPca & scran::MultiBatchPca::set_return_scale ( bool r = Defaults::return_scale )

inline

Parameters

r	Should the scale vector be returned in the output?

Returns: A reference to this MultiBatchPca instance.

◆ set_num_threads()

MultiBatchPca & scran::MultiBatchPca::set_num_threads ( int n = Defaults::num_threads )

inline

Parameters

n	Number of threads to use.

Returns: A reference to this MultiBatchPca instance.

◆ run() [1/2]

template<typename T , typename IDX , typename Batch >

Results scran::MultiBatchPca::run	(	const tatami::Matrix< T, IDX > *	mat,
		const Batch *	batch
	)		const

inline

Run the multi-batch PCA on an input gene-by-cell matrix.

Template Parameters

T	Floating point type for the data.
IDX	Integer type for the indices.
Batch	Integer type for the batch assignments.

Parameters

[in]	mat	Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]	batch	Pointer to an array of length equal to the number of cells. This should contain a 0-based batch assignment for each cell (i.e., for `n` batches, batch identities should run from 0 to `n-1` with at least one entry for each batch.)

Returns: A Results object containing the PCs and the variance explained.

◆ run() [2/2]

template<typename T , typename IDX , typename Batch , typename X >

Results scran::MultiBatchPca::run	(	const tatami::Matrix< T, IDX > *	mat,
		const Batch *	batch,
		const X *	features
	)		const

inline

Run the multi-batch PCA on an input gene-by-cell matrix after filtering for genes of interest. We typically use the set of highly variable genes from ChooseHVGs, with the aim being to improve computational efficiency and avoid random noise by removing lowly variable genes.

Template Parameters

T	Floating point type for the data.
IDX	Integer type for the indices.
Batch	Integer type for the batch assignments
X	Integer type for the feature filter.

Parameters

[in]	mat	Pointer to the input matrix. Columns should contain cells while rows should contain genes.
[in]	batch	Pointer to an array of length equal to the number of cells. This should contain a 0-based batch assignment for each cell (i.e., for `n` batches, batch identities should run from 0 to `n-1` with at least one entry for each batch.)
[in]	features	Pointer to an array of length equal to the number of genes. Each entry treated as a boolean specifying whether the corresponding genes should be used in the PCA.

Returns: A Results object containing the PCs and the variance explained.

The documentation for this class was generated from the following file:

scran/dimensionality_reduction/MultiBatchPca.hpp

Classes

Public Member Functions

Detailed Description

Member Function Documentation

◆ set_rank()

◆ set_scale()

◆ set_transpose()

◆ set_use_residuals()

◆ set_block_weight_policy()

◆ set_variable_block_weight_parameters()

◆ set_return_rotation()

◆ set_return_center()

◆ set_return_scale()

◆ set_num_threads()

◆ run() [1/2]

◆ run() [2/2]