scran
C++ library for basic single-cell RNA-seq analyses
Create filters to identify low-quality cells from CRISPR-derived QC metrics.
#include <SuggestCrisprQcFilters.hpp>
Classes |
struct | Defaults | Default parameters. |
struct | Thresholds | Thresholds to define outliers on each metric. |
Public Member Functions |
SuggestCrisprQcFilters & | set_num_mads (double n=Defaults::num_mads) |
template<typename Float, typename Integer> Thresholds | run (size_t n, const PerCellCrisprQcMetrics::Buffers<Float, Integer> &buffers) const |
Thresholds | run (const PerCellCrisprQcMetrics::Results &metrics) const |
template<typename Block, typename Float, typename Integer> Thresholds | run_blocked (size_t n, const Block *block, const PerCellCrisprQcMetrics::Buffers<Float, Integer> &buffers) const |
template<typename Block> Thresholds | run_blocked (const PerCellCrisprQcMetrics::Results &metrics, const Block *block) const |
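In CRISPR guide count matrices, the QC filtering decisions differ somewhat from those for the other modalities. Here, low-quality cells are defined as those with a low count for the most abundant guide (the maximum count), which suggests that the cell was not successfully transfected.
Directly defining a threshold on the maximum count is tricky because unsuccessful transfection is not uncommon. This often results in a large subpopulation with low maximum counts, which inflates the MAD and compromises the threshold calculation. Instead, we compute the median of the maximum proportion (the proportion of a cell's counts assigned to its most abundant guide), subset to the cells above that median, and define a threshold for low outliers on the log-transformed maximum count within that subset.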
This assumes that over 50% of cells were successfully transfected with a single guide construct and have high maximum proportions. In contrast, unsuccessful transfections will be dominated by ambient contamination and have low proportions. By taking the subset above the median proportion, we remove all of the unsuccessful transfections and enrich for mostly-high-quality cells. From there, we can apply the usual outlier detection methods on the maximum count, with log-transformation to avoid a negative threshold.
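The calculation described above can be sketched in standalone C++. This is an illustration under stated assumptions, not scran's implementation: `median_of` and `sketch_max_count_threshold` are hypothetical helpers, cells tied with the median proportion are kept in the subset, the maximum counts in the subset are assumed to be positive, and the MAD is scaled by 1.4826 (the usual constant for consistency with a normal standard deviation).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty vector; taken by value so the caller's data is untouched.
inline double median_of(std::vector<double> x) {
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    const std::size_t n = x.size();
    return (n % 2 == 1) ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0;
}

// Hypothetical helper (not part of scran): lower threshold on the maximum count.
inline double sketch_max_count_threshold(const std::vector<double>& max_count,
                                         const std::vector<double>& max_proportion,
                                         double num_mads) {
    assert(max_count.size() == max_proportion.size());

    // Step 1: median of the maximum proportion across all cells.
    const double med_prop = median_of(max_proportion);

    // Step 2: keep cells at or above that median (assumption: ties are kept)
    // and log-transform their maximum counts (assumption: counts are positive).
    std::vector<double> logged;
    for (std::size_t i = 0; i < max_count.size(); ++i) {
        if (max_proportion[i] >= med_prop) {
            logged.push_back(std::log(max_count[i]));
        }
    }

    // Step 3: median minus num_mads scaled MADs, then back to the count scale.
    const double med = median_of(logged);
    std::vector<double> deviations;
    for (double v : logged) {
        deviations.push_back(std::abs(v - med));
    }
    const double mad = 1.4826 * median_of(deviations); // assumed scaling constant
    return std::exp(med - num_mads * mad);
}
```

Because the threshold is computed on the log scale and exponentiated at the end, it can never be negative, which is the reason the text gives for the log-transformation.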
Keep in mind that the maximum proportion is only used to define the subset for threshold calculation. Once the maximum count threshold is computed, it is applied to all cells, regardless of their maximum proportions. This allows us to recover good cells that would have been filtered out by our aggressive median subset. It also ensures that we do not remove cells transfected with multiple guides; such cells are not necessarily uninteresting, e.g., for examining interaction effects, so we will err on the side of caution and leave them in.
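To make the last point concrete, here is a minimal sketch of applying a threshold on the maximum count to every cell, without consulting the maximum proportion. `sketch_filter_by_max_count` is a hypothetical helper, and treating a count equal to the threshold as passing is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran): keep[i] is true if cell i is retained.
// Only the maximum count is compared; the maximum proportion plays no role here.
inline std::vector<bool> sketch_filter_by_max_count(const std::vector<double>& max_count,
                                                    double threshold) {
    assert(threshold >= 0);
    std::vector<bool> keep(max_count.size());
    for (std::size_t i = 0; i < max_count.size(); ++i) {
        keep[i] = (max_count[i] >= threshold); // assumption: ties pass the filter
    }
    return keep;
}
```

A cell carrying two guides might have a maximum proportion near 0.5 and so fall outside the subset used to compute the threshold, yet it is still retained here as long as its maximum count is high enough.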
For datasets with multiple blocks, SuggestCrisprQcFilters::run_blocked() will compute block-specific thresholds for the maximum count. See comments in SuggestRnaQcFilters for more details.
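The idea of block-specific thresholds can be illustrated with a generic helper that groups per-cell values by block and computes one statistic per group. This is a sketch, not scran's code: `sketch_per_block_statistic` is hypothetical, block identifiers are assumed to be 0-based consecutive integers, and a null `block` pointer is treated as a single block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran): apply `stat` to the values of each block.
// Assumes block IDs are 0-based and consecutive; a null pointer means one block.
template<typename Block, typename Fn>
std::vector<double> sketch_per_block_statistic(std::size_t n, const Block* block,
                                               const double* values, Fn stat) {
    assert(values != nullptr || n == 0);

    std::size_t nblocks = 1;
    if (block != nullptr && n > 0) {
        nblocks = static_cast<std::size_t>(*std::max_element(block, block + n)) + 1;
    }

    // Collect the values belonging to each block.
    std::vector<std::vector<double>> groups(nblocks);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t b = (block == nullptr) ? 0 : static_cast<std::size_t>(block[i]);
        groups[b].push_back(values[i]);
    }

    // One statistic per block, in block order.
    std::vector<double> out;
    out.reserve(nblocks);
    for (const auto& g : groups) {
        out.push_back(stat(g));
    }
    return out;
}
```

Passing a threshold calculation as `stat` would yield one threshold per block, so that each cell is compared against the threshold of its own block.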
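SuggestCrisprQcFilters & set_num_mads (double n=Defaults::num_mads) | inline |
n | Number of MADs below the median, to define the threshold for outliers in the maximum count. This should be non-negative. |
Returns | A reference to this SuggestCrisprQcFilters object. |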
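Returning a reference to the object is what allows setter calls to be chained. The sketch below uses a hypothetical stand-in, `MockCrisprFilters`, which is not the scran class and only mimics the shape of the setter; the default of 3 is a placeholder, since the value of Defaults::num_mads is not stated here.

```cpp
#include <cassert>

// Hypothetical stand-in, NOT the scran class: it only mimics the setter's shape.
struct MockCrisprFilters {
    double num_mads = 3; // placeholder default, not taken from the documentation

    // Stores the setting and returns the same object so calls can be chained.
    MockCrisprFilters& set_num_mads(double n) {
        assert(n >= 0); // the parameter is documented as non-negative
        num_mads = n;
        return *this;
    }
};
```

With this shape, an expression such as `filters.set_num_mads(2.5)` evaluates to `filters` itself, so a further member call can follow directly.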
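template<typename Float, typename Integer> Thresholds run (size_t n, const PerCellCrisprQcMetrics::Buffers<Float, Integer> &buffers) const | inline |
Float | Floating point type for the metrics. |
Integer | Integer type for the metrics. |
n | Number of cells. |
[in] | buffers | Pointers to arrays of length n, containing the per-cell CRISPR-derived metrics. |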
Thresholds run (const PerCellCrisprQcMetrics::Results &metrics) const | inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
metrics | Collection of arrays of length equal to the number of cells, containing the per-cell CRISPR-derived metrics. |
template<typename Block, typename Float, typename Integer> Thresholds run_blocked (size_t n, const Block *block, const PerCellCrisprQcMetrics::Buffers<Float, Integer> &buffers) const | inline |
Block | Integer type for the block assignments. |
Float | Floating point type for the metrics. |
Integer | Integer type for the metrics. |
n | Number of cells. |
[in] | block | Pointer to an array of length n, containing the block assignments for each cell. This may be NULL, in which case all cells are assumed to belong to the same block. |
[in] | buffers | Pointers to arrays of length n, containing the per-cell CRISPR-derived metrics. Only max_proportion and sums are used; detected is ignored and does not need to be set. |
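One plausible reading of why only max_proportion and sums are needed is that the maximum count can be recovered as their product for each cell. The text above does not confirm this, so the sketch below is an assumption, and `sketch_derive_max_count` is a hypothetical helper rather than a scran function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran): maximum count per cell, assuming
// max_count = sum of counts * proportion assigned to the most abundant guide.
inline std::vector<double> sketch_derive_max_count(const std::vector<double>& sums,
                                                   const std::vector<double>& max_proportion) {
    assert(sums.size() == max_proportion.size());
    std::vector<double> max_count(sums.size());
    for (std::size_t i = 0; i < sums.size(); ++i) {
        max_count[i] = sums[i] * max_proportion[i];
    }
    return max_count;
}
```

Under that assumption, a cell with a total of 200 counts and a maximum proportion of 0.5 has a maximum count of 100, and the number of detected guides never enters the calculation.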
template<typename Block> Thresholds run_blocked (const PerCellCrisprQcMetrics::Results &metrics, const Block *block) const | inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Block | Integer type for the block assignments. |
metrics | Collection of arrays of length equal to the number of cells, containing the per-cell CRISPR-derived metrics. Only max_proportion and sums are used; detected is ignored and does not need to be set. |
[in] | block | Pointer to an array of length equal to the number of cells, containing the block assignments for each cell. This may be NULL, in which case all cells are assumed to belong to the same block. |