Calculate Naive Mapping Efficiency from Cell Ranger Metrics

Computes the naive mapping efficiency as the proportion of total reads that map to the transcriptome. This function is used internally by reference_data_preprocessing_10x.

Note: This function only supports Cell Ranger count output format, not Cell Ranger multi.

Usage

obtain_mapping_efficiency(QC_data, path_to_cellranger_output)

Arguments

QC_data: Data frame. Output of obtain_qc_read_umi_table containing a num_reads column with read counts per UMI.
path_to_cellranger_output: Character. Path to Cell Ranger run folder containing outs/metrics_summary.csv with a "Number of Reads" column.

Value

Numeric value between 0 and 1 representing the proportion of total reads that successfully mapped to the transcriptome.

Details

The function calculates:

$$\text{mapping_efficiency} = \frac{\text{mapped_reads}}{\text{total_reads}}$$

where:

mapped_reads = sum of num_reads from QC_data
total_reads = "Number of Reads" from metrics_summary.csv
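The calculation above can be sketched in base R with toy numbers. This is a minimal illustration of the formula, not the package's own code; the `QC_data` values and `total_reads` are made up for the example.

```r
# Toy stand-in for the output of obtain_qc_read_umi_table (values are made up).
QC_data <- data.frame(num_reads = c(1, 3, 2, 4))

# Hypothetical "Number of Reads" value taken from metrics_summary.csv.
total_reads <- 40

# mapped_reads is the sum of the num_reads column: 1 + 3 + 2 + 4 = 10.
mapped_reads <- sum(QC_data$num_reads)

# Naive mapping efficiency: mapped reads divided by total reads.
mapping_efficiency <- mapped_reads / total_reads
mapping_efficiency
#> [1] 0.25
```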

Important Notes

Only Cell Ranger count format is supported. Cell Ranger multi writes metrics_summary.csv in a different, row-based layout (with "Library Type" and "Metric Name" columns) that this function cannot read.
The metrics_summary.csv file must contain a column named "Number of Reads" (in Cell Ranger count format, metric names are the column headers).
The function removes commas from the "Number of Reads" field before converting it to a number.
This gives a "naive" estimate, which is adjusted in reference_data_processing when a gene list is specified.
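The format requirements in these notes can be illustrated with a small base R sketch. `extract_total_reads` is a hypothetical helper written for this example, not a function from the package, and the CSV contents are invented. It rejects the row-based multi layout, checks for the "Number of Reads" column, and strips commas before conversion.

```r
# Hypothetical helper (not part of the package): pull the total read count
# out of a metrics table that was read with check.names = FALSE.
extract_total_reads <- function(metrics) {
  # Cell Ranger multi stores metrics as rows, identified by these columns.
  if (all(c("Library Type", "Metric Name") %in% names(metrics))) {
    stop("Cell Ranger multi format detected; only Cell Ranger count is supported.")
  }
  if (!"Number of Reads" %in% names(metrics)) {
    stop("Column \"Number of Reads\" not found in metrics_summary.csv.")
  }
  # Values such as "1,000,000" contain commas; remove them before conversion.
  as.numeric(gsub(",", "", metrics[["Number of Reads"]][1]))
}

# Invented count-style content: metric names are the column headers.
count_csv <- 'Estimated Number of Cells,Number of Reads\n"1,200","1,000,000"'
count_metrics <- read.csv(text = count_csv, check.names = FALSE)
total_reads <- extract_total_reads(count_metrics)

# Invented multi-style content: one metric per row, so the helper refuses it.
multi_csv <- 'Library Type,Metric Name,Metric Value\nGene Expression,Number of reads,"1,000,000"'
multi_metrics <- read.csv(text = multi_csv, check.names = FALSE)
multi_result <- tryCatch(extract_total_reads(multi_metrics), error = function(e) e)
```

Reading with `check.names = FALSE` matters here: by default `read.csv` would rewrite the header to `Number.of.Reads`, and the column lookup would fail.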

Examples
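A minimal sketch of a call, assuming a mock run folder that contains only the "Number of Reads" column the function is documented to need. The folder name, read counts, and `QC_data` values are invented for illustration, and the call is guarded so the snippet also runs when the function is not loaded.

```r
# Build a mock Cell Ranger count run folder in a temporary location.
run_dir <- file.path(tempdir(), "mock_cellranger_run")
dir.create(file.path(run_dir, "outs"), recursive = TRUE, showWarnings = FALSE)
metrics_file <- file.path(run_dir, "outs", "metrics_summary.csv")

# Invented metrics file: a header row and one quoted, comma-formatted value.
writeLines(c("Number of Reads", "\"2,000\""), metrics_file)

# Toy stand-in for the output of obtain_qc_read_umi_table.
QC_data <- data.frame(num_reads = c(500, 300, 200))

# Value the formula in Details gives for these inputs: 1000 / 2000.
expected <- sum(QC_data$num_reads) / 2000

# Call the documented function only if it is available in the session.
if (exists("obtain_mapping_efficiency", mode = "function")) {
  obtain_mapping_efficiency(QC_data, run_dir)
}
```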