Extracts QC-filtered UMI-level molecule information from Cell Ranger HDF5 files. This function is used internally by reference_data_preprocessing_10x.

Usage

obtain_qc_read_umi_table(path_to_cellranger_output)

Arguments

path_to_cellranger_output

Character. Path to Cell Ranger run folder containing:

  • outs/molecule_info.h5 – Raw molecule information

  • outs/filtered_feature_bc_matrix.h5 – QC-filtered cell barcodes

Value

Data frame with UMI-level molecule information containing columns:

num_reads

Number of reads supporting this UMI-cell combination

UMI_id

UMI index (1-based)

cell_id

Cell barcode with GEM group suffix (e.g., "ACGTACGT-1")

response_id

Gene identifier (e.g., Ensembl ID)

Details

The function:

  1. Reads raw molecule information from molecule_info.h5

  2. Reads QC-filtered cell barcodes from filtered_feature_bc_matrix.h5

  3. Filters molecule data to retain only QC-passed cells

  4. Constructs cell IDs with GEM group suffixes

  5. Returns a data frame with one row per UMI per cell, giving the supporting read count
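
The filtering and ID-construction steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy vectors stand in for the contents of the two HDF5 files, and every name other than the four output columns (barcodes, features, molecules, qc_barcodes and the columns of molecules) is hypothetical. For brevity the sketch builds cell IDs first and filters afterwards, which yields the same rows as filtering first.

```r
# Toy stand-ins for the raw molecule information (hypothetical values).
barcodes <- c("AAACCTGG", "AAACGGGT", "AAAGTAGC")      # barcode sequences
features <- c("ENSG00000241860", "ENSG00000238009")    # gene identifiers
molecules <- data.frame(
  barcode_idx = c(1, 1, 2, 3, 3),   # 1-based index into `barcodes`
  feature_idx = c(1, 2, 2, 1, 2),   # 1-based index into `features`
  gem_group   = c(1, 1, 1, 1, 1),
  umi         = c(101, 102, 103, 104, 105),
  count       = c(2, 1, 1, 3, 1)    # reads supporting each molecule
)

# Toy stand-in for the QC-filtered cell barcodes (hypothetical values).
qc_barcodes <- c("AAACCTGG-1", "AAAGTAGC-1")

# Build cell IDs of the form "<barcode>-<gem group>".
cell_id <- paste0(barcodes[molecules$barcode_idx], "-", molecules$gem_group)

# Keep only molecules whose cell passed QC.
keep <- cell_id %in% qc_barcodes

# Assemble the output with the documented column names.
qc_table <- data.frame(
  num_reads   = molecules$count[keep],
  UMI_id      = molecules$umi[keep],
  cell_id     = cell_id[keep],
  response_id = features[molecules$feature_idx[keep]],
  stringsAsFactors = FALSE
)
print(qc_table)
```

In this toy input, the single molecule from the barcode that is absent from qc_barcodes is dropped and the other four are kept.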

This data is used for fitting the library saturation (S-M) curve in library_computation.
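
As a rough illustration of the read-versus-UMI information the table carries, the sketch below summarises the six rows shown by head(qc_table) in the Examples. Which summaries library_computation actually uses is not stated on this page, so the quantities computed here (total_reads, n_molecules, frac_multi_read, reads_per_cell) are illustrative assumptions rather than the package's method.

```r
# The six rows printed by head(qc_table) in the Examples section.
qc_table <- data.frame(
  num_reads   = c(2, 1, 1, 2, 1, 1),
  UMI_id      = c(139105, 723247, 998389, 622094, 584568, 956290),
  cell_id     = c("AAACCTGGTATATGAG-1", "AAACGGGTCAGCTCGG-1",
                  "AAAGTAGCATCCCACT-1", "AAAGTAGTCCAAATGC-1",
                  "AGCAGCCGTCCAAGTT-1", "AGCGGTCCATTCCTGC-1"),
  response_id = c("ENSG00000241860", "ENSG00000238009", "ENSG00000239945",
                  "ENSG00000286448", "ENSG00000243485", "ENSG00000238009"),
  stringsAsFactors = FALSE
)

# Total reads and number of molecules (one row per UMI-cell combination).
total_reads <- sum(qc_table$num_reads)
n_molecules <- nrow(qc_table)

# Share of molecules supported by more than one read.
frac_multi_read <- mean(qc_table$num_reads > 1)

# Reads summed within each cell.
reads_per_cell <- tapply(qc_table$num_reads, qc_table$cell_id, sum)

print(c(total_reads = total_reads, n_molecules = n_molecules))
print(frac_multi_read)
print(reads_per_cell)
```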

See also

reference_data_preprocessing_10x for aggregating data from multiple runs.

library_computation for fitting saturation curves using this data.

Examples

# Extract read/UMI information from Cell Ranger output
cellranger_path <- system.file("extdata/cellranger_tiny", package = "perturbplan")
qc_table <- obtain_qc_read_umi_table(cellranger_path)

# Examine the data
head(qc_table)
#>   num_reads UMI_id            cell_id     response_id
#> 1         2 139105 AAACCTGGTATATGAG-1 ENSG00000241860
#> 2         1 723247 AAACGGGTCAGCTCGG-1 ENSG00000238009
#> 3         1 998389 AAAGTAGCATCCCACT-1 ENSG00000239945
#> 4         2 622094 AAAGTAGTCCAAATGC-1 ENSG00000286448
#> 5         1 584568 AGCAGCCGTCCAAGTT-1 ENSG00000243485
#> 6         1 956290 AGCGGTCCATTCCTGC-1 ENSG00000238009
dim(qc_table)
#> [1] 11  4
summary(qc_table$num_reads)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.000   1.000   1.000   1.182   1.000   2.000