
Extract UMI-Level Molecule Information from Cell Ranger HDF5 Files
obtain_qc_read_umi_table.Rd
Extracts QC-filtered UMI-level molecule information from Cell Ranger HDF5 files.
This function is used internally by reference_data_preprocessing_10x.
Arguments
- path_to_cellranger_output
Character. Path to the Cell Ranger run folder, which must contain:
  - outs/molecule_info.h5: raw molecule information
  - outs/filtered_feature_bc_matrix.h5: QC-filtered cell barcodes
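Before calling the function, it can help to confirm that the run folder has the expected layout. The following is a minimal sketch in base R, not part of the package: the folder name toy_cellranger_run is hypothetical and is created in a temporary directory with only one of the two files, so that the check has something to report.

```r
# Hypothetical run folder, built in a temporary directory for illustration.
run_dir <- file.path(tempdir(), "toy_cellranger_run")
dir.create(file.path(run_dir, "outs"), recursive = TRUE, showWarnings = FALSE)

# Create only one of the two expected files (empty placeholder, not real HDF5).
invisible(file.create(file.path(run_dir, "outs", "molecule_info.h5")))

# The two files named in the argument description above.
required <- c("molecule_info.h5", "filtered_feature_bc_matrix.h5")
present <- file.exists(file.path(run_dir, "outs", required))
names(present) <- required

# Files that would need to be present before extraction can run.
missing_files <- required[!present]
print(missing_files)
```

With a real run folder, run_dir would simply be the value passed as path_to_cellranger_output.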
Value
Data frame with UMI-level molecule information containing columns:
- num_reads
Number of reads supporting this UMI-cell combination
- UMI_id
UMI index (1-based)
- cell_id
Cell barcode with GEM group suffix (e.g., "ACGTACGT-1")
- response_id
Gene identifier (e.g., Ensembl ID)
Details
The function:
1. Reads raw molecule information from molecule_info.h5
2. Reads QC-filtered cell barcodes from filtered_feature_bc_matrix.h5
3. Filters molecule data to retain only QC-passed cells
4. Constructs cell IDs with GEM group suffixes
5. Returns a data frame with read counts per UMI per cell
These data are used for fitting the library saturation (S-M) curve in library_computation.
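The filtering logic described above can be sketched with in-memory vectors in place of the HDF5 files. This is an illustration under stated assumptions, not the package's implementation: the vector names mirror datasets commonly found in molecule_info.h5 (barcodes, barcode_idx, gem_group, feature_idx, umi, count), all values are invented, and treating UMI_id as the 1-based row position in the raw molecule table is a guess made for this example.

```r
# Toy stand-ins for datasets in molecule_info.h5 (all values are made up).
barcodes    <- c("AAAC", "CCCG", "GGGT")   # unique barcode sequences, no suffix
barcode_idx <- c(0L, 0L, 1L, 2L, 2L)       # 0-based index into `barcodes`
gem_group   <- c(1L, 1L, 1L, 1L, 1L)       # GEM group of each molecule
feature_ids <- c("GENE_A", "GENE_B")       # hypothetical gene identifiers
feature_idx <- c(0L, 1L, 0L, 1L, 1L)       # 0-based index into `feature_ids`
count       <- c(2L, 1L, 1L, 3L, 1L)       # reads supporting each molecule

# Toy stand-in for the barcodes in filtered_feature_bc_matrix.h5.
qc_barcodes <- c("AAAC-1", "GGGT-1")

# Build cell IDs with the GEM group suffix (indices are 0-based, R is 1-based).
cell_id <- paste0(barcodes[barcode_idx + 1L], "-", gem_group)

# Keep only molecules whose cell passed QC.
keep <- cell_id %in% qc_barcodes

# Assumption: UMI_id is the 1-based row position in the raw molecule table.
toy_table <- data.frame(
  num_reads   = count[keep],
  UMI_id      = seq_along(count)[keep],
  cell_id     = cell_id[keep],
  response_id = feature_ids[feature_idx[keep] + 1L],
  stringsAsFactors = FALSE
)
print(toy_table)
```

The one molecule from the barcode "CCCG" is dropped because "CCCG-1" is not in the QC-passed list; the remaining four rows have the same four columns as the documented return value.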
See also
reference_data_preprocessing_10x for aggregating data from multiple runs.
library_computation for fitting saturation curves using this data.
Examples
# Extract read/UMI information from Cell Ranger output
cellranger_path <- system.file("extdata/cellranger_tiny", package = "perturbplan")
qc_table <- obtain_qc_read_umi_table(cellranger_path)
# Examine the data
head(qc_table)
#>   num_reads UMI_id            cell_id     response_id
#> 1         2 139105 AAACCTGGTATATGAG-1 ENSG00000241860
#> 2         1 723247 AAACGGGTCAGCTCGG-1 ENSG00000238009
#> 3         1 998389 AAAGTAGCATCCCACT-1 ENSG00000239945
#> 4         2 622094 AAAGTAGTCCAAATGC-1 ENSG00000286448
#> 5         1 584568 AGCAGCCGTCCAAGTT-1 ENSG00000243485
#> 6         1 956290 AGCGGTCCATTCCTGC-1 ENSG00000238009
dim(qc_table)
#> [1] 11 4
summary(qc_table$num_reads)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   1.000   1.000   1.000   1.182   1.000   2.000
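Beyond summary(), a table of this shape can be summarized by reads and UMIs, the two quantities that a saturation curve relates. The sketch below is one possible way to do that in base R, using a small invented data frame with the same four columns. It does not reproduce what library_computation does internally, and the duplication fraction shown is an illustrative statistic rather than a package output.

```r
# Invented table with the same columns as the function's return value.
toy_qc_table <- data.frame(
  num_reads   = c(2L, 1L, 1L, 2L, 1L),
  UMI_id      = c(101L, 102L, 103L, 104L, 105L),
  cell_id     = c("CELL_A-1", "CELL_A-1", "CELL_B-1", "CELL_B-1", "CELL_C-1"),
  response_id = c("GENE_1", "GENE_2", "GENE_1", "GENE_3", "GENE_2"),
  stringsAsFactors = FALSE
)

# Each row is one UMI-cell combination, so rows count UMIs and
# num_reads sums to the total number of reads.
total_reads <- sum(toy_qc_table$num_reads)
total_umis  <- nrow(toy_qc_table)

# Fraction of reads that are duplicates of an already-seen UMI.
duplication <- 1 - total_umis / total_reads

# Per-cell totals.
reads_per_cell <- tapply(toy_qc_table$num_reads, toy_qc_table$cell_id, sum)
umis_per_cell  <- table(toy_qc_table$cell_id)

print(reads_per_cell)
print(umis_per_cell)
```

The same lines can be applied to qc_table from the example above by substituting it for toy_qc_table.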