Fits a nonlinear saturation curve model to estimate the relationship between mapped reads per cell and observed UMIs per cell. The model accounts for both UMI saturation at high read depths and PCR amplification variability. This function is used internally by reference_data_processing.

Usage

library_computation(QC_data, downsample_ratio = 0.7, D2_rough = 0.3)

Arguments

QC_data

Data frame. UMI-level molecule information returned by obtain_qc_read_umi_table, containing the columns num_reads, UMI_id, cell_id, and response_id.

downsample_ratio

Numeric or numeric vector. Proportion(s) for downsampling the dataset to create additional observation points. Must be between 0 and 1. Can be a vector for multiple downsampling levels, but one level is often sufficient. Default: 0.7.

D2_rough

Numeric. Rough prior estimate for the variation parameter (D2) in the S-M curve model. Represents PCR amplification bias. Typically 0.3 for perturb-seq, higher (e.g., 0.8) for TAP-seq. Default: 0.3.

Value

A fitted S-M curve model object of class nlsLM from the minpack.lm package. The model has two fitted parameters accessible via coef():

total_UMIs

Maximum UMI count per cell at sequencing saturation

D2

Variation parameter characterizing PCR amplification bias (0 to 1)

Details

Saturation Model

The S-M curve model is:

$$\text{UMI} = \text{total_UMIs} \times \left(1 - \exp\left(-\frac{\text{reads}}{\text{total_UMIs}}\right) \times \left(1 + D2 \times \frac{\text{reads}^2}{2 \times \text{total_UMIs}^2}\right)\right)$$

where:

  • reads: Number of mapped reads per cell (independent variable)

  • UMI: Number of observed UMIs per cell (dependent variable)

  • total_UMIs: Maximum UMI per cell at saturation (fitted parameter)

  • D2: Variation parameter for PCR bias, between 0 and 1 (fitted parameter)
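As a minimal sketch, the curve can be written as a plain R function to see how it behaves at low and high read depths. The helper name sm_curve and the parameter values below are illustrative assumptions, not part of the package API and not estimates from real data.

```r
# S-M saturation curve, transcribed from the formula in this section.
# `sm_curve` is a hypothetical helper name, not a perturbplan function.
sm_curve <- function(reads, total_UMIs, D2) {
  x <- reads / total_UMIs                      # reads relative to library size
  total_UMIs * (1 - exp(-x) * (1 + D2 * x^2 / 2))
}

# Illustrative parameter values (assumed, not fitted)
reads <- c(0, 1e3, 1e4, 1e5, 1e6)
sm_curve(reads, total_UMIs = 2e4, D2 = 0.3)
```

With zero reads the curve returns zero UMIs, and as reads grow far beyond total_UMIs the exponential term vanishes, so the predicted UMI count approaches total_UMIs.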

Fitting Procedure

  1. Expands read data by replicating UMI indices according to read counts

  2. Downsamples the read data at specified ratio(s) to create multiple observation points

  3. Counts unique UMIs at each downsampled read depth

  4. Fits nonlinear model using two different initial parameter sets:

    • "Delicate": Uses prior D2_rough and derives initial total_UMIs

    • "Rough": Uses observed UMI count as initial total_UMIs

  5. Selects model with lower relative prediction error

  6. Warns if relative error exceeds 5%
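Steps 1 to 3 can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy table, the helper count_umis, and the use of row indices as molecule labels are all hypothetical, and the sketch reports totals rather than per-cell values. The model fit (steps 4 to 6) is not shown.

```r
set.seed(1)  # downsampling is random; fix the seed for reproducibility

# Toy molecule table with the columns named in the Arguments section.
# All values are made up for illustration.
toy_qc <- data.frame(
  num_reads   = c(3, 1, 2, 5, 1, 4),
  UMI_id      = c("u1", "u2", "u3", "u4", "u5", "u6"),
  cell_id     = c("c1", "c1", "c1", "c2", "c2", "c2"),
  response_id = c("g1", "g2", "g1", "g1", "g3", "g2")
)

# Step 1: one entry per read, labelled by the molecule (row) it came from
read_pool <- rep(seq_len(nrow(toy_qc)), times = toy_qc$num_reads)

# Steps 2-3: subsample reads without replacement, then count distinct molecules
# (`count_umis` is a hypothetical helper)
count_umis <- function(pool, ratio) {
  n_keep <- round(ratio * length(pool))
  kept <- sample(pool, size = n_keep, replace = FALSE)
  c(reads = n_keep, UMIs = length(unique(kept)))
}

# Observation points: the downsampled depth plus the full data
ratios <- c(0.7, 1)
obs <- as.data.frame(t(sapply(ratios, count_umis, pool = read_pool)))
obs
```

Each row of obs is one (reads, UMIs) observation point of the kind the saturation curve is fitted to. Passing more ratios adds more points.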

Important Notes

  • The toy example data has very few reads, so fitted parameters may be sensitive to random seed and prior specification

  • In practice, with real data, the fit is robust to both the choice of random seed and moderate misspecification of the prior

  • Multiple downsampling ratios can be provided as a vector for more observation points, but typically one ratio suffices

See also

obtain_qc_read_umi_table for input data preparation.

reference_data_processing for the complete preprocessing workflow.

library_estimation for extracting parameters from the fitted model.

Examples

# Get QC data and compute library parameters
cellranger_path <- system.file("extdata/cellranger_tiny", package = "perturbplan")
qc_data <- obtain_qc_read_umi_table(cellranger_path)

# Fit saturation curve
lib_model <- library_computation(
  QC_data = qc_data,
  downsample_ratio = 0.7,
  D2_rough = 0.3
)

# View fitted parameters
coef(lib_model)
#> total_UMIs         D2 
#>   7.611742   1.000000 

# Extract specific parameters
total_umis <- coef(lib_model)["total_UMIs"]
variation <- coef(lib_model)["D2"]