Fit Saturation-Magnitude (S-M) Curve Between Reads and UMIs

Fits a nonlinear saturation curve model to estimate the relationship between mapped reads per cell and observed UMIs per cell. The model accounts for both UMI saturation at high read depths and PCR amplification variability. This function is used internally by reference_data_processing.

Usage

library_computation(QC_data, downsample_ratio = 0.7, D2_rough = 0.3)

Arguments

QC_data: Data frame. UMI-level molecule information from obtain_qc_read_umi_table containing columns num_reads, UMI_id, cell_id, and response_id.
downsample_ratio: Numeric or numeric vector. Proportion(s) for downsampling the dataset to create additional observation points. Must be between 0 and 1. Can be a vector for multiple downsampling levels, but one level is often sufficient. Default: 0.7.
D2_rough: Numeric. Rough prior estimate for the variation parameter (D2) in the S-M curve model, which represents PCR amplification bias. Typically 0.3 for Perturb-seq and higher (e.g., 0.8) for TAP-seq. Default: 0.3.

Value

A fitted S-M curve model object of class nls, as returned by nlsLM() from the minpack.lm package. The model has two fitted parameters accessible via coef():

total_UMIs: Maximum UMI count per cell at sequencing saturation
D2: Variation parameter characterizing PCR amplification bias (0 to 1)

Details

Saturation Model

The S-M curve model is:

$$\text{UMI} = \text{total_UMIs} \times \left(1 - \exp\left(-\frac{\text{reads}}{\text{total_UMIs}}\right) \times \left(1 + D2 \times \frac{\text{reads}^2}{2 \times \text{total_UMIs}^2}\right)\right)$$

where:

reads: Number of mapped reads per cell (independent variable)
UMI: Number of observed UMIs per cell (dependent variable)
total_UMIs: Maximum UMI count per cell at sequencing saturation (fitted parameter)
D2: Variation parameter for PCR bias, between 0 and 1 (fitted parameter)
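
The formula above can be written directly as an R function to see how the curve behaves. This is a minimal sketch, not the package's internal code: the name sm_curve and the parameter values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: evaluates the S-M curve formula given above.
sm_curve <- function(reads, total_UMIs, D2) {
  x <- reads / total_UMIs                  # reads relative to the saturation level
  total_UMIs * (1 - exp(-x) * (1 + D2 * x^2 / 2))
}

# Illustrative (made-up) parameter values, not fitted to any data
reads <- c(0, 1e3, 1e4, 1e5, 1e6)
umis_low_bias  <- sm_curve(reads, total_UMIs = 2e4, D2 = 0.3)
umis_high_bias <- sm_curve(reads, total_UMIs = 2e4, D2 = 0.8)

data.frame(reads, umis_low_bias, umis_high_bias)
```

The curve starts at zero, rises monotonically, and approaches total_UMIs as reads grow. At a fixed read depth, a larger D2 yields fewer expected UMIs.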

Fitting Procedure

Expands read data by replicating UMI indices according to read counts
Downsamples the read data at specified ratio(s) to create multiple observation points
Counts unique UMIs at each downsampled read depth
Fits nonlinear model using two different initial parameter sets:
- "Delicate": Uses prior D2_rough and derives initial total_UMIs
- "Rough": Uses observed UMI count as initial total_UMIs
Selects model with lower relative prediction error
Warns if relative error exceeds 5%
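
The expansion, downsampling, and counting steps can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: the toy table qc_toy and the helper count_umis_at_ratio are hypothetical, a single cell is assumed (cell_id and response_id are ignored), and the model-fitting step is not shown.

```r
# Hypothetical toy molecule table: one row per UMI with its read count
qc_toy <- data.frame(
  UMI_id    = c("u1", "u2", "u3", "u4"),
  num_reads = c(5L, 3L, 1L, 1L)
)

# Expand: one entry per read, labelled by the row index of its UMI
read_to_umi <- rep(seq_len(nrow(qc_toy)), times = qc_toy$num_reads)

# Downsample reads without replacement and count the distinct UMIs that remain
count_umis_at_ratio <- function(read_to_umi, ratio) {
  n_keep <- round(length(read_to_umi) * ratio)
  kept <- read_to_umi[sample.int(length(read_to_umi), n_keep)]
  c(num_reads = n_keep, num_UMIs = length(unique(kept)))
}

set.seed(1)  # the downsampled count depends on the random draw
obs <- rbind(
  count_umis_at_ratio(read_to_umi, 1.0),  # full data
  count_umis_at_ratio(read_to_umi, 0.7)   # one downsampled level
)
obs
```

Each row of obs is one (reads, UMIs) observation point for the curve fit. Passing a vector of ratios would add more rows in the same way.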

Important Notes

The toy example data has very few reads, so fitted parameters may be sensitive to random seed and prior specification
With real data, the fit is robust to both the choice of random seed and moderate misspecification of the prior D2_rough
Multiple downsampling ratios can be provided as a vector for more observation points, but typically one ratio suffices

Examples

# Get QC data and compute library parameters
cellranger_path <- system.file("extdata/cellranger_tiny", package = "perturbplan")
qc_data <- obtain_qc_read_umi_table(cellranger_path)

# Fit saturation curve
lib_model <- library_computation(
  QC_data = qc_data,
  downsample_ratio = 0.7,
  D2_rough = 0.3
)

# View fitted parameters
coef(lib_model)
#> total_UMIs         D2 
#>   7.611742   1.000000 

# Extract specific parameters
total_umis <- coef(lib_model)["total_UMIs"]
variation <- coef(lib_model)["D2"]
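
Once the two parameters are extracted, they can be plugged back into the S-M formula from the Details section to estimate UMIs at a planned read depth. The sketch below is self-contained and does not call the package: fitted_coef is a hypothetical stand-in for coef(lib_model) that reuses the toy values printed above, and planned_reads is an arbitrary choice.

```r
# Stand-in for coef(lib_model), using the toy values printed above
fitted_coef <- c(total_UMIs = 7.611742, D2 = 1.0)

# Arbitrary read depths per cell, for illustration only
planned_reads <- c(5, 10, 20)

# Evaluate the S-M curve at the planned depths
x <- planned_reads / fitted_coef[["total_UMIs"]]
expected_umis <- fitted_coef[["total_UMIs"]] *
  (1 - exp(-x) * (1 + fitted_coef[["D2"]] * x^2 / 2))

# Fraction of the saturation level reached at each depth
saturation <- expected_umis / fitted_coef[["total_UMIs"]]

data.frame(planned_reads, expected_umis, saturation)
```

Because the toy data contains very few reads, these numbers only illustrate the calculation and should not be read as realistic library sizes.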