
Fit Saturation-Magnitude (S-M) Curve Between Reads and UMIs
library_computation.Rd
Fits a nonlinear saturation curve model to estimate the relationship between mapped
reads per cell and observed UMIs per cell. The model accounts for both UMI saturation
at high read depths and PCR amplification variability. This function is used internally
by reference_data_processing.
Arguments
- QC_data
Data frame. UMI-level molecule information from obtain_qc_read_umi_table, containing the columns num_reads, UMI_id, cell_id, and response_id.
- downsample_ratio
Numeric or numeric vector. Proportion(s) for downsampling the dataset to create additional observation points. Must be between 0 and 1. Can be a vector for multiple downsampling levels, but one level is often sufficient. Default: 0.7.
- D2_rough
Numeric. Rough prior estimate for the variation parameter (D2) in the S-M curve model. Represents PCR amplification bias. Typically 0.3 for perturb-seq, higher (e.g., 0.8) for TAP-seq. Default: 0.3.
Value
A fitted S-M curve model object of class nlsLM from the minpack.lm package. The model has two fitted parameters accessible via coef():
- total_UMIs
Maximum UMI count per cell at sequencing saturation
- D2
Variation parameter characterizing PCR amplification bias (0 to 1)
Details
Saturation Model
The S-M curve model is:
$$\text{UMI} = \text{total_UMIs} \times \left(1 - \exp\left(-\frac{\text{reads}}{\text{total_UMIs}}\right) \times \left(1 + D2 \times \frac{\text{reads}^2}{2 \times \text{total_UMIs}^2}\right)\right)$$
where:
- reads: Number of mapped reads per cell (independent variable)
- UMI: Number of observed UMIs per cell (dependent variable)
- total_UMIs: Maximum UMI count per cell at saturation (fitted parameter)
- D2: Variation parameter for PCR bias, between 0 and 1 (fitted parameter)
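The formula above can be written as a small R function to see how the curve behaves. This is a minimal sketch: `sm_curve` is a hypothetical helper, not a function exported by the package, and the parameter values are illustrative rather than fitted. Setting D2 to 0 reduces the expression to the plain saturation form total_UMIs * (1 - exp(-reads / total_UMIs)).

```r
# Hypothetical helper (not part of the package): evaluates the S-M curve formula.
sm_curve <- function(reads, total_UMIs, D2) {
  x <- reads / total_UMIs  # read depth relative to the saturation level
  total_UMIs * (1 - exp(-x) * (1 + D2 * x^2 / 2))
}

# Illustrative parameter values, not estimates from real data.
reads <- c(0, 500, 1000, 5000, 50000)
umis <- sm_curve(reads, total_UMIs = 1000, D2 = 0.3)
print(round(umis, 1))
```

The predicted UMI count starts at zero with no reads, increases with read depth, and approaches total_UMIs once the read depth is far above it.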
Fitting Procedure
1. Expands read data by replicating UMI indices according to read counts
2. Downsamples the read data at the specified ratio(s) to create multiple observation points
3. Counts unique UMIs at each downsampled read depth
4. Fits the nonlinear model using two different initial parameter sets:
   - "Delicate": Uses the prior D2_rough and derives the initial total_UMIs
   - "Rough": Uses the observed UMI count as the initial total_UMIs
5. Selects the model with the lower relative prediction error
6. Warns if the relative error exceeds 5%
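The expansion, downsampling, and counting steps of the procedure above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the toy table, the helper `downsample_once`, and the choice to sample reads uniformly without replacement are all hypothetical, and the model fitting steps are omitted.

```r
set.seed(1)

# Toy UMI-level table (hypothetical values); column names follow the
# QC_data argument description.
qc <- data.frame(
  UMI_id = paste0("umi", 1:6),
  num_reads = c(5, 3, 1, 1, 2, 8)
)

# Expand to one entry per read by repeating each UMI index num_reads times.
read_level <- rep(seq_len(nrow(qc)), times = qc$num_reads)

# Hypothetical helper: keep a fraction of reads (sampled without replacement)
# and count the distinct UMIs that remain.
downsample_once <- function(read_level, ratio) {
  n_keep <- round(length(read_level) * ratio)
  kept <- sample(read_level, size = n_keep, replace = FALSE)
  c(reads = n_keep, UMIs = length(unique(kept)))
}

# One downsampled observation plus the full-depth observation.
obs <- rbind(
  downsample_once(read_level, 0.7),
  c(reads = length(read_level), UMIs = nrow(qc))
)
print(obs)
```

Each row of `obs` is one (reads, UMIs) observation point of the kind the curve is fitted to; passing several ratios would add one row per ratio.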
Important Notes
- The toy example data has very few reads, so the fitted parameters may be sensitive to the random seed and the prior specification
- With real data, the function is robust to both the choice of random seed and moderate prior misspecification
- Multiple downsampling ratios can be provided as a vector for more observation points, but one ratio typically suffices
See also
obtain_qc_read_umi_table for input data preparation.
reference_data_processing for the complete preprocessing workflow.
library_estimation for extracting parameters from the fitted model.
Examples
# Get QC data and compute library parameters
cellranger_path <- system.file("extdata/cellranger_tiny", package = "perturbplan")
qc_data <- obtain_qc_read_umi_table(cellranger_path)
# Fit saturation curve
lib_model <- library_computation(
  QC_data = qc_data,
  downsample_ratio = 0.7,
  D2_rough = 0.3
)
# View fitted parameters
coef(lib_model)
#> total_UMIs       D2
#>   7.611742 1.000000
# Extract specific parameters
total_umis <- coef(lib_model)["total_UMIs"]
variation <- coef(lib_model)["D2"]