Cost-Constrained Power Analysis for Perturb-seq Experiments — cost_power

Runs a power analysis over the experimental design space of a perturb-seq experiment, optionally under a budget constraint. Power is computed on a parameter grid, and the resulting designs are filtered by the power target and, when one is set, the cost constraint.

Usage

cost_power_computation(
  minimizing_variable = "TPM_threshold",
  fixed_variable = list(minimum_fold_change = 0.8),
  MOI = 10,
  num_targets = 100,
  non_targeting_gRNAs = 10,
  gRNAs_per_target = 4,
  gRNA_variability = 0.13,
  control_group = "complement",
  side = "left",
  multiple_testing_alpha = 0.05,
  prop_non_null = 0.1,
  baseline_expression_stats,
  library_parameters,
  grid_size = 20,
  power_target = 0.8,
  power_precision = 0.01,
  min_power = 0.05,
  max_power = 0.95,
  power_range = 0.6,
  cost_precision = 0.9,
  cost_per_captured_cell = 0.086,
  cost_per_million_reads = 0.374,
  cost_constraint = NULL,
  mapping_efficiency = 0.72
)

Arguments

minimizing_variable

Character. The parameter to vary during analysis. Options: "TPM_threshold" or "minimum_fold_change". Default: "TPM_threshold".

fixed_variable

List. Fixed values for other analysis parameters. Can include:

minimum_fold_change: Fixed fold change threshold (when varying TPM_threshold)
TPM_threshold: Fixed TPM threshold (when varying minimum_fold_change)
cells_per_target: Fixed cells per target (otherwise uses "varying")
reads_per_cell: Fixed reads per cell (otherwise uses "varying")

MOI

Numeric. Multiplicity of infection (default: 10).

num_targets

Integer. Number of targets (default: 100).

non_targeting_gRNAs

Integer. Number of non-targeting gRNAs (default: 10).

gRNAs_per_target

Integer. Number of gRNAs per target (default: 4).

gRNA_variability

Numeric. gRNA variability parameter (default: 0.13).

control_group

Character. Control group type: "complement" or "nt_cells" (default: "complement").

side

Character. Test side: "left", "right", or "both" (default: "left").

multiple_testing_alpha

Numeric. Multiple testing significance level (default: 0.05).

prop_non_null

Numeric. Proportion of non-null hypotheses (default: 0.1).

baseline_expression_stats

Data frame. Baseline expression statistics with columns: response_id, relative_expression, expression_size.

library_parameters

List. rSAC_fn_wrapper format from library_estimation containing method_used, UMI_per_cell_at_saturation, reads_norm, n_cells, and method-specific parameters.

grid_size

Integer. Grid size for parameter search (default: 20).

power_target

Numeric. Target statistical power (default: 0.8).

power_precision

Numeric. Acceptable precision around power target (default: 0.01).

min_power

Numeric. Minimum power threshold for grid search (default: 0.05).

max_power

Numeric. Maximum power threshold for grid search (default: 0.95).

power_range

Numeric. Range around power target to search for designs (default: 0.6). The grid search will explore designs with power between power_target - power_range/2 and power_target + power_range/2, constrained by min_power and max_power.
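The window described for power_range can be worked through with a few lines of R. This is a minimal sketch of the arithmetic only, not the package's internal code; power_window is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper (not part of the package): the power window implied by
# power_target and power_range, clamped to [min_power, max_power].
power_window <- function(power_target = 0.8, power_range = 0.6,
                         min_power = 0.05, max_power = 0.95) {
  lower <- max(power_target - power_range / 2, min_power)
  upper <- min(power_target + power_range / 2, max_power)
  c(lower = lower, upper = upper)
}

# With the documented defaults the unclamped window is 0.5 to 1.1,
# so max_power trims the upper end to 0.95.
power_window()
```

Under these defaults the lower bound comes from power_range and the upper bound from max_power, so narrowing power_range below 0.3 is what would start to tighten the upper end.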

cost_precision

Numeric. Cost utilization factor (default: 0.9). Filters designs with total cost ≤ cost_precision × cost_constraint.

cost_per_captured_cell

Numeric. Cost per captured cell in dollars (default: 0.086).

cost_per_million_reads

Numeric. Cost per million sequencing reads in dollars (default: 0.374).

cost_constraint

Numeric. Maximum budget in dollars (default: NULL). Set to NULL to disable the cost constraint.
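The interaction between cost_constraint and cost_precision can be sketched as a simple row filter. This assumes that "filters" means designs at or under cost_precision × cost_constraint are kept, and that a NULL constraint leaves the designs untouched; filter_by_budget and the toy designs data frame are hypothetical and only illustrate the documented rule.

```r
# Hypothetical helper (not part of the package): keep designs whose total
# cost is at most cost_precision * cost_constraint.
filter_by_budget <- function(designs, cost_constraint = NULL,
                             cost_precision = 0.9) {
  # A NULL constraint disables the budget filter entirely.
  if (is.null(cost_constraint)) return(designs)
  designs[designs$total_cost <= cost_precision * cost_constraint, , drop = FALSE]
}

# Toy designs with made-up costs, for illustration only.
designs <- data.frame(
  cells_per_target = c(200, 400, 800),
  total_cost       = c(4000, 9500, 18000)
)

# Budget of $10,000 with cost_precision = 0.9 gives an effective cap of $9,000,
# so only the first design is retained.
filter_by_budget(designs, cost_constraint = 10000)
```

Note that the $9,500 design fits the nominal budget but is dropped because of the 0.9 factor.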

mapping_efficiency

Numeric. Sequencing mapping efficiency (default: 0.72).

Value

Data frame with power analysis results including:

Analysis parameters (TPM_threshold, minimum_fold_change, etc.)
Experimental design (cells_per_target, num_captured_cells, sequenced_reads_per_cell)
Power metrics (overall_power)
Cost breakdown (library_cost, sequencing_cost, total_cost)

Details

This function performs comprehensive power analysis by:

Setting up parameter grids based on the minimizing variable
Computing power across experimental design space
Calculating costs for each design
Applying validation checks via check_power_results()

Cost Model:

Total cost is calculated as the sum of library preparation and sequencing costs:

Total Cost = Library Cost + Sequencing Cost

Where:

Library Cost = cost_per_captured_cell * num_captured_cells
Sequencing Cost = cost_per_million_reads * (sequenced_reads_per_cell * num_captured_cells) / 1,000,000
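The cost model above can be checked by hand with a small function. This is a sketch that transcribes the two formulas directly; design_cost is a hypothetical name, and the example design (50,000 captured cells at 20,000 reads per cell) is made up for illustration.

```r
# Hypothetical helper (not part of the package): cost breakdown for one design,
# following the documented library and sequencing cost formulas.
design_cost <- function(num_captured_cells, sequenced_reads_per_cell,
                        cost_per_captured_cell = 0.086,
                        cost_per_million_reads = 0.374) {
  library_cost <- cost_per_captured_cell * num_captured_cells
  sequencing_cost <- cost_per_million_reads *
    (sequenced_reads_per_cell * num_captured_cells) / 1e6
  data.frame(
    library_cost    = library_cost,
    sequencing_cost = sequencing_cost,
    total_cost      = library_cost + sequencing_cost
  )
}

# 50,000 cells at 20,000 reads per cell:
# library cost 4300, sequencing cost 374, total 4674 (dollars).
design_cost(num_captured_cells = 50000, sequenced_reads_per_cell = 20000)
```

At the default prices the library cost dominates this example, so in such a design adding reads per cell is cheaper than adding cells.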

Parameter grid generation:

TPM_threshold: Uses quantiles of baseline expression (10th to 99th percentile)
minimum_fold_change: Uses ranges based on test side (left: 0.5 to 0.9, right: 1 to 10, both: the two ranges combined)
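The two grids described above could be built along the following lines. This is a sketch under several assumptions that the documentation does not confirm: relative_expression is treated as a proportion and multiplied by 1e6 to get TPM, grid points are spaced evenly (in probability for TPM_threshold, linearly for minimum_fold_change), and "both" is taken to mean the left and right grids concatenated. The simulated baseline data and the fold_change_grid helper are hypothetical.

```r
# Simulated baseline expression (illustration only, not real data).
set.seed(1)
raw <- rlnorm(200, meanlog = 0, sdlog = 1.5)
baseline <- data.frame(
  response_id         = paste0("gene_", seq_along(raw)),
  relative_expression = raw / sum(raw)   # proportions summing to 1
)

grid_size <- 20

# TPM_threshold grid: quantiles from the 10th to the 99th percentile.
# Assumption: TPM = relative_expression * 1e6.
tpm <- baseline$relative_expression * 1e6
tpm_grid <- quantile(tpm, probs = seq(0.10, 0.99, length.out = grid_size),
                     names = FALSE)

# minimum_fold_change grid, chosen by test side (hypothetical helper).
fold_change_grid <- function(side, grid_size = 20) {
  left  <- seq(0.5, 0.9, length.out = grid_size)
  right <- seq(1, 10, length.out = grid_size)
  switch(side,
         left  = left,
         right = right,
         both  = c(left, right),   # assumption: simple concatenation
         stop("side must be 'left', 'right', or 'both'"))
}

head(tpm_grid)
fold_change_grid("left", grid_size = 5)
```

The actual spacing and the handling of side = "both" inside the package may differ; the sketch is only meant to make the documented ranges concrete.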