Compute power analysis for experimental planning (underspecified design)

This function performs power analysis during the experimental planning phase, when the design is only partially specified. It accepts aggregate experimental parameters (cells per target, reads per cell) and does not require cell counts for individual gRNAs or perturbation-gene pairs. Each of these parameters can also be given as a set of values to compare. This is useful before data collection, when you want to explore how different experimental designs (cell numbers, sequencing depth) affect overall statistical power.

Usage

compute_power_plan(
  TPM_threshold,
  minimum_fold_change,
  cells_per_target,
  sequenced_reads_per_cell,
  MOI = 10,
  num_targets = 100,
  non_targeting_gRNAs = 10,
  gRNAs_per_target = 4,
  gRNA_variability = 0.13,
  control_group = "complement",
  side = "left",
  multiple_testing_alpha = 0.05,
  prop_non_null = 0.1,
  baseline_expression_stats,
  library_parameters,
  grid_size = 10,
  min_power_threshold = 0.01,
  max_power_threshold = 0.8,
  mapping_efficiency = 0.72
)

Arguments

TPM_threshold: Numeric, numeric vector, or character. TPM threshold value, custom sequence, or "varying" for auto-selection.
minimum_fold_change: Numeric, numeric vector, or character. Minimum fold change value, custom sequence, or "varying" for auto-selection. Pairs with effects at least this large are considered non-null.
cells_per_target: Numeric, numeric vector, or character. Number of cells per target, custom sequence, or "varying" for auto-generated grid.
sequenced_reads_per_cell: Numeric, numeric vector, or character. Sequenced reads per cell (raw sequencer output), custom sequence, or "varying" for auto-generated grid.
MOI: Numeric. Multiplicity of infection (default: 10).
num_targets: Integer. Number of targets (default: 100).
non_targeting_gRNAs: Integer. Number of non-targeting gRNAs (default: 10).
gRNAs_per_target: Integer. Number of gRNAs per target (default: 4).
gRNA_variability: Numeric. Standard deviation for gRNA effect variation (default: 0.13).
control_group: String. Control group type (default: "complement").
side: String. Test sidedness (default: "left").
multiple_testing_alpha: Numeric. FDR level (default: 0.05).
prop_non_null: Numeric. Proportion of non-null hypotheses, i.e., the fraction of tested pairs expected to exhibit an effect at least as large as the specified minimum_fold_change (default: 0.1).
baseline_expression_stats: Data frame. Baseline expression statistics. See reference_data_processing for data format requirements.
library_parameters: List. rSAC_fn_wrapper format from library_estimation. See reference_data_processing for parameter specifications.
grid_size: Integer. Grid size for each dimension (default: 10).
min_power_threshold: Numeric. Minimum power threshold (default: 0.01).
max_power_threshold: Numeric. Maximum power threshold to achieve (default: 0.8).
mapping_efficiency: Numeric. Mapping efficiency for raw reads to usable reads (default: 0.72). See reference_data_processing for typical values.
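
As a rough illustration of how mapping_efficiency relates raw sequenced_reads_per_cell to usable reads, the sketch below assumes a simple multiplicative conversion. The name usable_reads_per_cell is hypothetical, and the package's internal computation may differ.

```r
# Assumption: usable reads = raw sequenced reads * mapping efficiency.
# `usable_reads_per_cell` is an illustrative name, not a package object.
sequenced_reads_per_cell <- c(10000, 25000, 50000)
mapping_efficiency <- 0.72  # default from the Usage section

usable_reads_per_cell <- sequenced_reads_per_cell * mapping_efficiency
usable_reads_per_cell  # 7200, 18000, 36000 usable reads per cell
```

Under this assumption, roughly a quarter of the sequencing budget does not contribute usable reads at the default setting, which is worth keeping in mind when choosing sequenced_reads_per_cell.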

Value

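Data frame with one row per parameter combination. In the example below, the columns are minimum_fold_change, TPM_threshold, cells_per_target, num_captured_cells, sequenced_reads_per_cell, library_size, and overall_power.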

Details

This function performs the power analysis in four steps:

1. Expand the parameter combinations (TPM thresholds, fold changes).
2. Create fold change expression data for each combination.
3. Run compute_power_plan_per_grid() for each parameter set.
4. Combine the results into a flat data frame for analysis.
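
The expansion of parameter combinations can be sketched with base R's expand.grid(). This is only an illustration of a full cross of the input vectors, not the package's internal code. The name param_grid is hypothetical, and the function may split the work differently (for example, handling cells and reads inside compute_power_plan_per_grid()).

```r
# Illustrative only: cross every value of each parameter with every other.
# `param_grid` is a hypothetical name, not an object returned by the package.
param_grid <- expand.grid(
  TPM_threshold            = c(5, 10, 15),
  minimum_fold_change      = c(0.7, 0.8, 0.9),
  cells_per_target         = c(50, 100, 200),
  sequenced_reads_per_cell = c(10000, 25000, 50000)
)

# Three values for each of four parameters give 3^4 combinations.
nrow(param_grid)  # 81
```

The count of 81 matches the number of rows reported by dim(full_results) in the Examples section, where each parameter is given three values.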

Examples

# Define parameter ranges for comprehensive analysis
TPM_threshold <- c(5, 10, 15)
minimum_fold_change <- c(0.7, 0.8, 0.9)
cells_per_target <- c(50, 100, 200)
sequenced_reads_per_cell <- c(10000, 25000, 50000)

# Get pilot data
pilot_data <- get_pilot_data_from_package("K562")

# Run comprehensive power analysis
full_results <- compute_power_plan(
  TPM_threshold = TPM_threshold,
  minimum_fold_change = minimum_fold_change,
  cells_per_target = cells_per_target,
  sequenced_reads_per_cell = sequenced_reads_per_cell,
  baseline_expression_stats = pilot_data$baseline_expression_stats,
  library_parameters = pilot_data$library_parameters,
  MOI = 10,
  num_targets = 100,
  side = "left"
)

# Examine results
dim(full_results)
#> [1] 81  7
head(full_results)
#> # A tibble: 6 × 7
#>   minimum_fold_change TPM_threshold cells_per_target num_captured_cells
#>                 <dbl>         <dbl>            <dbl>              <dbl>
#> 1                 0.7             5               50               512.
#> 2                 0.7             5              100              1025 
#> 3                 0.7             5              200              2050 
#> 4                 0.7             5               50               512.
#> 5                 0.7             5              100              1025 
#> 6                 0.7             5              200              2050 
#> # ℹ 3 more variables: sequenced_reads_per_cell <dbl>, library_size <dbl>,
#> #   overall_power <dbl>
summary(full_results$overall_power)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0000056 0.0031864 0.0222978 0.0969089 0.1202245 0.7212869
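
The num_captured_cells values printed above (512., 1025, 2050 for 50, 100, and 200 cells per target) are consistent with the relation sketched below, computed from the defaults in the Usage section. This is an inference from the example output, not a documented formula. The name total_gRNAs and the expression itself are assumptions for illustration.

```r
# Assumption: each captured cell carries about MOI gRNAs drawn from the whole
# library, so a given target is hit with probability
# MOI * gRNAs_per_target / total_gRNAs. Solving for the number of captured
# cells gives the expression below.
cells_per_target    <- c(50, 100, 200)
MOI                 <- 10
num_targets         <- 100
gRNAs_per_target    <- 4
non_targeting_gRNAs <- 10

# Hypothetical helper quantity: library size in gRNAs (400 targeting + 10 non-targeting).
total_gRNAs <- num_targets * gRNAs_per_target + non_targeting_gRNAs

num_captured_cells <- cells_per_target * total_gRNAs / (gRNAs_per_target * MOI)
num_captured_cells  # 512.5, 1025, 2050
```

If this relation holds, num_captured_cells scales linearly with cells_per_target and inversely with MOI, so a higher MOI would reduce the number of cells that need to be captured for the same per-target coverage.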