This function performs power analysis during the experimental planning phase, when only aggregate design information is available. It accepts aggregate experimental parameters (cells per target, reads per cell) without requiring specific cell count assignments to individual gRNAs or perturbation-gene pairs. Each of these parameters can also be supplied as a set of values to compare. This makes the function useful for designing experiments before data collection, letting you explore how different experimental designs (cell numbers, sequencing depth) affect overall statistical power.

Usage

compute_power_plan(
  TPM_threshold,
  minimum_fold_change,
  cells_per_target,
  sequenced_reads_per_cell,
  MOI = 10,
  num_targets = 100,
  non_targeting_gRNAs = 10,
  gRNAs_per_target = 4,
  gRNA_variability = 0.13,
  control_group = "complement",
  side = "left",
  multiple_testing_alpha = 0.05,
  prop_non_null = 0.1,
  baseline_expression_stats,
  library_parameters,
  grid_size = 10,
  min_power_threshold = 0.01,
  max_power_threshold = 0.8,
  mapping_efficiency = 0.72
)

Arguments

TPM_threshold

Numeric, numeric vector, or character. TPM (transcripts per million) threshold: a single value, a custom sequence of values, or "varying" for auto-selection.

minimum_fold_change

Numeric, numeric vector, or character. Minimum fold change value, custom sequence, or "varying" for auto-selection. Pairs with effects at least this large are considered non-null.

cells_per_target

Numeric, numeric vector, or character. Number of cells per target, custom sequence, or "varying" for auto-generated grid.

sequenced_reads_per_cell

Numeric, numeric vector, or character. Sequenced reads per cell (raw sequencer output), custom sequence, or "varying" for auto-generated grid.

MOI

Numeric. Multiplicity of infection (default: 10).

num_targets

Integer. Number of targets (default: 100).

non_targeting_gRNAs

Integer. Number of non-targeting gRNAs (default: 10).

gRNAs_per_target

Integer. Number of gRNAs per target (default: 4).

gRNA_variability

Numeric. Standard deviation for gRNA effect variation (default: 0.13).

control_group

String. Control group type (default: "complement").

side

String. Test sidedness (default: "left").

multiple_testing_alpha

Numeric. FDR level (default: 0.05).

prop_non_null

Numeric. Proportion of non-null hypotheses, i.e., the fraction of tested pairs expected to exhibit an effect at least as large as the specified minimum_fold_change (default: 0.1).

baseline_expression_stats

Data frame. Baseline expression statistics. See reference_data_processing for data format requirements.

library_parameters

List. Library parameters with UMI_per_cell and variation. See reference_data_processing for parameter specifications.

grid_size

Integer. Grid size for each dimension (default: 10).

min_power_threshold

Numeric. Minimum power threshold (default: 0.01).

max_power_threshold

Numeric. Maximum power threshold to achieve (default: 0.8).

mapping_efficiency

Numeric. Mapping efficiency for raw reads to usable reads (default: 0.72). See reference_data_processing for typical values.
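As a rough illustration of how mapping_efficiency relates raw sequencer output to usable reads, the sketch below scales each sequenced_reads_per_cell value by the default efficiency. The name usable_reads_per_cell is chosen here for illustration only, and the simple multiplication is an assumption based on the argument description, not a confirmed detail of the package internals.

```r
# Raw sequenced reads per cell (values taken from the Examples section)
sequenced_reads_per_cell <- c(10000, 25000, 50000)

# Default mapping efficiency from the function signature
mapping_efficiency <- 0.72

# Assumed relationship: usable reads = raw reads * mapping efficiency
usable_reads_per_cell <- sequenced_reads_per_cell * mapping_efficiency
print(usable_reads_per_cell)
```

Under this assumption, 10,000 raw reads per cell correspond to about 7,200 usable reads, so sequencing budgets should be planned in raw reads while power is driven by the smaller usable count.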

Value

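Data frame with one row per parameter combination. In the example on this page, the columns are minimum_fold_change, TPM_threshold, cells_per_target, num_captured_cells, sequenced_reads_per_cell, library_size, and overall_power.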

Details

This function provides comprehensive power analysis by:

  1. Expanding parameter combinations (TPM thresholds, fold changes)

  2. Creating fold change expression data for each combination

  3. Running compute_power_plan_per_grid() for each parameter set

  4. Combining results into a flat data frame for analysis
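The expansion in step 1 can be pictured with base R's expand.grid(). This is only a sketch of the idea, using the parameter values from the Examples section; the object name param_grid is hypothetical, and the function's actual internal expansion may be implemented differently.

```r
# Parameter values from the Examples section.
# expand.grid() varies the first argument fastest, so cells_per_target
# cycles through its values first.
param_grid <- expand.grid(
  cells_per_target = c(50, 100, 200),
  sequenced_reads_per_cell = c(10000, 25000, 50000),
  TPM_threshold = c(5, 10, 15),
  minimum_fold_change = c(0.7, 0.8, 0.9)
)

# 3 * 3 * 3 * 3 = 81 combinations, one row each
print(nrow(param_grid))
print(head(param_grid))
```

Four parameters with three values each give 81 rows, which matches the row count of the result in the Examples section. Grid size grows multiplicatively, so adding values to any one argument increases the total number of combinations quickly.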

Examples

# Define parameter ranges for comprehensive analysis
TPM_threshold <- c(5, 10, 15)
minimum_fold_change <- c(0.7, 0.8, 0.9)
cells_per_target <- c(50, 100, 200)
sequenced_reads_per_cell <- c(10000, 25000, 50000)

# Get pilot data
pilot_data <- get_pilot_data_from_package("K562")

# Run comprehensive power analysis
full_results <- compute_power_plan(
  TPM_threshold = TPM_threshold,
  minimum_fold_change = minimum_fold_change,
  cells_per_target = cells_per_target,
  sequenced_reads_per_cell = sequenced_reads_per_cell,
  baseline_expression_stats = pilot_data$baseline_expression_stats,
  library_parameters = pilot_data$library_parameters,
  MOI = 10,
  num_targets = 100,
  side = "left"
)

# Examine results
dim(full_results)
#> [1] 81  7
head(full_results)
#> # A tibble: 6 × 7
#>   minimum_fold_change TPM_threshold cells_per_target num_captured_cells
#>                 <dbl>         <dbl>            <dbl>              <dbl>
#> 1                 0.7             5               50               512.
#> 2                 0.7             5              100              1025 
#> 3                 0.7             5              200              2050 
#> 4                 0.7             5               50               512.
#> 5                 0.7             5              100              1025 
#> 6                 0.7             5              200              2050 
#> # ℹ 3 more variables: sequenced_reads_per_cell <dbl>, library_size <dbl>,
#> #   overall_power <dbl>
summary(full_results$overall_power)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0000087 0.0035579 0.0244641 0.0968134 0.1208887 0.7185824
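The num_captured_cells values in the output above (512.5, 1025, 2050) are consistent with a simple relationship between cells_per_target, the library size in gRNAs, and MOI. The sketch below reproduces those numbers from the example's settings. The formula is inferred from the printed output and is an assumption, not taken from the package source; total_gRNAs is a name introduced here for illustration.

```r
# Design settings used in the example call (defaults where not specified)
cells_per_target <- c(50, 100, 200)
num_targets <- 100
gRNAs_per_target <- 4
non_targeting_gRNAs <- 10
MOI <- 10

# Total number of distinct gRNAs in the library (targeting + non-targeting)
total_gRNAs <- num_targets * gRNAs_per_target + non_targeting_gRNAs

# Assumed relationship (inferred from the output, not from the package source):
# cells per gRNA, times the number of gRNAs, divided by gRNAs carried per cell
num_captured_cells <- (cells_per_target / gRNAs_per_target) * total_gRNAs / MOI
print(num_captured_cells)
```

If this reading is right, raising MOI lowers the number of cells that must be captured for a given cells_per_target, because each cell contributes to more perturbations.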