
Design problems

Steps to specify a design problem

1. Assay type

The first step is to choose the assay type: Perturb-seq or TAP-seq. PerturbPlan currently supports the same set of design problems for Perturb-seq and TAP-seq. In particular, PerturbPlan does not currently support TAP-seq-specific design problems, such as choosing which set of genes to target. The choice of assay type primarily impacts the choice of reference expression data and the quantification of gene expression (see below).

2. Optimization constraints

PerturbPlan facilitates constraining the power and cost of an experiment.

  • Power. The target power for an experiment is often set at 80% (the PerturbPlan default), which means that one expects to detect at least 80% of the true perturbation-gene pairs. Power is impacted by several choices; for example, power grows when profiling a larger number of cells or sequencing a larger number of reads per cell.

  • Cost. The total cost is computed as the sum of the library preparation cost and the sequencing cost. How these costs are determined depends on the library preparation and sequencing platforms used. To maintain flexibility across platforms, PerturbPlan quantifies library preparation cost in terms of cost/cell and sequencing cost in terms of cost/million reads. The defaults built into PerturbPlan are based on the 2025 cost of 10x Genomics v4 3’ library preparation and NovaSeq X 25B sequencing.
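As a rough illustration of the cost model described above, the following Python sketch adds a per-cell library preparation cost to a per-million-reads sequencing cost. The function name and the prices in the usage example are hypothetical placeholders, not PerturbPlan's API or its built-in defaults.

```python
def total_cost(num_cells, reads_per_cell, cost_per_cell, cost_per_million_reads):
    """Total cost = library preparation cost + sequencing cost."""
    # Library preparation is priced per cell.
    library_prep_cost = num_cells * cost_per_cell
    # Sequencing is priced per million reads across all cells.
    total_reads = num_cells * reads_per_cell
    sequencing_cost = total_reads / 1_000_000 * cost_per_million_reads
    return library_prep_cost + sequencing_cost


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
example = total_cost(
    num_cells=100_000,
    reads_per_cell=20_000,
    cost_per_cell=0.10,
    cost_per_million_reads=0.50,
)
print(f"Total cost: {example:.2f}")
```

Because both terms scale with the number of cells, and the sequencing term also scales with reads per cell, a fixed budget forces a trade-off between profiling more cells and sequencing each cell more deeply.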

3. Minimization targets

Given power and/or cost constraints, PerturbPlan supports the minimization of five parameters:

  • Cells per target: The average number of cells receiving CRISPR perturbations of a given target. Increasing the number of cells per target increases power, but also increases cost. Minimizing this parameter can be used as a proxy for minimizing cost (often while constraining power).

  • Reads per cell: The average number of sequenced reads per cell. Increasing the number of reads per cell increases power, but also increases cost. Minimizing this parameter can be used as a proxy for minimizing cost (often while constraining power).

  • Expression threshold: Even if the experimentalist has in mind a concrete set of perturbation-gene pairs to test for association, pairs involving lowly expressed genes can drag down the power of the analysis. For this reason, a minimum gene expression threshold can be applied to filter the set of pairs prior to testing. Minimizing this threshold is a means to include as many genes as possible in the analysis, while still achieving the target power.

  • Fold change: The fold change measures the size of the effect of a perturbation on a gene’s expression (see below for a more precise definition). Minimizing the fold change is a means to be able to detect more subtle effects, while still achieving the target power.

  • Total cost: Note that cost can be a constraint (as above) or a variable to be minimized over experimental parameters, subject to a power constraint.

4. Other varying parameters

In some design problems, parameters other than the one being minimized are also allowed to vary. These are cells/target and reads/cell; see below for more detail.

Supported design problems

PerturbPlan requires a power constraint for all design problems, and allows choosing whether to include a cost constraint as well. All supported combinations for optimization constraints, minimization targets, and other varying parameters are presented in the table below.

#    Constraints     Minimization target     Other varying parameters
---  --------------  ----------------------  ---------------------------
1    Power           Cells/Target            None
2    Power           Reads/Cell              None
3    Power           Expression threshold    None
4    Power           Fold change             None
5    Power           Total cost              Cells/Target + Reads/Cell
6    Power + Cost    Expression threshold    Cells/Target
7    Power + Cost    Expression threshold    Reads/Cell
8    Power + Cost    Expression threshold    Cells/Target + Reads/Cell
9    Power + Cost    Fold change             Cells/Target
10   Power + Cost    Fold change             Reads/Cell
11   Power + Cost    Fold change             Cells/Target + Reads/Cell

Design problems 1-4 are the simplest: they determine the minimum value of one parameter required to attain the desired power, while holding all other parameters fixed. Design problem 5 allows cells/target and reads/cell to vary, finding the lowest-cost combination to achieve a target power. The remainder of the design problems involve constraining both cost and power, allowing the minimization of either expression threshold or fold change, while allowing one or both of cells/target and reads/cell to vary.
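The supported combinations can also be written down as data, which makes it easy to check whether a planned combination is one of the eleven. The sketch below is only an illustration: the class, the identifiers such as `cells_per_target`, and the lookup function are hypothetical names, not part of PerturbPlan.

```python
from typing import NamedTuple


class DesignProblem(NamedTuple):
    constraints: frozenset  # what is constrained
    minimize: str           # the minimization target
    varying: frozenset      # other parameters allowed to vary


POWER = frozenset({"power"})
POWER_COST = frozenset({"power", "cost"})
CELLS = "cells_per_target"
READS = "reads_per_cell"
NONE = frozenset()

# Numbering follows the table of supported design problems.
SUPPORTED = {
    1: DesignProblem(POWER, CELLS, NONE),
    2: DesignProblem(POWER, READS, NONE),
    3: DesignProblem(POWER, "expression_threshold", NONE),
    4: DesignProblem(POWER, "fold_change", NONE),
    5: DesignProblem(POWER, "total_cost", frozenset({CELLS, READS})),
    6: DesignProblem(POWER_COST, "expression_threshold", frozenset({CELLS})),
    7: DesignProblem(POWER_COST, "expression_threshold", frozenset({READS})),
    8: DesignProblem(POWER_COST, "expression_threshold", frozenset({CELLS, READS})),
    9: DesignProblem(POWER_COST, "fold_change", frozenset({CELLS})),
    10: DesignProblem(POWER_COST, "fold_change", frozenset({READS})),
    11: DesignProblem(POWER_COST, "fold_change", frozenset({CELLS, READS})),
}


def find_design_problem(constraints, minimize, varying=()):
    """Return the design problem number, or None if the combination is unsupported."""
    query = DesignProblem(frozenset(constraints), minimize, frozenset(varying))
    for number, problem in SUPPORTED.items():
        if problem == query:
            return number
    return None


print(find_design_problem({"power"}, "total_cost", {CELLS, READS}))
print(find_design_problem({"power", "cost"}, "cells_per_target"))
```

The second lookup returns `None` because cells/target is only offered as a minimization target under a power constraint alone.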

Design parameters

Once you have selected your design problem, you need to specify the design parameters that define your screen. These parameters are grouped into four collapsible sections: experimental choices (how the data will be collected), analysis choices (how the data will be analyzed), effect sizes (the strength of the biological effects you wish to detect), and advanced settings (optional additional settings).


Experimental choices

Reference expression data

Reference expression data for the biological system of interest is needed to compute power. These data include baseline mean expressions for all expressed genes (Perturb-seq) or targeted genes (TAP-seq), a fitted sequencing saturation curve, and a mapping efficiency (defined below). Reference expression data can be either built-in or custom:

  • Built-in: PerturbPlan ships with reference expression data from six cell types commonly used in perturbation screens (K562, A549, THP1, CD8+ T, iPSC, iPSC-derived neurons), documented here. Consider using these if you are planning a Perturb-seq experiment and your cell type falls into one of these categories.

  • Custom: Custom reference data can be uploaded to the PerturbPlan app. The required format and the means to process your data into this format are documented here and here. Use custom reference data if you are planning a Perturb-seq experiment in a cell type not covered by the built-in data, or if you are planning a TAP-seq experiment. Since each TAP-seq experiment comes with a unique set of targeted genes, PerturbPlan does not include built-in TAP-seq reference data. If you wish to try out the TAP-seq functionality of the app, you may download a sample reference expression dataset, documented here.

Multiplicity of infection (MOI)

The average number of gRNA-carrying lentiviral particles that infect a single cell.

Number of perturbation targets

The number of genomic elements (commonly, genes or enhancers) you plan to perturb in your screen. Not to be confused with the number of genes targeted for sequencing in a TAP-seq experiment.

gRNAs per target

The number of gRNAs designed to perturb each target. Having multiple gRNAs per target improves robustness against ineffective guides and improves power by averaging noisy gRNA effects.

Number of non-targeting gRNAs

Non-targeting gRNAs are control gRNAs that do not target any genomic element. They are used to form a control group of cells (in low-MOI screens) and to check for calibration (in all screens). The number of non-targeting gRNAs usually has little effect on power; their value lies in these other purposes. It is recommended to include tens or even hundreds of non-targeting gRNAs in perturbation screens.

Cells per target

The average number of cells receiving CRISPR perturbations of a given target, equal to the total number of cells profiled divided by the number of perturbation targets.
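Taking the definition above at face value, converting between total cells and cells per target is a single division or multiplication. The function names and numbers in this short Python sketch are illustrative only.

```python
def cells_per_target(total_cells, num_targets):
    """Average cells per target, as defined in the text above."""
    if num_targets <= 0:
        raise ValueError("num_targets must be positive")
    return total_cells / num_targets


def total_cells_needed(cells_per_target_value, num_targets):
    """Invert the definition: total cells implied by a cells-per-target goal."""
    return cells_per_target_value * num_targets


# Hypothetical screen: 100,000 cells spread over 200 perturbation targets.
print(cells_per_target(100_000, 200))
print(total_cells_needed(500, 200))
```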

Reads per cell

The average number of sequenced reads per cell. More sequenced reads give rise to more unique molecular identifiers (UMIs) per cell, which leads to less noisy gene expression measurements and therefore higher power.

Analysis choices

Perturbation-gene pairs to analyze

The set of perturbation-gene pairs you plan to test for association. These pairs can be specified in one of two ways:

  • Random pairing: If you have not yet decided which perturbation-gene pairs to analyze, you can choose to pair perturbations to genes at random, subject to the constraint that the genes involved survive the expression threshold.

  • Custom pairing: If you have decided which perturbation-gene pairs to analyze, you can upload this set of pairs to PerturbPlan as an RDS file containing a data frame with two columns named grna_target (an identifier of the genomic element being targeted) and response_id (the ENSG ID of the gene).
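Before building the RDS file for a custom pairing, it can help to sanity-check the pairs themselves. The Python sketch below checks a list of rows for the two required column names, `grna_target` and `response_id`. The ENSG pattern (unversioned human Ensembl gene IDs, "ENSG" followed by 11 digits), the duplicate check, and the example target names are assumptions made for illustration. Writing the data frame to an RDS file happens in R and is not shown here.

```python
import re

REQUIRED_COLUMNS = ("grna_target", "response_id")
# Assumption: response_id holds unversioned human Ensembl gene IDs.
ENSG_PATTERN = re.compile(r"^ENSG\d{11}$")


def validate_pairs(rows):
    """Return a list of problems found in the rows; an empty list means none."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {i}: missing column(s) {missing}")
            continue
        if not ENSG_PATTERN.match(row["response_id"]):
            problems.append(f"row {i}: response_id is not an ENSG ID")
        key = (row["grna_target"], row["response_id"])
        if key in seen:
            problems.append(f"row {i}: duplicate pair {key}")
        seen.add(key)
    return problems


# Hypothetical pairs; "enh_1" and "enh_2" are made-up target identifiers.
pairs = [
    {"grna_target": "enh_1", "response_id": "ENSG00000141510"},
    {"grna_target": "enh_2", "response_id": "TP53"},
]
for problem in validate_pairs(pairs):
    print(problem)
```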

Test sidedness

The expected direction of the perturbation effect. The options are:

  • Left: If perturbing the genomic element is expected to down-regulate the gene it is tested against.

  • Right: If perturbing the genomic element is expected to up-regulate the gene it is tested against.

  • Both: If the direction of the effect is unknown.

Expression threshold

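The expression threshold is the minimum expression level a gene must reach for its perturbation-gene pairs to be included in the analysis. The relevant quantification of gene expression depends on whether a Perturb-seq or TAP-seq experiment is being planned: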

  • Perturb-seq: Expressions obtained from transcriptome-wide RNA-seq are commonly measured in the units of transcripts per million (TPM), the expected number of UMIs coming from a given gene among a million UMIs captured in a cell.

  • TAP-seq: Measuring expression in terms of TPMs is less meaningful in a TAP-seq experiment, where expression is measured for a smaller number of genes. As an interpretable substitute for TPM, we quantify TAP-seq gene expression in terms of the UMIs/cell at saturation, the expected number of UMIs coming from a given gene in a cell if sequenced to saturation.
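For the Perturb-seq case, the TPM definition above amounts to rescaling each gene's UMI count so that the counts sum to one million. The Python sketch below shows that rescaling and a simple threshold filter. The gene names and counts are made up, and treating the threshold as inclusive (`>=`) is an assumption. The TAP-seq quantity (UMIs/cell at saturation) depends on the fitted saturation curve and is not reproduced here.

```python
def tpm(umi_counts):
    """Rescale UMI counts so that they sum to one million."""
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("no UMIs observed")
    return {gene: count / total * 1_000_000 for gene, count in umi_counts.items()}


def genes_passing_threshold(expression, threshold):
    """Genes whose expression is at least the threshold (inclusive by assumption)."""
    return sorted(gene for gene, value in expression.items() if value >= threshold)


# Hypothetical aggregated UMI counts for three genes.
counts = {"gene_a": 50, "gene_b": 900, "gene_c": 50}
expression = tpm(counts)
print(expression)
print(genes_passing_threshold(expression, threshold=100_000))
```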

Effect sizes

The following two parameters are relevant for defining the biological effects you wish to detect.

Fold change

The fold change is the multiplicative change in mean gene expression induced by the perturbation:

  • Fold change = 1.0: No effect (null hypothesis).

  • Fold change < 1.0: Negative/inhibitory effect (e.g., 0.8 = 20% decrease in expression).

  • Fold change > 1.0: Positive/activating effect (e.g., 1.5 = 50% increase in expression).

For a conservative power estimate, specify the weakest effect size of interest.
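The fold change convention above translates directly into a percent change in mean expression, and its direction suggests the matching one-sided test. This Python sketch uses hypothetical function names. The labels "left" and "right" follow the test sidedness options described earlier.

```python
def percent_change(fold_change):
    """Percent change in mean expression implied by a fold change."""
    return (fold_change - 1.0) * 100.0


def implied_test_side(fold_change):
    """One-sided test matching the direction of the effect, or None if no effect."""
    if fold_change < 1.0:
        return "left"   # down-regulation
    if fold_change > 1.0:
        return "right"  # up-regulation
    return None         # fold change of 1.0 is the null hypothesis


for fc in (0.8, 1.0, 1.5):
    print(fc, round(percent_change(fc), 1), implied_test_side(fc))
```

When the direction of the effect is unknown in advance, the "both" option applies instead, and this mapping is not needed.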

Proportion of non-null pairs

The fraction of tested perturbation-gene pairs expected to have signal at least as strong as the weakest effect size of interest. Higher proportions of non-null pairs increase power by reducing the impact of multiple testing corrections. Estimating this parameter for perturbation screens remains an open question, but the default value of 0.01 is a reasonable starting point.

Advanced settings

gRNA variability

The variance of the per-gRNA fold changes for each target. If the fold change for a given perturbation-gene pair is $\beta$ and the gRNA variability is $\sigma^2$, then the fold change induced by a gRNA is modeled as $N(\beta, \sigma^2)$.
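To get a feel for this model, the following Python sketch draws per-gRNA fold changes from a normal distribution with mean equal to the target-level fold change and variance equal to the gRNA variability. The function name, the seed handling, and the example values are illustrative assumptions. This is a simulation of the stated model, not PerturbPlan's power calculation.

```python
import random
import statistics


def simulate_grna_fold_changes(beta, sigma2, grnas_per_target, seed=0):
    """Draw one fold change per gRNA from N(beta, sigma2)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    sigma = sigma2 ** 0.5      # random.gauss takes a standard deviation
    return [rng.gauss(beta, sigma) for _ in range(grnas_per_target)]


# Hypothetical target with fold change 0.8 and gRNA variability 0.01.
draws = simulate_grna_fold_changes(beta=0.8, sigma2=0.01, grnas_per_target=4)
print([round(d, 3) for d in draws])
print(round(statistics.mean(draws), 3))
```

With more gRNAs per target, the average of the draws tends to sit closer to the target-level fold change, which is the averaging effect mentioned under gRNAs per target.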

Mapping efficiency

The fraction of sequencing reads that map confidently to the genes of interest. This quantity is derived from the reference expression data, but may be changed manually if desired. The mapping efficiency depends on the type of experiment being planned:

  • Perturb-seq: In Perturb-seq, all genes are of interest, so mapping efficiency is the proportion of reads mapping to any gene. Perturb-seq mapping efficiencies often fall in the range 0.65-0.75.

  • TAP-seq: Imperfect gene-specific primers can hybridize to transcripts coming from genes other than those targeted. Therefore, a proportion of reads confidently mapped to the genome do not map to the target genes, making TAP-seq mapping efficiencies generally lower than those of Perturb-seq. For example, the mapping efficiency of the recent Ray et al. (2025) TAP-seq dataset was 0.35.
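One way to read the mapping efficiency is as a discount on sequencing depth: only the mapped fraction of reads contributes to the genes of interest. The Python sketch below applies that discount using the example efficiencies quoted above. The function name and the choice of 20,000 reads per cell are hypothetical, and converting mapped reads into UMIs would additionally require the saturation curve, which is not modeled here.

```python
def mapped_reads_per_cell(reads_per_cell, mapping_efficiency):
    """Expected reads per cell that map confidently to the genes of interest."""
    if not 0.0 <= mapping_efficiency <= 1.0:
        raise ValueError("mapping_efficiency must be between 0 and 1")
    return reads_per_cell * mapping_efficiency


# Same hypothetical depth under a Perturb-seq-like and a TAP-seq-like efficiency.
print(mapped_reads_per_cell(20_000, 0.70))
print(mapped_reads_per_cell(20_000, 0.35))
```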

Control group

The strategy used to construct control cells for testing perturbation effects.

  • Non-targeting cells: Cells that receive only non-targeting gRNAs. This is the default choice for low-MOI screens (MOI ≤ 1).

  • Complement cells: Cells that do not receive the perturbation being tested, but may receive other perturbations. This is the required choice for high-MOI experiments (MOI > 1), since few cells will have exclusively non-targeting gRNAs.

For more on this choice, see the sceptre documentation.
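The two control group strategies can be made concrete with a small Python sketch that selects control cells for one target from a table of gRNA assignments. The data structures, the function name, and the example cells are hypothetical. This is not how sceptre or PerturbPlan represent the data internally.

```python
NON_TARGETING = "non-targeting"


def control_cells(cell_grnas, grna_targets, target, strategy):
    """Select control cells for `target`.

    cell_grnas:   dict mapping cell ID -> set of gRNA IDs detected in that cell
    grna_targets: dict mapping gRNA ID -> target name, or NON_TARGETING
    strategy:     "non-targeting" or "complement"
    """
    if strategy not in ("non-targeting", "complement"):
        raise ValueError(f"unknown strategy: {strategy}")
    controls = []
    for cell, grnas in cell_grnas.items():
        targets_in_cell = {grna_targets[g] for g in grnas}
        if strategy == "non-targeting":
            # Only cells whose gRNAs are all non-targeting.
            if targets_in_cell == {NON_TARGETING}:
                controls.append(cell)
        else:
            # Any cell that did not receive the perturbation being tested.
            if target not in targets_in_cell:
                controls.append(cell)
    return sorted(controls)


grna_targets = {"g1": "enh_1", "g2": "enh_2", "nt1": NON_TARGETING}
cell_grnas = {
    "cell_a": {"g1"},
    "cell_b": {"g2", "nt1"},
    "cell_c": {"nt1"},
}
print(control_cells(cell_grnas, grna_targets, "enh_1", "non-targeting"))
print(control_cells(cell_grnas, grna_targets, "enh_1", "complement"))
```

In this toy example the complement strategy yields more control cells than the non-targeting strategy, which mirrors why it is used when few cells carry exclusively non-targeting gRNAs.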

FDR target level

The maximum tolerated expected proportion of discovered associations that are false positives. The FDR target level of 0.1 is the most common choice.
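To show how an FDR target level turns p-values into discoveries, here is a Python sketch of the Benjamini-Hochberg procedure, a standard FDR-controlling method. This page does not state which multiple-testing procedure PerturbPlan assumes, so the use of Benjamini-Hochberg here is an assumption made for illustration, and the p-values are made up.

```python
def benjamini_hochberg(p_values, fdr_level=0.1):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr_level.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr_level:
            cutoff_rank = rank
    # Reject the hypotheses with the cutoff_rank smallest p-values.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for four perturbation-gene pairs.
p_values = [0.001, 0.04, 0.03, 0.5]
print(benjamini_hochberg(p_values, fdr_level=0.1))
print(benjamini_hochberg(p_values, fdr_level=0.01))
```

Lowering the FDR target level tightens every threshold, so fewer pairs are declared discoveries.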