Getting Started with CASCADE

Introduction

CASCADE classifies the cell type specificity of genes, peaks, and variants from multi-cell-type QTL analyses, and assigns each variant a regulatory mechanism (the chromatin-to-expression cascade). It distinguishes genuine cell type-specific effects from technical artifacts caused by power differences across cell types using a power-aware LFSR-based assessment, with a hierarchy-derived 6-category specificity system and an 8-category mechanism taxonomy.

Configuration

Use create_config() to build a configuration object that tells CASCADE where your data lives and how to run the analysis. Path templates may include {CELL_TYPE} and {CHR} placeholders that the loader substitutes per cell type and per chromosome.

library(cascade)

config <- create_config(
  cell_types = c("Mono", "DC", "NK", "B", "CD4_T", "CD8_T", "other_T"),
  chromosomes = paste0("chr", 1:22),
  file_patterns = list(
    eqtl_acat        = "data/eqtl/{CELL_TYPE}.acat.tsv.gz",
    caqtl_acat       = "data/caqtl/{CELL_TYPE}.acat.tsv.gz",
    eqtl_susie       = "data/eqtl/{CELL_TYPE}.{CHR}.susie.tsv.gz",
    caqtl_susie      = "data/caqtl/{CELL_TYPE}.{CHR}.susie.tsv.gz",
    eqtl_lfsr        = "data/eqtl.lfsr.tsv.gz",
    caqtl_lfsr       = "data/caqtl.lfsr.tsv.gz",
    eqtl_meta        = "data/eqtl.meta.tsv.gz",
    caqtl_meta       = "data/caqtl.meta.tsv.gz",
    peak_bed         = "data/{CELL_TYPE}.peaks.bed",
    peak_gene_links  = "data/{CELL_TYPE}.peak_gene_links.tsv.gz"
  ),
  parameters = list(
    pip_threshold       = 0.5,
    min_pip_threshold   = 0.1,
    acat_fdr_threshold  = 0.05,
    lfsr_sig_threshold  = 0.05,
    lfsr_null_threshold = 0.5,
    n_cores             = 4
  )
)

Alternatively, store the same structure as JSON and pass the file path directly to run_cascade():

results <- run_cascade("config.json", output_dir = "results/")

File patterns

Key	Description
`eqtl_acat` / `caqtl_acat`	ACAT q-values per cell type
`eqtl_susie` / `caqtl_susie`	SuSiE credible sets per cell type and chromosome
`eqtl_lfsr` / `caqtl_lfsr`	Mash LFSR matrices (rows = feature×variant, columns = cell types)
`eqtl_meta` / `caqtl_meta`	Meta-analysis results with Cochran’s Q heterogeneity p-values
`peak_bed`	BED of peak coordinates per cell type
`peak_gene_links`	Per-cell-type peak-to-gene link table
`eqtl_mashr` / `caqtl_mashr`	(Optional) mashr model RDS files
`cs_clusters` / `cs_cluster_variants`	(Optional) credible-set cluster annotations

Cell Type Hierarchy

CASCADE uses a hierarchy to decide what counts as “shared” versus “specific”. The package ships with DEFAULT_CELL_HIERARCHY for immune cells:

DEFAULT_CELL_HIERARCHY
#> CellTypeHierarchy
#>   Lineages (2): myeloid, lymphoid
#>   L1 cell types (7): Mono, DC, NK, B, CD4_T, CD8_T, other_T
#>   Subgroup levels: 1
#>     Level 1: T-cell
#>   Bulk: PBMC
#>   Other: other
#>   Mapping to L1: 25 entries
#>   Categories (6): Cross-lineage shared | Likely shared but underpowered |
#>     Lineage-specific | T-cell-specific | Single cell-type | No significance

The six categories are auto-generated from the hierarchy structure: 4 fixed categories plus one per grouping level (1 lineage + N subgroup levels).

Running the Analysis

results <- run_cascade(config, output_dir = "results/")

CASCADE always processes all three feature types (gene, peak, variant) and writes the following outputs:

File	Contents
`gene_categorization.tsv.gz`	Per-gene specificity + heterogeneity columns
`gene_categorization.summary.tsv`	Gene category counts
`gene_categorization.top_variants.tsv.gz`	Top variants driving each gene
`peak_categorization.tsv.gz`	Per-peak specificity + heterogeneity columns
`peak_categorization.summary.tsv`	Peak category counts
`peak_categorization.top_variants.tsv.gz`	Top variants driving each peak
`variant_categorization.tsv.gz`	Per-variant Stage 2 cross-cell-type aggregate
`variant_categorization.qtl_mechanism.summary.tsv`	Variant counts per QTL mechanism category
`variant_categorization.cell_type_specificity.summary.tsv`	Variant counts per cell type specificity category
`variant_categorization_per_celltype_{CELL_TYPE}.tsv.gz`	Per-variant Stage 1 output (one file per cell type)

Read results back in R:

genes <- data.table::fread("results/gene_categorization.tsv.gz")
peaks <- data.table::fread("results/peak_categorization.tsv.gz")
variants <- data.table::fread("results/variant_categorization.tsv.gz")

Understanding Results

Gene and peak results

The key column is cell_type_specificity, which takes one of the six hierarchy-derived values:

Category	Meaning
Cross-lineage shared	Significant in 2+ lineages (e.g., both myeloid and lymphoid)
Likely shared but underpowered	LFSR gray-zone evidence suggests hidden sharing
Lineage-specific	Significant in exactly one lineage
T-cell-specific	Significant within the T-cell subgroup only
Single cell-type	Significant in exactly one L1 cell type
No significance	Not significant in any cell type

Companion columns include significant_cts / n_significant_cts (which L1 cell types reached the ACAT q-value threshold), tested_cts / n_tested_cts, and — when variant analysis runs — top_variant, max_pip, max_chisq, and variant_heterogeneity.

Variant results

Variant categorization is two-level. The cross-cell-type table contains:

qtl_mechanism_category: One of 8 QTL mechanism categories — Local Cascade, Positional Cascade, Distal Cascade, caQTL + eQTL (No Link), Only caQTL (With Link), Only caQTL (No Link), Only eQTL, or No molQTL. The three cascade tiers are distinguished by where the chromatin accessibility bridge sits relative to the variant.
cell_type_specificity / gene_cell_type_specificity / peak_cell_type_specificity: three specificity assessments. The combined cell_type_specificity is mechanism-based (no LFSR demotion); the gene- and peak-specific columns apply LFSR.
qtl_pattern_number / qtl_pattern: fine-grained pattern (1–25) with a detailed mechanistic interpretation.

Variant heterogeneity

When a feature is active in multiple cell types, CASCADE assesses effect heterogeneity using Cochran’s Q test. The VARIANT_HETEROGENEITY codes are:

Code	Label	Meaning
a	shared_consistent	Same variant, consistent effect sizes
b	shared_heterogeneous	Same variant, different magnitudes
c	shared_opposite	Same variant, opposite effect directions
d	distinct_variants	Different causal variants across cell types

Column Mapping

CASCADE expects specific column names in input files. If your data uses different names, override them via column_mapping in create_config(). Defaults are defined in DEFAULT_COLUMN_MAPPING.

config <- create_config(
  cell_types = c("Mono", "DC", "NK", "B", "CD4_T", "CD8_T", "other_T"),
  chromosomes = paste0("chr", 1:22),
  file_patterns = list(...),
  column_mapping = list(
    eqtl_susie = list(
      variant_id = "snp_id", # your column for variant IDs
      feature_id = "gene", # your column for gene IDs
      pip        = "posterior_prob" # your column for PIP values
    ),
    meta = list(
      variant_id = "SNP",
      feature_id = "gene_id",
      cochran_q_nlog10p = "het_pval_nlog10"
    )
  )
)

The column_mapping slot uses unprefixed keys for shared schemas (peak_gene, lfsr, meta); only file_patterns distinguishes eQTL vs caQTL with eqtl_/caqtl_ prefixes. Only the file types and columns you need to override must be specified; all others keep their defaults.

Advanced: Custom Hierarchy

To use CASCADE with a different tissue, define your own hierarchy with create_cell_hierarchy(). Here is an example for brain cell types:

brain <- create_cell_hierarchy(
  lineages = list(
    neuronal = c("ExN_L23", "ExN_L4", "ExN_L56", "InN_PV", "InN_SST", "InN_VIP"),
    glial    = c("Astro", "Oligo", "OPC", "Micro"),
    vascular = c("Endo", "Pericyte")
  ),
  subgroups = list(
    list(
      excitatory = c("ExN_L23", "ExN_L4", "ExN_L56"),
      inhibitory = c("InN_PV", "InN_SST", "InN_VIP")
    )
  ),
  bulk = NULL,
  other = NULL,
  mapping_to_l1 = list(
    ExN_L23_CUX2 = "ExN_L23",
    ExN_L4_RORB  = "ExN_L4",
    InN_PV_PVALB = "InN_PV"
  ),
  column_prefix = "cell_type"
)

brain
#> CellTypeHierarchy
#>   Lineages (3): neuronal, glial, vascular
#>   L1 cell types (12): ExN_L23, ExN_L4, ExN_L56, InN_PV, InN_SST, InN_VIP,
#>     Astro, Oligo, OPC, Micro, Endo, Pericyte
#>   Subgroup levels: 1
#>     Level 1: excitatory, inhibitory
#>   Categories (6): Cross-lineage shared | Likely shared but underpowered |
#>     Lineage-specific | Subgroup-specific | Single cell-type | No significance

Key points:

lineages: Must have 2+ groups. These define the top-level split.
subgroups: Optional list of grouping levels, ordered broadest to narrowest. Each group must be a subset of a single lineage. Levels with multiple subgroups produce a generic Subgroup-specific category; levels with a single subgroup use that subgroup’s name (e.g., T-cell-specific).
mapping_to_l1: Maps finer-resolution cell types (L2, L3, …) to their L1 parent. Targets must appear in lineages or other.
column_prefix: Controls how cell type column names are constructed in data files (e.g., cell_type.l1.Astro).

Pass the custom hierarchy to run_cascade() by including it in your config:

config$hierarchy <- brain
results <- run_cascade(config, output_dir = "brain_results/")