| Title: | Comprehensive Analysis Suite for Cell-type specificity and Accessibility-Driven QTL Effects |
|---|---|
| Description: | Provides a hierarchical framework for systematically characterizing cell type specificity and regulatory mechanisms of molecular QTLs (molQTLs) from multiome analyses. Gene- and peak-level analyses classify eGenes and caPeaks by their cell type specificity patterns, with power-aware assessment using local false sign rates (LFSR) from multivariate adaptive shrinkage (mash) to identify likely shared but underpowered features. Variant-level analysis classifies fine-mapped variants by regulatory mechanism, with 25 mutually exclusive QTL patterns collapsing into eight mechanism categories that include three chromatin-to-expression cascade tiers. |
| Authors: | Masahiro Kanai [aut, cre] |
| Maintainer: | Masahiro Kanai <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-31 20:10:59 UTC |
| Source: | https://github.com/mkanai/cascade |
Functions for analyzing variant heterogeneity patterns using Cochran's Q test and CS cluster-based approaches.
Add Variant Analysis to Categorization Results
Add variant heterogeneity analysis using Cochran's Q test or CS clusters
add_variant_analysis(results, susie_results, data_type, lfsr_results, meta_data, variant_feature_specificity, cochran_q_threshold = 5e-08, use_cs_clusters = TRUE, cs_clusters = NULL, cs_cluster_variants = NULL)
results |
Results from vectorized categorization |
susie_results |
SuSiE results |
data_type |
Either "gene" or "peak" |
lfsr_results |
Optional LFSR results |
meta_data |
Required. Pre-computed meta data with Cochran's Q values |
variant_feature_specificity |
Required. Variant-feature specificity mapping |
cochran_q_threshold |
Cochran's Q p-value threshold |
use_cs_clusters |
Logical. Whether to use CS cluster-based analysis (default TRUE) |
cs_clusters |
Optional. CS cluster mappings from load_cs_clusters() |
cs_cluster_variants |
Optional. CS cluster variant mappings from load_cs_cluster_variants() |
List with updated results and variant details
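The function expects Cochran's Q p-values to be pre-computed in `meta_data`. As a minimal sketch of how such a p-value is conventionally derived from per-cell-type effect sizes and standard errors (the helper `cochran_q()` and the toy numbers are illustrative assumptions, not part of the package):

```r
# Cochran's Q heterogeneity test for one variant across cell types.
# Hypothetical helper: inputs are per-cell-type betas and standard errors.
cochran_q <- function(beta, se) {
  stopifnot(length(beta) == length(se), length(beta) >= 2, all(se > 0))
  w <- 1 / se^2                          # inverse-variance weights
  beta_fixed <- sum(w * beta) / sum(w)   # fixed-effect pooled estimate
  q <- sum(w * (beta - beta_fixed)^2)    # Q statistic
  df <- length(beta) - 1                 # chi-squared degrees of freedom
  p <- pchisq(q, df = df, lower.tail = FALSE)
  list(q = q, df = df, p = p)
}

# Toy values (not from any real dataset)
homogeneous   <- cochran_q(beta = c(0.30, 0.30, 0.30), se = c(0.05, 0.05, 0.05))
heterogeneous <- cochran_q(beta = c(0.60, 0.00, -0.60), se = c(0.05, 0.05, 0.05))

# Compare against the documented default threshold
cochran_q_threshold <- 5e-08
is_heterogeneous <- c(
  homogeneous   = homogeneous$p < cochran_q_threshold,
  heterogeneous = heterogeneous$p < cochran_q_threshold
)
print(is_heterogeneous)
```

Under this reading, only the second variant would be flagged as heterogeneous at the default `cochran_q_threshold`.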
caQTL Status Descriptions
CAQTL_STATUS
An object of class list of length 3.
High-level dispatch for categorizing genes or peaks across cell types, powered by vectorized C++ kernels.
categorize_features(feature_data, lfsr_results, susie_results, meta_data, variant_feature_specificity, feature_type = "gene", hierarchy = DEFAULT_CELL_HIERARCHY, lfsr_sig_threshold = LFSR_SIG_THRESHOLD, lfsr_null_threshold = LFSR_NULL_THRESHOLD, cochran_q_threshold = 5e-08, use_cs_clusters = TRUE, cs_clusters = NULL, cs_cluster_variants = NULL)
feature_data |
Feature data from load_feature_data(config, feature_type, chromosomes) |
lfsr_results |
Pre-computed LFSR results from load_lfsr_results() |
susie_results |
SuSiE results (optional) |
meta_data |
Required. Pre-computed meta data with Cochran's Q values |
variant_feature_specificity |
Required. Variant-feature specificity mapping from variant categorization |
feature_type |
Either "gene" or "peak" |
hierarchy |
A CellTypeHierarchy object; defaults to DEFAULT_CELL_HIERARCHY |
lfsr_sig_threshold |
Significance threshold used for both ACAT q-value significance and LFSR gray zone lower bound (by design, both use the same threshold). This is NOT acat_fdr_threshold (which is used only in variant data loading). |
lfsr_null_threshold |
LFSR null hypothesis threshold |
cochran_q_threshold |
P-value threshold for Cochran's Q heterogeneity test (default 5e-8) |
use_cs_clusters |
If TRUE, use credible-set cluster assignments when available; otherwise rely on Cochran's Q only |
cs_clusters |
Optional credible-set cluster assignments (loaded by load_cs_clusters()) |
cs_cluster_variants |
Optional cluster-level variant details (loaded by load_cs_cluster_variants()) |
List with two elements: categories (data frame with categorization results) and variant_details (data frame with variant details)
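The two LFSR thresholds can be read as bounding a gray zone between clear signal and clear null. The sketch below illustrates that reading; the numeric thresholds, the helper `classify_lfsr()`, and the use of strict inequalities are assumptions made for illustration, since the actual values of LFSR_SIG_THRESHOLD and LFSR_NULL_THRESHOLD are not listed in this reference.

```r
# Hypothetical threshold values, chosen only for illustration
sig_threshold  <- 0.05   # stands in for lfsr_sig_threshold
null_threshold <- 0.5    # stands in for lfsr_null_threshold

# Hypothetical helper: label each cell type by where its LFSR falls
classify_lfsr <- function(lfsr, sig, null) {
  stopifnot(sig < null)
  ifelse(lfsr < sig, "significant",
         ifelse(lfsr < null, "gray_zone", "null"))
}

# Toy LFSR values for one feature across three cell types
lfsr <- c(Mono = 0.001, B = 0.20, NK = 0.90)
status <- classify_lfsr(lfsr, sig_threshold, null_threshold)
print(status)

# A feature with a clear signal in one cell type and gray-zone evidence in
# another is a candidate for "Likely shared but underpowered"
likely_underpowered <- any(status == "significant") && any(status == "gray_zone")
```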
Main entry point for variant categorization. Runs per-cell-type QTL pattern detection (Stage 1), then aggregates across cell types with bulk 'data.table' operations (Stage 2).
categorize_variants(variant_data, lfsr_results = NULL, output_dir = NULL, config = NULL)
variant_data |
Variant data from 'load_variant_data_by_qtl_type()'. |
lfsr_results |
Pre-loaded LFSR results (optional). |
output_dir |
Optional output directory for saving intermediate results. |
config |
Configuration object (optional). |
List with 'per_celltype', 'cross_celltype', and 'variant_feature_specificity' results.
Functions for defining and working with cell type hierarchies used in specificity categorization.
Create a Cell Type Hierarchy
Defines the cell type hierarchy for specificity categorization. Each grouping
level adds one specificity category. The package ships with
DEFAULT_CELL_HIERARCHY for immune cells (2 grouping levels -> 6
categories), but users can define their own system for any tissue or organism.
The categorization logic uses the hierarchy as follows:
"Cross-lineage shared": significant in 2+ top-level lineage groups
"Likely shared but underpowered": LFSR gray zone evidence of hidden sharing
"Lineage-specific": significant in exactly 1 lineage group
"Subgroup-specific": significant in 1 subgroup (one category per subgroup level)
"Single cell-type": significant in exactly 1 L1 cell type
"No significance": nothing significant
Total categories = 4 fixed + N grouping levels (1 lineage level + subgroup levels).
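The category logic above can be sketched in base R. This is a toy illustration only: the cell type names are hypothetical, the helper `categorize()` is not a package function, and the order in which the rules are checked is an assumption.

```r
# Toy hierarchy; cell type names are hypothetical
lineages  <- list(myeloid  = c("Mono", "DC"),
                  lymphoid = c("CD4_T", "CD8_T", "B", "NK"))
subgroups <- list(list(T_cell = c("CD4_T", "CD8_T")))  # one subgroup level

# sig: L1 cell types in which the feature is significant
# gray_zone: TRUE if LFSR suggests hidden sharing elsewhere
categorize <- function(sig, gray_zone = FALSE) {
  if (length(sig) == 0) return("No significance")
  hit <- names(lineages)[vapply(lineages, function(x) any(sig %in% x), logical(1))]
  if (length(hit) >= 2) return("Cross-lineage shared")
  if (gray_zone) return("Likely shared but underpowered")
  if (length(sig) == 1) return("Single cell-type")
  for (level in rev(subgroups)) {              # narrowest level first
    for (g in level) if (all(sig %in% g)) return("Subgroup-specific")
  }
  "Lineage-specific"
}

# 4 fixed categories + 1 lineage level + number of subgroup levels
n_categories <- 4 + 1 + length(subgroups)

examples <- c(
  categorize(character(0)),
  categorize(c("Mono", "B")),
  categorize("Mono", gray_zone = TRUE),
  categorize(c("CD4_T", "B")),
  categorize(c("CD4_T", "CD8_T")),
  categorize("NK")
)
print(examples)
```

With one subgroup level, the count works out to the six categories listed above.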
create_cell_hierarchy(lineages, subgroups = list(), bulk = NULL, other = NULL, mapping_to_l1 = list(), column_prefix = "predicted.celltype")
lineages |
Named list with 2+ entries. Each entry is a character vector of L1 cell type names in that top-level group. Names become lineage labels. |
subgroups |
Ordered list of sub-grouping levels (optional). Each element
is a named list of groups at that depth. Groups must be subsets of a single
lineage. Ordered from broadest to narrowest. Use |
bulk |
Character vector of bulk/mixed cell type names (e.g., "PBMC"). Features significant only in bulk are categorized as "Likely shared but underpowered". NULL if none. |
other |
Character vector of cell types excluded from lineage grouping (e.g., "other" for unclassified types). These types still participate in single-cell-type detection and are valid mapping targets. NULL if none. |
mapping_to_l1 |
Named list mapping lower-level cell types to their L1 parents. Supports arbitrary QTL resolution depth (L2, L3, etc.) – all map directly to L1. Targets must be L1 lineage types or "other" types. Types not in this mapping are assumed to already be L1. |
column_prefix |
Prefix for cell type column names in data files. Default: "predicted.celltype". Columns will be "prefix.l1.name" for L1. |
A CellTypeHierarchy object (S3 class)
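A minimal sketch of preparing the arguments, assuming a toy two-lineage system. All cell type names here are hypothetical, and the checks simply mirror the constraints documented above rather than reproduce the package's own validation. The call itself runs only if a package named `cascade` is installed.

```r
# Hypothetical L1 cell types grouped into two lineages
lineages <- list(
  myeloid  = c("Mono", "DC"),
  lymphoid = c("CD4_T", "CD8_T", "B", "NK")
)

# One subgroup level; each element is a named list of groups at that depth
subgroups <- list(list(T_cell = c("CD4_T", "CD8_T")))

# Finer-resolution types mapped directly to their L1 parents
mapping_to_l1 <- list(CD4_Naive = "CD4_T", CD14_Mono = "Mono")

# Documented constraints: 2+ named lineages, every subgroup inside exactly
# one lineage, and every mapping target an L1 lineage type or an "other" type
stopifnot(length(lineages) >= 2, !is.null(names(lineages)))
in_one_lineage <- function(group) {
  sum(vapply(lineages, function(l) all(group %in% l), logical(1))) == 1
}
all_groups <- unlist(subgroups, recursive = FALSE)
stopifnot(all(vapply(all_groups, in_one_lineage, logical(1))))
stopifnot(all(unlist(mapping_to_l1) %in% c(unlist(lineages), "other")))

# Assumes the package is installed under the name "cascade"
if (requireNamespace("cascade", quietly = TRUE)) {
  hierarchy <- cascade::create_cell_hierarchy(
    lineages      = lineages,
    subgroups     = subgroups,
    bulk          = "PBMC",
    other         = "other",
    mapping_to_l1 = mapping_to_l1
  )
  print(hierarchy)
}
```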
Helper function to create a configuration object for cascade analysis
create_config(cell_types, chromosomes = NULL, file_patterns = list(), parameters = list(pip_threshold = 0.5, min_pip_threshold = 0.1, acat_fdr_threshold = 0.05, lfsr_sig_threshold = LFSR_SIG_THRESHOLD, lfsr_null_threshold = LFSR_NULL_THRESHOLD, run_mash = FALSE, n_cores = NULL, mash_params = list(max_variants_per_gene = 5, alpha = 1, strong_z_threshold = 2)), column_mapping = list(), feature_type = "all")
cell_types |
Vector of cell type names |
chromosomes |
Vector of chromosomes to analyze |
file_patterns |
List with file pattern templates |
parameters |
Analysis parameters including:
|
column_mapping |
Column name mappings for input files |
feature_type |
Type of features to analyze ("gene", "peak", "variant", or "all") |
Configuration list
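Because `parameters` defaults to a full list, it is not stated here whether passing a partial list is merged with those defaults. A conservative sketch is to build the complete list yourself with `utils::modifyList()`, which merges nested lists recursively. The defaults below are copied from the usage signature; the two LFSR thresholds are left out because their numeric values are not listed in this reference, and the override values are arbitrary.

```r
# Defaults copied from the usage signature (LFSR thresholds omitted)
default_parameters <- list(
  pip_threshold      = 0.5,
  min_pip_threshold  = 0.1,
  acat_fdr_threshold = 0.05,
  run_mash           = FALSE,
  n_cores            = NULL,
  mash_params        = list(max_variants_per_gene = 5, alpha = 1,
                            strong_z_threshold = 2)
)

# Arbitrary overrides for illustration, including one nested value
overrides <- list(pip_threshold = 0.9, mash_params = list(alpha = 0.5))

# Recursive merge: untouched defaults, including nested ones, are retained
parameters <- utils::modifyList(default_parameters, overrides)
str(parameters)
```

The merged list can then be supplied as `parameters` to `create_config()`, with `lfsr_sig_threshold` and `lfsr_null_threshold` added from the package constants.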
The default hierarchy for immune cell QTL analysis, with myeloid and lymphoid lineages and one T-cell subgroup level. Produces 6 specificity categories.
DEFAULT_CELL_HIERARCHY
An object of class CellTypeHierarchy of length 12.
Maps internal column names to input column names for each file type. Entries with NULL values are optional and skipped during rename. Users can override individual entries via create_config(column_mapping = list(...)).
DEFAULT_COLUMN_MAPPING
An object of class list of length 9.
eQTL Status Descriptions
EQTL_STATUS
An object of class list of length 4.
Extracts detailed per-cell-type SuSiE results (beta, se, pip) for all variants in CS clusters using vectorized operations
extract_cs_cluster_susie_details(features, cs_clusters, cs_cluster_variants, susie_results, feature_type = "gene")
features |
Vector of feature IDs to extract |
cs_clusters |
Data table with CS to cluster mappings |
cs_cluster_variants |
Data table with cluster to variant mappings |
susie_results |
List of SuSiE results by cell type |
feature_type |
Either "gene" or "peak" |
Data table with per-cell-type SuSiE results for CS cluster variants
Filter to only L1 cell types
filter_l1_celltypes(celltypes, hierarchy)
celltypes |
Character vector of cell type column names |
hierarchy |
A CellTypeHierarchy object (required) |
Character vector of L1-only cell types
Get QTL Pattern Interpretation
get_pattern_interpretation(pattern_num)
pattern_num |
Numeric pattern number (1-25) |
Character string with the pattern interpretation
A cell type is considered "L2" (or lower) if it appears as a key in the hierarchy's mapping_to_l1, meaning it maps to an L1 parent.
is_l2_celltype(celltype, hierarchy)
celltype |
Character scalar or vector of cell type column names |
hierarchy |
A CellTypeHierarchy object (required) |
Logical vector
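The rule described above reduces to a membership test on the names of `mapping_to_l1`. A minimal base-R sketch, assuming bare cell type names (the package functions take a CellTypeHierarchy object and column names, which may carry a prefix that this sketch ignores; `is_l2()` and the cell types are hypothetical):

```r
# Hypothetical mapping from finer-resolution types to L1 parents
mapping_to_l1 <- list(CD4_Naive = "CD4_T", CD4_TCM = "CD4_T", CD14_Mono = "Mono")

# A type counts as L2 (or lower) if it appears as a key in the mapping
is_l2 <- function(celltype, mapping) celltype %in% names(mapping)

celltypes <- c("CD4_T", "CD4_Naive", "Mono", "CD14_Mono")
l2_flags  <- is_l2(celltypes, mapping_to_l1)

# Keeping only L1 types is the complement of the same test
l1_only <- celltypes[!l2_flags]
print(l1_only)
```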
LFSR Null Hypothesis Threshold
LFSR_NULL_THRESHOLD
An object of class numeric of length 1.
LFSR Significance Threshold
LFSR_SIG_THRESHOLD
An object of class numeric of length 1.
Loads the file mapping clusters to their constituent variants
load_cs_cluster_variants(cs_cluster_variant_file, cache = NULL, column_mapping = NULL)
cs_cluster_variant_file |
Path to the CS cluster variant file |
cache |
Cache object for memoization (optional) |
column_mapping |
Named list mapping internal names to input column names. If NULL, uses DEFAULT_COLUMN_MAPPING$cs_cluster_variants. |
Data table with cluster to variant mappings
Loads the CS cluster file that maps credible sets to clusters across cell types
load_cs_clusters(cs_cluster_file, cache = NULL, column_mapping = NULL)
cs_cluster_file |
Path to the CS cluster file |
cache |
Cache object for memoization (optional) |
column_mapping |
Named list mapping internal names to input column names. If NULL, uses DEFAULT_COLUMN_MAPPING$cs_clusters. |
Data table with CS to cluster mappings indexed by feature
Generic function to load feature data (genes or peaks) with ACAT results
load_feature_data(config, feature_type, chromosomes = NULL, num_cores = NULL)
config |
Configuration list with file patterns |
feature_type |
Type of feature ("gene" or "peak") |
chromosomes |
Chromosomes to analyze |
num_cores |
Number of cores for parallel processing (NULL to use config$parameters$n_cores) |
List with feature data
Load pre-computed LFSR results from external files
load_lfsr_results(config)
config |
Configuration list containing file paths and cell types |
List containing LFSR data tables for eQTL and caQTL
Load meta data containing pre-computed Cochran's Q heterogeneity p-values
load_meta_data(meta_file, cache = NULL, column_mapping = NULL)
meta_file |
Path to the meta data file |
cache |
Cache object for memoization (optional) |
column_mapping |
Named list mapping internal names to input column names. If NULL, uses DEFAULT_COLUMN_MAPPING$meta. |
Data table with variant-phenotype pairs and heterogeneity p-values
Functions for loading variant data with QTL information.
Load Variant Data for All Chromosomes with Parallelization
Load variant data across all specified chromosomes using parallel processing
load_variant_data(config, chromosomes = NULL, pip_threshold = 0.5, min_pip_threshold = 0.1, acat_fdr_threshold = 0.05, peak_bed_file = NULL, column_mapping = NULL, num_cores = NULL)
config |
Configuration list |
chromosomes |
Chromosomes to analyze |
pip_threshold |
Maximum PIP threshold across cell types for filtering (default: 0.5) |
min_pip_threshold |
Minimum PIP threshold per cell type (default: 0.1) |
acat_fdr_threshold |
FDR threshold for ACAT filtering |
peak_bed_file |
Path to peak BED file |
column_mapping |
Column name mapping |
num_cores |
Number of cores to use (NULL for auto-detect) |
Combined variant data from all chromosomes
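One plausible reading of the two PIP thresholds is a two-step filter: keep a variant if its maximum PIP across cell types reaches `pip_threshold`, then treat it as active only in cell types where its PIP reaches `min_pip_threshold`. The sketch below illustrates that interpretation with a toy PIP matrix; the values, the cell type names, and the use of `>=` rather than `>` are assumptions.

```r
# Toy PIP matrix: rows are variants, columns are cell types
pip <- matrix(
  c(0.80, 0.15, 0.02,
    0.30, 0.40, 0.20,
    0.05, 0.60, 0.12),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), c("Mono", "B", "NK"))
)

pip_threshold     <- 0.5   # documented default
min_pip_threshold <- 0.1   # documented default

# Step 1: keep variants whose best PIP in any cell type reaches pip_threshold
keep <- apply(pip, 1, max) >= pip_threshold
kept <- pip[keep, , drop = FALSE]

# Step 2: within kept variants, flag cell types reaching min_pip_threshold
active <- kept >= min_pip_threshold
print(active)
```

Here `var2` is dropped because its largest PIP is 0.40, while `var1` remains but is not counted in NK, where its PIP is 0.02.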
Print method for CellTypeHierarchy
## S3 method for class 'CellTypeHierarchy'
print(x, ...)
x |
A CellTypeHierarchy object |
... |
Additional arguments (ignored) |
QTL mechanism categories, pattern definitions, variant heterogeneity codes, and other scientific constants used in categorization.
QTL Mechanism Categories
Eight main categories for variant QTL mechanisms
QTL_MECHANISMS
An object of class character of length 8.
Maps 25 QTL patterns to their interpretations and mechanism categories. Each pattern has two fields: interpretation (detailed description) and mechanism (category index).
QTL_PATTERNS
An object of class list of length 25.
Main entry point for a CASCADE analysis. Loads configured QTL inputs, runs gene/peak/variant categorization, and writes results to disk.
run_cascade(config, output_dir = "results")
config |
Configuration object (from 'create_config()') or path to a JSON config file. |
output_dir |
Directory to write results to. Created if it does not exist. |
Invisibly, a list of categorization results (gene, peak, variant).
Save Stage 2 results to TSV file (combined L1 + L2)
save_variant_results_cross_celltype(cross_celltype_results, output_dir, suffix = "")
cross_celltype_results |
Cross-cell-type results data frame |
output_dir |
Output directory |
suffix |
Optional suffix for file names (e.g., "l2") |
Save Stage 1 results to separate TSV files per cell type
save_variant_results_per_celltype(per_celltype_results, output_dir)
per_celltype_results |
List of per-cell-type results |
output_dir |
Output directory |
Categories for variant heterogeneity in multi-cell-type features. Maps category names to letter codes (a-d).
VARIANT_HETEROGENEITY
An object of class character of length 4.