Switch Search

Identify isoform switches across cell types or conditions in single-cell long-read data.

safe_sum


def safe_sum(
    X
):



build_isoform_metrics_table


def build_isoform_metrics_table(
    adata:ad.AnnData, group_columns:Sequence[str] | str, comparisons:Sequence[Tuple[str, str]] | None=None,
    group_vs_rest:bool=False, epsilon:float=1e-06, n_jobs:int=-1
)->pd.DataFrame:



delta_pi_one_gene_signed


def delta_pi_one_gene_signed(
    dpsi:pd.Series
)->float:

Signed Tilgner Δπ (keeps ± sign of the stronger direction).


delta_pi_one_gene


def delta_pi_one_gene(
    dpsi:pd.Series
)->float:

Unsigned Tilgner Δπ (always ≥ 0).
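As a rough illustration of the signed and unsigned variants, the sketch below assumes the commonly cited formulation in which a gene's Δπ is the summed ΔPSI of its top two isoforms in whichever direction (positive or negative) is stronger. The function name `sketch_delta_pi` and the `top_k` parameter are hypothetical; the library's own implementation may differ.

```python
import pandas as pd


def sketch_delta_pi(dpsi: pd.Series, signed: bool = False, top_k: int = 2) -> float:
    """Illustrative Δπ for one gene (assumed definition, not the library's code).

    dpsi holds one ΔPSI value per isoform of the gene.
    """
    # Sum of the top_k largest positive changes (0.0 if there are none).
    pos = float(dpsi[dpsi > 0].nlargest(top_k).sum())
    # Sum of the top_k most negative changes (<= 0; 0.0 if there are none).
    neg = float(dpsi[dpsi < 0].nsmallest(top_k).sum())

    if signed:
        # Keep the sign of whichever direction has the larger magnitude.
        return pos if pos >= -neg else neg
    # Unsigned variant: magnitude of the stronger direction, always >= 0.
    return max(pos, -neg)


# Example ΔPSI values for a gene with five isoforms.
example = pd.Series([0.3, 0.1, -0.25, -0.2, 0.05])
unsigned_value = sketch_delta_pi(example)
signed_value = sketch_delta_pi(example, signed=True)
```

In this example the two strongest negative changes outweigh the two strongest positive ones, so the signed variant comes out negative while the unsigned variant reports the same magnitude.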


volcano_grid


def volcano_grid(
    df_all, n_cols:int=3, # How many columns in the grid.
    figsize_per_panel:float=3.0, # inches
    save:NoneType=None, # "all_volcanos.pdf" / ".svg" / None
    show:bool=True, # Whether to display the figure inline.
    **panel_kw # Extra kwargs for `draw_volcano_panel` (cut-offs, sizes, etc.).
):

Draw every (group_1, group_2) combo in df_all on one figure grid.


draw_volcano_panel


def draw_volcano_panel(
    ax, df_comp, # dataframe for ONE (group1, group2)
    effect_col:str='effect_size', pval_col:str='adj_pval', fdr_cutoff:float=0.05, eff_cutoff:float=0.1, top_n:int=6,
    bg_size:int=10, # marker areas (pt²)
    sig_size:int=25, font_size:int=7
):

Draws a Δπ vs FDR volcano into `ax` (NO fig.show/save here).
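To make the cut-off parameters concrete, here is a minimal sketch of how points in one comparison might be classified before plotting. It is an assumption, not the library's plotting code: the helper name `sketch_volcano_flags`, the added column names, and the exact rule (FDR below `fdr_cutoff` and absolute effect at or above `eff_cutoff`, with the `top_n` most significant points labelled) are all hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_volcano_flags(
    df_comp: pd.DataFrame,
    effect_col: str = "effect_size",
    pval_col: str = "adj_pval",
    fdr_cutoff: float = 0.05,
    eff_cutoff: float = 0.1,
    top_n: int = 6,
) -> pd.DataFrame:
    """Add plotting helper columns for one (group_1, group_2) comparison."""
    out = df_comp.copy()
    # y-axis value; clip so that an FDR of exactly 0 does not give infinity.
    out["neg_log10_fdr"] = -np.log10(out[pval_col].clip(lower=1e-300))
    # Assumed significance rule: passes both the FDR and effect-size cut-offs.
    out["significant"] = (out[pval_col] < fdr_cutoff) & (
        out[effect_col].abs() >= eff_cutoff
    )
    # Assumed labelling rule: the top_n significant rows with the smallest FDR.
    top_idx = out[out["significant"]].nsmallest(top_n, pval_col).index
    out["label"] = out.index.isin(top_idx)
    return out


demo = pd.DataFrame(
    {"effect_size": [0.5, 0.6, 0.05], "adj_pval": [0.001, 0.2, 0.01]},
    index=["geneA", "geneB", "geneC"],
)
flags = sketch_volcano_flags(demo)
```

Background points would then be drawn at `bg_size` and flagged points at `sig_size`, with text labels for the rows where `label` is true.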


filter_sample_replicated_transcripts


def filter_sample_replicated_transcripts(
    adata, sample_col:str='sample', min_samples:int=2, min_umi:int=1
):

Filters out transcripts that are not detected above a minimum UMI threshold in at least a specified number of samples (replicates).

Parameters:
- adata: AnnData object containing transcript-level counts.
- sample_col: Column in adata.obs that identifies the replicate/sample.
- min_samples: Minimum number of samples in which the transcript must be expressed.
- min_umi: Minimum UMI count in a sample for the transcript to be considered “expressed”.

Returns: A new AnnData object containing only transcripts that replicate across samples.
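The filtering rule can be sketched on a plain count matrix. This is an illustration under stated assumptions rather than the library's implementation: the helper name `sketch_replicated_mask` is hypothetical, and it assumes that a transcript's "UMI count in a sample" means its counts summed over all cells of that sample.

```python
import numpy as np
import pandas as pd


def sketch_replicated_mask(counts, samples, min_samples: int = 2, min_umi: int = 1):
    """Return a boolean mask over transcripts (columns of `counts`).

    counts  : cells x transcripts array of UMI counts
    samples : per-cell sample/replicate labels (same length as rows of counts)
    """
    # Sum counts over the cells of each sample -> samples x transcripts table.
    per_sample = pd.DataFrame(np.asarray(counts)).groupby(np.asarray(samples)).sum()
    # Number of samples in which each transcript reaches the UMI threshold.
    n_expressing = (per_sample >= min_umi).sum(axis=0)
    # Keep transcripts that are expressed in enough samples.
    return (n_expressing >= min_samples).to_numpy()


counts = np.array(
    [
        [1, 0, 2],  # cell 1, sample s1
        [0, 0, 1],  # cell 2, sample s1
        [3, 0, 0],  # cell 3, sample s2
        [0, 1, 0],  # cell 4, sample s2
    ]
)
samples = ["s1", "s1", "s2", "s2"]
mask = sketch_replicated_mask(counts, samples)
```

With an AnnData object, a mask like this could be applied by column slicing, for example `adata[:, mask].copy()`.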

API reference


SwitchSearch


def SwitchSearch(
    anndata_obj:ad.AnnData, group_columns:Sequence[str] | str=('cell_type',), n_jobs:int=-1, fast_mode:bool=True,
    precompute_metrics:bool=False
):

χ²-based isoform-switch screen supporting nested and 1-vs-rest designs, with auto-managed transcript-metrics caching.

FP-control additions:
- min_expected_count: χ² validity guard (default 5.0; set None to disable)
- min_gene_total: require gene total counts per group (default None/off)
- min_isoforms_gene: require >= this many isoforms for a gene (default 2)
- min_present_isoforms: require >= this many isoforms pass count threshold (optional)

New (optional):
- min_prevalence_pct: require >= this % of spots/cells express an isoform (in either group)
- min_prevalent_isoforms: require >= this many isoforms meet prevalence criterion (default 2)
- prevalence_count_thr: isoform considered expressed in a cell/spot if count >= thr (default 1)
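For orientation, the per-gene test and the expected-count guard can be sketched with SciPy's `chi2_contingency`. This is a minimal sketch under assumptions, not the class's actual code: the helper name `sketch_gene_chi2` is hypothetical, the table layout (rows are groups, columns are isoforms of one gene) is assumed, and skipping a gene when any expected cell count falls below `min_expected_count` is one plausible reading of the "χ² validity guard".

```python
import numpy as np
from scipy.stats import chi2_contingency


def sketch_gene_chi2(table, min_expected_count=5.0):
    """Chi-square test of isoform usage for one gene.

    table: groups x isoforms matrix of read/UMI counts.
    Returns the p-value, or None if the gene is not testable.
    """
    table = np.asarray(table, dtype=float)
    # Isoforms absent from every group carry no information; drop them.
    table = table[:, table.sum(axis=0) > 0]
    # Need at least two isoforms and non-empty groups for a usage test.
    if table.shape[1] < 2 or (table.sum(axis=1) == 0).any():
        return None

    _, p_value, _, expected = chi2_contingency(table, correction=False)

    # Assumed guard: the chi-square approximation is unreliable when
    # expected counts are small, so such genes are skipped.
    if min_expected_count is not None and expected.min() < min_expected_count:
        return None
    return float(p_value)


# Two groups, two isoforms: usage flips between the groups.
p_switch = sketch_gene_chi2([[90, 10], [10, 90]])
# Too few reads: every expected count is 2, so the guard rejects the gene.
p_sparse = sketch_gene_chi2([[3, 1], [1, 3]])
```

In a full screen, the per-gene p-values would presumably then be adjusted for multiple testing and thresholded at `fdr`.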


SwitchSearch.find_switches_chi2


def find_switches_chi2(
    primary_col:Optional[str]=None, secondary_col:Optional[str]=None, within:str='primary', group_vs_rest:bool=False,
    targets:Optional[Sequence[str]]=None, min_reads:int=30, # existing thresholds
    fdr:float=0.05, min_expected_count:Optional[float]=5.0, # set None to disable
    min_gene_total:Optional[int]=None, # per group; set None to disable
    min_isoforms_gene:int=2, min_present_isoforms:Optional[int]=None, # set None to disable
    present_count_thr:int=1, min_prevalence_pct:Optional[float]=None, # e.g. 1.0 or 5.0; set None to disable
    min_prevalent_isoforms:int=2, # require >= N isoforms meet prevalence in either group
    prevalence_count_thr:int=1, # isoform is "expressed" in cell/spot if count>=thr
    return_transcript_metrics:bool=False, # outputs
    calc_effect_size:bool=False, effect_size_mode:str='abs', # 'abs' or 'signed'
    drop_unmatched_metrics:bool=True, n_jobs:Optional[int]=None
)->pd.DataFrame:



pseudobulk_diff_splice


def pseudobulk_diff_splice(
    adata, group_col:str='cell_type', replicate_col:str='batch', layer:NoneType=None, # None = use .X
    gene_col:str='geneId', comparisons:NoneType=None, # list of (g1, g2); None = all pairwise
    group_vs_rest:bool=False, min_cells:int=10, # drop pseudobulk samples with fewer cells
    min_transcript_counts:int=10, # min total counts across all pseudobulk samples
    min_isoform_fraction:float=0.01, # min fraction of gene total (filters minor isoforms)
    min_samples_expressed:int=1, # min pseudobulk samples with >0 counts
    covariates:NoneType=None, # str or list of obs column names
    output_level:str='gene', # "gene" or "transcript"
    fdr:float=0.05, return_all:bool=False, # if True, skip FDR filter and return everything
    prior_count:float=0.125, n_jobs:int=1
): # gene_id, n_transcripts, group_1, group_2, stat, p_value, simes_p_value, adj_pval

Pseudobulk differential splicing test using edgePython’s diff_splice (QL F-test).

Counts are summed per (group_col, replicate_col), a quasi-likelihood GLM is fitted per comparison pair, and diff_splice tests whether each gene/transcript shows differential isoform usage beyond its overall expression change.
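The aggregation step described above can be sketched with pandas alone. The helper name `sketch_pseudobulk` is hypothetical and the sketch covers only the summing and the `min_cells` filter; it makes no attempt to reproduce the GLM fit or the `diff_splice` test.

```python
import numpy as np
import pandas as pd


def sketch_pseudobulk(counts, groups, replicates, min_cells: int = 10) -> pd.DataFrame:
    """Sum cell-level counts into one row per (group, replicate).

    counts     : cells x transcripts array
    groups     : per-cell group labels (e.g. cell type)
    replicates : per-cell replicate labels (e.g. batch)
    """
    df = pd.DataFrame(np.asarray(counts))
    grouped = df.groupby([np.asarray(groups), np.asarray(replicates)])
    pseudobulk = grouped.sum()  # one row per (group, replicate)
    n_cells = grouped.size()    # number of cells behind each row
    # Drop pseudobulk samples built from too few cells.
    return pseudobulk.loc[(n_cells >= min_cells).to_numpy()]


counts = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
groups = ["A", "A", "A", "B", "B"]
replicates = ["r1", "r1", "r2", "r1", "r1"]
# With min_cells=2 the single-cell sample (A, r2) is dropped.
pb = sketch_pseudobulk(counts, groups, replicates, min_cells=2)
```

Each remaining row is one pseudobulk sample; a downstream model would treat these rows, not individual cells, as the units of replication.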