Readers

Load isoform-resolution single-cell data into Allos from a variety of formats.

get_resource_path


def get_resource_path(
    filename
):

Return the path to `filename` inside the resources/ directory, locating that directory according to the current execution context.
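The context-dependent lookup can be pictured with the sketch below. It is not Allos's implementation. The name `resolve_resource_path`, the `start` parameter, and the search order (the starting directory, then each of its parents, with a fallback to `start/resources`) are all assumptions. They illustrate how code run from a subdirectory could still find a project-level `resources/` folder.

```python
from pathlib import Path


def resolve_resource_path(filename, start=None):
    """Hypothetical helper (not part of Allos): return the path to `filename`
    inside the nearest resources/ directory.

    Assumed search order: `start` (default: the current working directory),
    then each of its parents. If none contains resources/, fall back to
    `start/resources`.
    """
    start = (Path(start) if start is not None else Path.cwd()).resolve()
    for directory in (start, *start.parents):
        candidate = directory / "resources"
        if candidate.is_dir():
            return candidate / filename
    # Nothing found: point at a resources/ folder under the starting directory.
    return start / "resources" / filename
```

Walking up the parent chain means the same call works whether it is issued from the project root or from a nested folder, which is one plausible reading of "based on execution context".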


download_test_data


def download_test_data(
    url:str='https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3748nnn/GSM3748087/suppl/GSM3748087%5F190c.isoforms.matrix.txt.gz', # URL to download the data from.
    output_filename:str=None, # Name of the file to save the data as (default: name from the URL).
    decompress:bool=True, # Whether to decompress the file if it is a gzip archive (default True).
)->str: # Path to the downloaded or decompressed file.

Download test data into the directory appropriate for the current execution context. If `decompress` is True and the downloaded file is a gzip archive, it is also decompressed.
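A minimal sketch of two steps the description implies: naming the output file after the URL, and decompressing only when the file really is gzip. The helper names are hypothetical. Detecting gzip by its two magic bytes is an assumption (Allos may check the `.gz` extension instead), and so is decoding percent-escapes such as `%5F` in the URL. The network download itself is omitted; `urllib.request.urlretrieve(url, path)` is one standard-library way to perform it.

```python
import gzip
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def output_name_from_url(url):
    """Hypothetical: use the last URL path segment as the file name,
    decoding percent-escapes (e.g. %5F -> '_')."""
    return unquote(PurePosixPath(urlparse(url).path).name)


def is_gzip(path):
    """Detect gzip by its magic number (0x1f 0x8b) rather than its extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


def maybe_decompress(path):
    """Hypothetical: if `path` is gzip-compressed, write a decompressed copy
    next to it (dropping a trailing .gz) and return the new path;
    otherwise return `path` unchanged."""
    path = Path(path)
    if not is_gzip(path):
        return path
    if path.suffix == ".gz":
        target = path.with_suffix("")
    else:
        target = path.with_name(path.name + ".out")
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large matrices are not loaded at once
    return target
```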


iso_concat


def iso_concat(
    data_inputs, batch_info=None, batch_type:str='path'
):

Concatenates a list of AnnData objects, or paths to AnnData objects, on the union of their transcriptIds while preserving the geneId information, which may be non-unique per transcriptId. Missing values are filled with zeros. Adds a batch column to .obs derived from the file path, the obs_names, or a numeric index.

Parameters:
    data_inputs (list of str or AnnData): List of paths to AnnData objects, or AnnData objects, to concatenate.
    batch_info (list of str, optional): List of batch identifiers, one for each AnnData object in data_inputs. If not provided, batch identifiers are extracted from the file paths, the obs_names, or a numeric sequence.
    batch_type (str, optional): Which type of batch information to use; one of 'path', 'obs_names', 'numeric'. Defaults to 'path'.

Returns: AnnData: A single concatenated AnnData object with harmonized features, geneId annotations, and batch info.
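The union-and-zero-fill behaviour can be illustrated with plain pandas DataFrames standing in for AnnData's `.X`, `.obs`, and `.var`. This is a sketch under assumptions, not iso_concat itself. The function names are hypothetical, the "first geneId seen wins" rule is an assumption, and the `'path'` labelling via the file stem is a guess. The `'obs_names'` mode is left out because the description does not say how obs_names become batch labels.

```python
from pathlib import Path

import pandas as pd


def default_batch_labels(inputs, batch_type="path"):
    """Hypothetical: derive one batch label per input."""
    if batch_type == "path":
        return [Path(p).stem for p in inputs]  # "data/s1.h5ad" -> "s1" (assumed)
    if batch_type == "numeric":
        return list(range(len(inputs)))
    raise ValueError(f"unsupported batch_type in this sketch: {batch_type!r}")


def concat_isoform_tables(tables, batch_labels, gene_maps):
    """Hypothetical outer concatenation of cells x transcripts count tables.

    tables:       list of DataFrames (rows = cells, columns = transcriptIds)
    batch_labels: one batch identifier per table
    gene_maps:    list of Series mapping transcriptId -> geneId, one per table
    """
    # An outer join on the columns keeps the union of transcriptIds; transcripts
    # missing from a table come back as NaN and are filled with 0 (dtype becomes float).
    counts = pd.concat(tables, axis=0, join="outer").fillna(0)
    # Record which input each cell came from, in the same row order as `counts`.
    batch = pd.concat(
        [pd.Series(label, index=t.index) for t, label in zip(tables, batch_labels)]
    )
    obs = pd.DataFrame({"batch": batch})
    # Keep the first geneId seen for each transcriptId, aligned to the union of columns.
    gene_ids = pd.concat(gene_maps)
    gene_ids = gene_ids[~gene_ids.index.duplicated(keep="first")]
    var = pd.DataFrame({"geneId": gene_ids.reindex(counts.columns)})
    return counts, obs, var
```

Cell barcodes that repeat across batches stay duplicated here; a real implementation would likely need to make obs_names unique. For comparison, `anndata.concat` offers `join="outer"`, `fill_value`, `label`, and `keys` options, although the source does not say whether iso_concat uses it.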

API reference


read_sicelore_isomatrix


def read_sicelore_isomatrix(
    file_path:str, # Path to the isomatrix file (tab-delimited).
    gene_id_label:str='geneId', # Row/column label used for gene IDs (default "geneId").
    transcript_id_label:str='transcriptId', # Row/column label used for transcript IDs (default "transcriptId").
    remove_undef:bool=True, # Whether to remove rows with transcriptId="undef" (default True).
    sparse:bool=False, # Whether to store the matrix in sparse format (default False).
)->AnnData: # An AnnData object containing numeric data in `.X` and metadata in `.var`.

Read a SiCeLoRe isomatrix file (tab-delimited) and convert it into a scanpy-compatible AnnData object.
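A rough sketch of the conversion, assuming an isomatrix laid out as one row per transcript with `geneId` and `transcriptId` columns followed by one count column per cell barcode. The function name and the `meta_cols` parameter are hypothetical. If the file carries extra annotation columns, they would have to be listed in `meta_cols`. The matrix is transposed because scanpy/AnnData store cells as rows (obs) and features as columns (var).

```python
import pandas as pd


def read_isomatrix_sketch(source, gene_id_label="geneId",
                          transcript_id_label="transcriptId",
                          remove_undef=True, meta_cols=None):
    """Hypothetical: parse a tab-delimited transcripts x cells isomatrix into
    (X, obs_names, var), with cells as rows as AnnData expects."""
    df = pd.read_csv(source, sep="\t")
    if remove_undef:
        # Drop rows whose transcriptId is "undef" (counts not assigned to a transcript).
        df = df[df[transcript_id_label] != "undef"]
    # Assumption: every column that is not a metadata column is a cell barcode.
    meta_cols = list(meta_cols or [gene_id_label, transcript_id_label])
    var = df[meta_cols].set_index(transcript_id_label)
    counts = df.drop(columns=meta_cols)
    X = counts.to_numpy().T  # transpose: cells become rows, transcripts columns
    return X, list(counts.columns), var
```

The pieces could then be assembled with `anndata.AnnData(X=X, obs=pd.DataFrame(index=obs_names), var=var)`; for the `sparse=True` case, wrapping `X` in `scipy.sparse.csr_matrix` is one option.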


process_mouse_data


def process_mouse_data():

Downloads the test data, reads two mouse isoform count matrices, and merges them into a single AnnData object. It then reads a CSV file containing barcode-to-cell_type mappings, merges this information into the AnnData object's obs DataFrame, and filters out cells with no cell_type assigned.

Returns: combined_mouse_data (AnnData): The merged and annotated AnnData object.
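The annotation and filtering step can be sketched with pandas. The function name and the `barcode` column name are assumptions, since the CSV layout is not specified; `cell_type` follows the description. The obs table is assumed to be indexed by cell barcode.

```python
import pandas as pd


def annotate_cell_types(obs, mapping, barcode_col="barcode", cell_type_col="cell_type"):
    """Hypothetical: left-join barcode -> cell_type labels onto `obs`
    (indexed by barcode) and return it with a mask of labelled cells."""
    # One label per barcode, so the lookup below is well defined.
    labels = (mapping.drop_duplicates(subset=barcode_col)
                     .set_index(barcode_col)[cell_type_col])
    annotated = obs.copy()
    # Barcodes missing from the CSV get NaN, i.e. "no cell_type assigned".
    annotated[cell_type_col] = labels.reindex(annotated.index).to_numpy()
    keep = annotated[cell_type_col].notna().to_numpy()
    return annotated, keep
```

In an AnnData workflow one might then write `adata.obs = annotated` followed by `adata = adata[keep].copy()`, since AnnData supports boolean-mask indexing along obs.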