exSEEK

exRNA Biomarker Discovery for Liquid Biopsy

Installation

For easy installation, you can use the exSEEK Docker image, which has all dependencies installed:

docker pull ltbyshi/exseek

Alternatively, you can use Singularity or udocker to run the container if your Linux kernel is older than version 3 or if you don't have permission to use Docker.
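As a rough sketch of how to choose between the three runtimes mentioned above, the following script reports the first one found on the PATH and prints a matching command. The function name detect_runtime is hypothetical, and the Singularity line assumes that exseek.py is on the PATH inside the image, as the Docker usage suggests. The commands are only printed, not executed.

```shell
#!/bin/bash
# Report the first available container runtime, in order of preference.
detect_runtime() {
    local rt
    for rt in docker singularity udocker; do
        if command -v "$rt" >/dev/null 2>&1; then
            echo "$rt"
            return 0
        fi
    done
    echo "none"
}

runtime="$(detect_runtime)"

# Print (do not run) a suggested command for the detected runtime.
case "$runtime" in
    docker)      echo "docker pull ltbyshi/exseek" ;;
    singularity) echo "singularity exec docker://ltbyshi/exseek exseek.py" ;;
    udocker)     echo "udocker pull ltbyshi/exseek" ;;
    *)           echo "No supported container runtime found" >&2 ;;
esac
```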

Usage

Run the main program exseek.py from docker:

docker run --rm -it -v "$PWD":/workspace -w /workspace ltbyshi/exseek exseek.py

Inside the Docker image, the exSEEK repository is cloned to /apps/exseek.

You can create a bash script named exseek with the following content and make it executable:

#! /bin/bash
docker run --rm -it -v "$PWD":/workspace -w /workspace ltbyshi/exseek exseek.py "$@"

After adding the file to one of the directories in the $PATH variable, you can simply run: exseek.
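The wrapper setup described above could be scripted roughly as follows. This is a minimal sketch: BIN_DIR is a hypothetical variable introduced here, and ~/bin is only an assumed default location. The PATH change applies to the current shell session only.

```shell
#!/bin/bash
# Assumption: BIN_DIR is a hypothetical override; the default is ~/bin.
bin_dir="${BIN_DIR:-$HOME/bin}"
mkdir -p "$bin_dir"

# Write the wrapper script. The quoted 'EOF' keeps $PWD and "$@" unexpanded,
# so they are evaluated when the wrapper runs, not when it is created.
cat > "$bin_dir/exseek" <<'EOF'
#! /bin/bash
docker run --rm -it -v "$PWD":/workspace -w /workspace ltbyshi/exseek exseek.py "$@"
EOF

# Make the wrapper executable.
chmod +x "$bin_dir/exseek"

# Put the wrapper directory first on PATH for the current session.
export PATH="$bin_dir:$PATH"
```

To make the PATH change permanent, the export line can be added to your shell startup file.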

The basic usage of exSEEK is:

exseek ${step_name} -d ${dataset}
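Because every step follows the same calling pattern, several steps can be chained with a small helper. The sketch below assumes that the exseek wrapper is on the PATH and uses normalization and feature_selection, the two step names documented in this README. The helper name run_steps is hypothetical.

```shell
#!/bin/bash
# Run the given exSEEK steps in order for one dataset.
# Stops at the first step that fails.
run_steps() {
    local dataset="$1"
    shift
    local step_name
    for step_name in "$@"; do
        echo "Running step: ${step_name}" >&2
        exseek "$step_name" -d "$dataset" || return 1
    done
}

# Usage (assumes a dataset named "example"):
# run_steps example normalization feature_selection
```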

Input files

An example can be found in the example_data directory, which has the following structure:

example_data/
├── config
│   └── example.yaml
├── data
│   └── example
│       ├── batch_info.txt
│       ├── compare_groups.yaml
│       ├── sample_classes.txt
│       └── sample_ids.txt
└── output
    └── example
        └── count_matrix
            └── mirna_and_domains_rna.txt

Note: You can create your own data directory with the above directory structure. Multiple datasets can be put in the same directory by replacing “example” with your own dataset names.
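A directory skeleton for a new dataset could be created along these lines. The names exseek_project and my_dataset are hypothetical placeholders. The script only creates directories and then lists which of the files from the example layout are still missing.

```shell
#!/bin/bash
# Hypothetical names: change these to match your project.
root_dir="exseek_project"
dataset="my_dataset"

# Create the same directory layout as example_data.
mkdir -p "$root_dir/config" \
         "$root_dir/data/$dataset" \
         "$root_dir/output/$dataset/count_matrix"

# Report which files from the example layout still need to be provided.
for f in \
    "config/$dataset.yaml" \
    "data/$dataset/batch_info.txt" \
    "data/$dataset/compare_groups.yaml" \
    "data/$dataset/sample_classes.txt" \
    "data/$dataset/sample_ids.txt" \
    "output/$dataset/count_matrix/mirna_and_domains_rna.txt"
do
    [ -e "$root_dir/$f" ] || echo "missing: $root_dir/$f"
done
```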

More information about input and output files can be found on the File Format page.

Normalization

Run:

exseek normalization -d ${dataset}

This will generate a normalized expression matrix for every combination of methods, using the following file name pattern:

output/${dataset}/matrix_processing/filter.${imputation_method}.Norm_${normalization_method}.Batch_${batch_removal_method}_${batch_index}.${count_method}.txt

You can specify the normalization methods by setting the value of normalization_method, and the batch removal methods by setting the value of batch_removal_method, in config/${dataset}.yaml.

Supported normalization methods: TMM, RLE, CPM, CPM_top, UQ, null

Supported batch removal methods: limma, ComBat, RUV, null

When the method name is set to “null”, the step is skipped.
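Before editing the configuration, a method name can be checked against the lists above. The helper below is only an illustrative sketch and is not part of exSEEK. The function name is_supported and its two "kind" labels are hypothetical.

```shell
#!/bin/bash
# Return 0 if a method name is in the supported list for its kind.
# kind is "normalization" or "batch_removal" (labels made up for this sketch).
is_supported() {
    local kind="$1" name="$2" supported candidate
    case "$kind" in
        normalization) supported="TMM RLE CPM CPM_top UQ null" ;;
        batch_removal) supported="limma ComBat RUV null" ;;
        *) return 2 ;;
    esac
    for candidate in $supported; do
        if [ "$candidate" = "$name" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
# is_supported normalization TMM && echo "ok"
# is_supported batch_removal TMM || echo "not a batch removal method"
```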

${batch_index} is the number of the column (counting from 1) in config/${dataset}/batch_info.txt that is used to remove batch effects.
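To see how the file name pattern expands, the sketch below prints the expected output paths for two normalization methods and two batch removal methods. The values chosen for imputation_method, count_method and batch_index are hypothetical placeholders; the real values depend on your configuration.

```shell
#!/bin/bash
# Print the expected matrix paths for a few method combinations.
expected_matrices() {
    # Hypothetical placeholder values, for illustration only.
    local dataset="example"
    local imputation_method="null"
    local count_method="mirna_and_domains"
    local batch_index=1
    local normalization_method batch_removal_method

    for normalization_method in TMM RLE; do
        for batch_removal_method in limma ComBat; do
            echo "output/${dataset}/matrix_processing/filter.${imputation_method}.Norm_${normalization_method}.Batch_${batch_removal_method}_${batch_index}.${count_method}.txt"
        done
    done
}

expected_matrices
```

With these placeholder values, the first path printed is output/example/matrix_processing/filter.null.Norm_TMM.Batch_limma_1.mirna_and_domains.txt.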

Feature selection

Run:

exseek feature_selection -d ${dataset}

This will evaluate all combinations of feature selection methods and classifiers by cross-validation.

Three summary files will be generated:

Cross-validation results and trained models for individual combinations are in this directory:

output/${dataset}/feature_selection/filter.${imputation_method}.Norm_${normalization_method}.Batch_${batch_removal_method}_${batch_index}.${count_method}/${compare_group}/${classifier}.${n_select}.${selector}.${fold_change_filter_direction}

The list of selected features is in features.txt.
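Since each combination writes its own features.txt, it can help to list them all at once. The helper below is a minimal sketch that assumes it is run from the directory containing output/. The function name list_feature_files is hypothetical.

```shell
#!/bin/bash
# List every features.txt under a dataset's feature_selection directory.
list_feature_files() {
    local dataset="$1"
    find "output/${dataset}/feature_selection" -type f -name features.txt | sort
}

# Usage (assumes a dataset named "example"):
# list_feature_files example
```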

Note: More information about output files can be found on the File Format page. Detailed parameters for feature selection and classifiers can be found in config/machine_learning.yaml.

Advanced Usage

Copyright (C) 2019 Tsinghua University, Beijing, China

This program is distributed under a license that restricts commercial use. Please see the LICENSE file for details.

Citation

Binbin Shi, Jingyi Cao, Xupeng Chen and Zhi John Lu (2019) exSEEK: an integrative computational framework for identifying extracellular RNA biomarkers in liquid biopsy