A full list of parameters and their descriptions can be found in the default configuration file, default_config.yaml, which is written in YAML format.
Tab-separated text file with two columns: sample_id and sample_class. The first line should contain the column names: sample_id, sample_class.
This YAML file defines the sample groups used for feature selection. In the following example, there are two comparisons: “Normal-Cancer” and “Normal-stage_A”.
In “Normal-Cancer”, the negative class is “Normal” and the positive class consists of samples labeled “stage_A”, “stage_B”, or “stage_C”.
In “Normal-stage_A”, “Normal” samples are compared only to “stage_A” samples. The class labels after each comparison name should match the class labels defined in
Normal-Cancer: ['Normal', 'stage_A,stage_B,stage_C']
Normal-stage_A: ['Normal', 'stage_A']
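The way a comparison maps onto samples can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the sample table, the helper names `read_sample_classes` and `split_samples`, and the assumption that multiple labels in one class are separated by commas (as in the example above) are all hypothetical. The comparison definitions are written as a plain dictionary, which is what the YAML file would look like once loaded.

```python
import csv
import io

# Hypothetical sample class table (tab-separated, header: sample_id, sample_class).
SAMPLE_CLASSES_TSV = (
    "sample_id\tsample_class\n"
    "S1\tNormal\n"
    "S2\tstage_A\n"
    "S3\tstage_B\n"
    "S4\tNormal\n"
    "S5\tstage_C\n"
)

# Comparison definitions as they would look after loading the YAML file.
COMPARE_GROUPS = {
    "Normal-Cancer": ["Normal", "stage_A,stage_B,stage_C"],
    "Normal-stage_A": ["Normal", "stage_A"],
}


def read_sample_classes(text):
    """Return {sample_id: sample_class} from tab-separated text with a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["sample_id"]: row["sample_class"] for row in reader}


def split_samples(sample_classes, compare_group):
    """Return (negative_ids, positive_ids) for one [negative, positive] pair."""
    # Each entry may hold several class labels separated by commas (assumed).
    negative_labels, positive_labels = (
        {label.strip() for label in spec.split(",")} for spec in compare_group
    )
    negative = [s for s, c in sample_classes.items() if c in negative_labels]
    positive = [s for s, c in sample_classes.items() if c in positive_labels]
    return negative, positive


sample_classes = read_sample_classes(SAMPLE_CLASSES_TSV)
for name, group in COMPARE_GROUPS.items():
    negative, positive = split_samples(sample_classes, group)
    print(name, "negative:", negative, "positive:", positive)
```

Samples whose class appears in neither entry (for example “stage_B” in “Normal-stage_A”) are simply left out of that comparison in this sketch.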
Tab-separated text file with at least two columns: sample_id followed by one or more batch variables (e.g. batch1, batch2). The first line should contain the column names.
Note that only one batch variable is used at a time, although the file may define several.
When multiple batch variables are specified, you can set the
batch_index variable in
Tab-separated text file of read counts. Columns are samples and rows are features. The first line should be column names. The first column should be row names.
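As a rough illustration of this layout, the following sketch parses a small count matrix with Python's standard `csv` module. The feature and sample names are made up, and the helper `read_count_matrix` is hypothetical; it also assumes the header line includes a cell for the row-name column, which may not hold for files written by every tool.

```python
import csv
import io

# Hypothetical read count matrix: rows are features, columns are samples.
COUNT_MATRIX_TSV = (
    "feature\tS1\tS2\tS3\n"
    "gene_1\t10\t0\t7\n"
    "gene_2\t3\t25\t1\n"
)


def read_count_matrix(text):
    """Parse a tab-separated count matrix into (sample_ids, {feature: counts})."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    sample_ids = header[1:]  # first header cell labels the row-name column (assumed)
    counts = {}
    for row in rows:
        values = [int(v) for v in row[1:]]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"row {row[0]!r} has {len(values)} values, expected {len(sample_ids)}"
            )
        counts[row[0]] = values
    return sample_ids, counts


sample_ids, counts = read_count_matrix(COUNT_MATRIX_TSV)
print(sample_ids)
print(counts["gene_2"])
```

The length check catches rows that do not line up with the header, which is a common symptom of a missing row-name cell in the first line.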
Variables in file patterns
||Output directory for the dataset, e.g.
||Combination of matrix processing methods|
||Type of feature counts, e.g.
||Name of the negative-positive class pair defined in
||Classifier defined in the configuration file|
||Maximum number of features to select|
||Feature selection method, e.g.
||Direction of fold change for filtering features. Three possible values:
List of files in output directory
|File name pattern||Description|
||Selected features. Plain text with one column: feature names|
||Plain text with two columns: feature name, feature importance|
||Sample IDs in input matrix selected for feature selection|
||Sample class labels selected for feature selection|
||Final model fitted on all samples in Python pickle format|
||Evaluation metrics on training data. First row is metric names. First column is index of each train-test split|
||Same format with
||Cross-validation details in HDF5 format.|
Cross validation details (cross_validation.h5)
|Dataset||Shape||Description|
|feature_selection||(n_splits, n_features)||Binary matrix indicating features selected in each cross-validation split|
|labels||(n_samples,)||True class labels|
|predicted_labels||(n_splits, n_samples)||Predicted class labels on all samples|
|predictions||(n_splits, n_samples)||Predicted probabilities of the positive class (or decision function for SVM)|
|train_index||(n_splits, n_samples)||Binary matrix indicating training samples in each cross-validation split|
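To show how these datasets fit together, here is a small sketch that computes the accuracy on held-out samples for each split and how often each feature was selected. It works on plain nested lists with the shapes listed above; in practice the arrays would first be read from cross_validation.h5 with an HDF5 library. The function names are hypothetical, and the sketch assumes that a 1 in train_index marks a training sample and a 0 marks a held-out sample.

```python
def held_out_accuracy(labels, predicted_labels, train_index):
    """Accuracy on held-out samples (train_index == 0) for each split."""
    accuracies = []
    for predicted, is_train in zip(predicted_labels, train_index):
        test_positions = [i for i, flag in enumerate(is_train) if not flag]
        if not test_positions:
            # No held-out samples in this split: accuracy is undefined.
            accuracies.append(float("nan"))
            continue
        correct = sum(predicted[i] == labels[i] for i in test_positions)
        accuracies.append(correct / len(test_positions))
    return accuracies


def selection_frequency(feature_selection):
    """Fraction of splits in which each feature was selected."""
    n_splits = len(feature_selection)
    return [sum(column) / n_splits for column in zip(*feature_selection)]


# Toy arrays with the documented shapes: 2 splits, 4 samples, 3 features.
labels = [0, 0, 1, 1]
predicted_labels = [[0, 0, 1, 0], [0, 1, 1, 1]]
train_index = [[1, 0, 1, 0], [0, 1, 0, 1]]
feature_selection = [[1, 0, 1], [1, 1, 0]]

print(held_out_accuracy(labels, predicted_labels, train_index))
print(selection_frequency(feature_selection))
```

Because predicted_labels covers all samples in every split, the train_index mask is what separates training performance from held-out performance.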