GeneTrail

Tutorial for GeneTrail:

Step 1 - Upload your data

GeneTrail offers two main variants for data input. You may either download preprocessed and normalized expression data from GEO or provide a precomputed list of scores.

Gene Expression Omnibus

The Gene Expression Omnibus (GEO) is a MIAME-compliant online database for microarray experiments. Normalized data is stored in the GEO SOFT format, whereas unprocessed data is stored in a platform-dependent raw format. Currently, GeneTrail 2 supports the SOFT format for various platforms and the following organisms:

Supported Organisms

Homo sapiens (9606)
Mus musculus (10090)
Rattus norvegicus (10116)
Arabidopsis thaliana (3702)
Danio rerio (7955)
Drosophila melanogaster (7227)
Caenorhabditis elegans (6239)
Anopheles gambiae (180454)
Bos taurus (9913)
Canis familiaris (9615)
Gallus gallus (9031)
Plasmodium falciparum 3D7 (36329)
Pan troglodytes (9598)
Sus scrofa (9823)

When using a record from GEO, GeneTrail relies on the proper normalization of the stored data. If you want to normalize the data yourself, you will need to obtain and process the raw data from GEO and upload a score file.

The SOFT format is supported for GEO Datasets (GDS) and GEO Series (GSE). GeneTrail requires you either to select one GSE record and distribute its samples into a test set and a control set, or to select two GDS records that define your sample and reference sets.

If you choose a GSE file, enter a valid GSE identifier (e.g., GSE14767). The corresponding GEO Series .soft file is then downloaded to the GeneTrail server automatically. In the next step, you can specify the sample and the reference group.

If you choose two GDS files, enter valid GDS identifiers (e.g., GDS2161 and GDS2162) for the test and control group, respectively. The corresponding GEO Data Set .soft files are then downloaded to the GeneTrail server automatically.

If you choose a text file, upload a plain text file containing identifiers with or without precomputed scores. The values have to be whitespace separated.
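To check a text file before uploading it, a short script can help. The following is a minimal sketch, assuming one identifier per line, optionally followed by a score; the function name parse_identifier_file and the example gene symbols are hypothetical and not part of GeneTrail.

```python
from typing import Dict, List, Tuple


def parse_identifier_file(text: str) -> Tuple[List[str], Dict[str, float]]:
    """Parse lines of the form 'identifier [score]', separated by whitespace.

    Assumption: one identifier per line. The exact layout accepted by
    GeneTrail may differ.
    """
    identifiers: List[str] = []
    scores: Dict[str, float] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on spaces and tabs alike
        if not fields:
            continue  # ignore blank lines
        if len(fields) > 2:
            raise ValueError(
                f"line {line_number}: expected 1 or 2 fields, got {len(fields)}"
            )
        identifiers.append(fields[0])
        if len(fields) == 2:
            scores[fields[0]] = float(fields[1])
    return identifiers, scores


# Hypothetical file content: three identifiers with scores.
example = "TP53 2.31\nBRCA1\t-1.07\nEGFR 0.4\n"
ids, scores = parse_identifier_file(example)
print(ids)
print(scores)
```

A file that lists identifiers only yields an empty score dictionary, which corresponds to the "without precomputed scores" case.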

Step 2 - Manage your data (only for NCBI GEO files)

If you have uploaded one GSE file that contains the data of both the test group and the control group, the sample identifiers (GSMs) are displayed in the data pool.

You can then select arbitrary GSMs and move them either to the sample group or to the reference group. GeneTrail 2 also provides a link to inspect the GSE file on the NCBI webserver.

Step 3 - Scoring (only for GSE and GDS files)

In this step a score for differential expression between the two groups is calculated.

If your test group consists of multiple samples you can choose from the following scoring schemes:

Independent Shrinkage t-Test
Independent Student's t-Test
Wilcoxon Rank Sum Test
Signal to Noise Ratio
F-Test
Log-Mean-Fold-Quotient
Mean-Fold-Quotient
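To give an idea of what some of these scores measure, here is a minimal sketch using common textbook definitions for a single gene. Whether GeneTrail uses exactly these formulas (for example, base 2 for the logarithm) is an assumption, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def student_t(sample, reference):
    """Independent two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(sample), len(reference)
    pooled = ((n1 - 1) * variance(sample) + (n2 - 1) * variance(reference)) / (
        n1 + n2 - 2
    )
    return (mean(sample) - mean(reference)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def signal_to_noise(sample, reference):
    """Difference of means divided by the sum of the standard deviations."""
    return (mean(sample) - mean(reference)) / (stdev(sample) + stdev(reference))


def mean_fold_quotient(sample, reference):
    """Ratio of the group means."""
    return mean(sample) / mean(reference)


def log_mean_fold_quotient(sample, reference):
    """Logarithm of the ratio of the group means (base 2 is an assumption)."""
    return math.log2(mean_fold_quotient(sample, reference))


# Made-up expression values of one gene in both groups.
sample_group = [4.0, 5.0, 6.0]
reference_group = [1.0, 2.0, 3.0]
print(student_t(sample_group, reference_group))
print(signal_to_noise(sample_group, reference_group))
print(mean_fold_quotient(sample_group, reference_group))
print(log_mean_fold_quotient(sample_group, reference_group))
```

All four functions need at least two values per group, because they rely on the sample variance or on group means.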

If, however, your test group consists of only a single sample (e.g., for diagnostic purposes), all test statistics are replaced by the z-score:

z-score
Log-Mean-Fold-Quotient
Mean-Fold-Quotient
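For the single-sample case, a z-score can be sketched as follows. This assumes that the single value is standardized against the mean and standard deviation of the reference group; the tutorial does not state the exact formula, and the function name z_score is hypothetical.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Standardize one measurement against the reference group.

    Assumption: the reference group supplies both mean and standard deviation.
    """
    return (value - mean(reference)) / stdev(reference)


# One made-up test measurement against three made-up reference measurements.
print(z_score(5.0, [1.0, 2.0, 3.0]))  # prints 3.0
```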

Step 4 - Score transformation

In this step a score for differential expression between the two groups is calculated.

If your sample group consists of multiple samples you can choose from the following scoring schemes:

Independent Shrinkage t-Test
Independent Student's t-Test
Wilcoxon-Mann-Whitney-Test
Signal to Noise Ratio
F-Test
Log-Mean-Fold-Quotient
Mean-Fold-Quotient
Mean-Fold-Difference

If your sample and reference groups have the same size the following test statistics can also be chosen:

Paired Student's t-Test
Wilcoxon Matched Pairs Signed Rank Test
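The paired tests require equally sized groups because each sample value is matched with one reference value. A minimal sketch of the standard paired t statistic illustrates this; the function name paired_t is hypothetical, and pairing by position in the list is an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(sample, reference):
    """Paired Student's t statistic computed on the per-pair differences."""
    if len(sample) != len(reference):
        raise ValueError("paired tests need groups of the same size")
    differences = [s - r for s, r in zip(sample, reference)]
    return mean(differences) / (stdev(differences) / math.sqrt(len(differences)))


# Made-up values; the i-th sample value is paired with the i-th reference value.
print(paired_t([5.0, 7.0, 9.0], [4.0, 5.0, 6.0]))
```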

If your sample and reference groups contain more than 15 samples, the following statistics can also be chosen:

Pearson Correlation
Spearman Correlation

If, however, your sample group consists of only a single sample (e.g., for diagnostic purposes), all test statistics are replaced by the z-score:

z-score
Log-Mean-Fold-Quotient
Mean-Fold-Quotient
Mean-Fold-Difference

Step 5 - Select Algorithm

GeneTrail offers several methods to analyse your data. Currently, the following algorithms are supported:

Enrichment algorithms:
- Gene Set Enrichment Analysis (article)
- Weighted Gene Set Enrichment Analysis (article)
- Over Representation Analysis (article)
- Wilcoxon Rank Sum Test (article)
- One Sample t-Test (article)
- Two Sample t-Test (article)
- Mean/Median/Sum of single gene statistic (article)
- Max-Mean statistic (article)
Algorithms to find deregulated subgraphs in regulatory networks:
- Subgraph ILP (article)
- FiDePa (article)

Additionally, you can download your annotated scoring file.

Step 6 - Algorithms

Subgraph size

Here, you can either enter a single value or a range of values for the size of the subgraph:

single value: Enter a single number, e.g., 25
single range: Separate values by a dash, e.g., 10-25
multiple ranges: Separate ranges by a semicolon, e.g., 1-12; 15-20; 25-30
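The size syntax above can be expanded into a list of subgraph sizes. The following is a minimal sketch of such an expansion; the function name parse_subgraph_sizes is hypothetical, and treating both range ends as inclusive is an assumption.

```python
from typing import List


def parse_subgraph_sizes(spec: str) -> List[int]:
    """Expand a size specification such as '1-12; 15-20' into sorted sizes.

    Assumption: both ends of a range are inclusive.
    """
    sizes = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        if low < 1 or high < low:
            raise ValueError(f"invalid size range: {part!r}")
        sizes.update(range(low, high + 1))
    return sorted(sizes)


print(parse_subgraph_sizes("25"))
print(parse_subgraph_sizes("10-25"))
print(parse_subgraph_sizes("1-12; 15-20; 25-30"))
```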

Scoring

Scoring mode

You can specify whether you want to use positive and negative values or absolute values as scores.

Node Mapping

Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:

Maximum: This option causes the score of the member with the highest score to be used as the score of the composite node.
Minimum: This option causes the score of the member with the lowest score to be used as the score of the composite node.
Average: This option computes the average score of all members of a composite node. Please note that if you use the absolute values option, there are two ways to compute this score: either the absolute values are taken before the average is computed, or the absolute value of the computed average is used. You will be able to choose between these two options.
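The difference between the mapping options is easiest to see on a small example. The sketch below is an illustration only; the function composite_score and its parameters are hypothetical, and applying the absolute value before taking the maximum or minimum is an assumption.

```python
from statistics import mean


def composite_score(member_scores, mode, use_absolute=False, abs_before_average=True):
    """Combine the member scores of a family or complex into one node score."""
    if mode not in ("maximum", "minimum", "average"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "average":
        if not use_absolute:
            return mean(member_scores)
        if abs_before_average:
            # Take absolute values first, then average them.
            return mean([abs(score) for score in member_scores])
        # Average the signed scores first, then take the absolute value.
        return abs(mean(member_scores))
    # Assumption: with absolute scoring, members are compared by magnitude.
    values = [abs(s) for s in member_scores] if use_absolute else list(member_scores)
    return max(values) if mode == "maximum" else min(values)


members = [2.0, -4.0]  # made-up scores of two complex members
print(composite_score(members, "maximum"))
print(composite_score(members, "minimum"))
print(composite_score(members, "average", use_absolute=True))
print(composite_score(members, "average", use_absolute=True, abs_before_average=False))
```

With the scores 2.0 and -4.0, averaging the absolute values gives 3.0, whereas the absolute value of the average is 1.0, so the choice between the two variants can change the node score noticeably when members are regulated in opposite directions.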

Path length

Here, you can enter a single value k to specify the maximal path length, e.g.: 25

Scoring

You can specify whether you want to find up- or downregulated paths up to a length k.

Node Mapping

Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:

Maximum: This option causes the score of the member with the highest score to be used as the score of the composite node.
Minimum: This option causes the score of the member with the lowest score to be used as the score of the composite node.
Average: This option computes the average score of all members of a composite node. Please note that if you use the absolute values option, there are two ways to compute this score: either the absolute values are taken before the average is computed, or the absolute value of the computed average is used. You will be able to choose between these two options.

You can select all parameters that should be used for your analysis.

Step 7 - Contact Information

If you enter a valid email address, a link to the results of your analysis will be sent to the specified address. If you decide not to enter an email address, please do not close your browser window - otherwise the results of your analysis cannot be returned to you.

Step 8 - Start/Stop Computation

Pressing the "cancel job" button will abort your analysis. If instances of your problem have already been computed successfully, GeneTrail 2 will pack these and provide them for download. The provided link for (re-)downloading and viewing your results will remain valid for two weeks.

Step 9 - Results

Your computation has completed successfully and you can now download or view your results. For visualization, GeneTrail offers a simple browser-based solution that uses the Cytoscape JS library. Alternatively, you can choose BiNA, a graphical tool for network analysis, as your viewer.

If you want to view your results in a different viewer or use them for further analysis, you can download them as a raw package.

The subgraph result directory contains the following files:

xxxxx.kYY.sif The resulting subnetwork for k = YY
xxxxx.names.na The NCBI gene symbols for all genes
xxxxx.score.na The scores used in the ILP computation
xxxxx.orig_scores.na The signed scores. This is useful to distinguish between up and down regulation.
xxxxx.SomeDBName.na The identifiers of the Database "SomeDBName" that are mapped on the network nodes.

The IDs used in the SIF files (and on the left-hand side of the NA files) are arbitrary, unique identifiers for each gene and bear no further meaning.
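If you process the raw package yourself, the arbitrary IDs can be replaced by gene symbols with a few lines of code. The following is a minimal sketch, assuming the usual Cytoscape conventions: SIF lines of the form "source interaction target ..." and NA files with one header line followed by "id = value" lines. The file contents, the interaction type "pp", and the function names are made up for illustration.

```python
def parse_sif(text):
    """Return (source, interaction, target) triples from SIF text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node without edges
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:  # a SIF line may list several targets
            edges.append((source, interaction, target))
    return edges


def parse_na(text):
    """Return a dict mapping node IDs to attribute values from NA text."""
    lines = [line for line in text.splitlines() if line.strip()]
    attributes = {}
    for line in lines[1:]:  # assumption: the first line names the attribute
        node_id, _, value = line.partition("=")
        attributes[node_id.strip()] = value.strip()
    return attributes


# Hypothetical contents of xxxxx.kYY.sif and xxxxx.names.na.
sif_text = "n1 pp n2\nn2 pp n3 n4\n"
names_text = "GeneSymbol\nn1 = TP53\nn2 = MDM2\nn3 = CDKN1A\nn4 = BAX\n"

names = parse_na(names_text)
labelled_edges = [
    (names[source], interaction, names[target])
    for source, interaction, target in parse_sif(sif_text)
]
print(labelled_edges)
```

The same parse_na function could be applied to the score files to attach scores to the nodes, provided they follow the same layout.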

The Biological Network Analyzer (BiNA) is a workbench for visualizing and analyzing biological networks. Various biological networks can be displayed, edited and analyzed. Please visit http://www.bina.unipax.info for more information.

With Cytoscape JS you can inspect the networks immediately in your browser.

GeneTrail 3.2

Advanced high-throughput enrichment analysis
