GeneTrailExpress Tutorial

    To start a new GeneTrailExpress (GTXP) analysis, select "GeneTrailExpress" from the menu above. Help is also available in each step by moving your mouse over the question marks.

    First, select the type of data to be analyzed. You can choose the Gene Expression Omnibus database as source, or upload a plain text file that contains in each row a gene or protein identifier and, tab-delimited, a real value for the respective gene or protein. As a third option, an expression matrix can be uploaded. Each row of this matrix holds the expression values for one gene or protein; the first column denotes the respective gene or protein identifier and the first row denotes the class label ('1' or '2') for each sample. [Image: DataSelection]
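    The two upload formats described above can be sketched as a pair of small parsers. This is only an illustration of the layouts, not GTXP's own code: the function names are hypothetical, and the sketch assumes that the matrix is tab-delimited like the score file and that the first cell of its label row is an empty placeholder.

```python
import csv
import io


def read_score_file(handle):
    """Parse rows of 'identifier<TAB>value' into a dict of floats."""
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        scores[row[0].strip()] = float(row[1])
    return scores


def read_expression_matrix(handle):
    """Parse a matrix with class labels in row 1 and identifiers in column 1.

    Assumption: the file is tab-delimited and the first cell of the label
    row is a placeholder that can be ignored.
    """
    rows = [r for r in csv.reader(handle, delimiter="\t") if r]
    labels = [cell.strip() for cell in rows[0][1:]]
    matrix = {}
    for row in rows[1:]:
        by_class = {"1": [], "2": []}
        for label, value in zip(labels, row[1:]):
            if label not in by_class:
                raise ValueError(f"unexpected class label: {label!r}")
            by_class[label].append(float(value))
        matrix[row[0].strip()] = by_class
    return matrix


# Made-up example data: two samples per class, two genes.
example = "\t1\t1\t2\t2\nTP53\t5.1\t4.9\t7.2\t7.0\nBRCA1\t3.0\t3.2\t2.9\t3.1\n"
matrix = read_expression_matrix(io.StringIO(example))
scores = read_score_file(io.StringIO("TP53\t1.5\nBRCA1\t-0.3\n"))
```

    Grouping the values by class label up front keeps the later steps (box plots, normalization, scoring) independent of the column order in the uploaded file.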
    If you have selected the Gene Expression Omnibus as data source, a form lets you specify the data set and the samples of that data set for each of the two groups. The first entry denotes the data set, starting with GDS followed by a number. The second entry denotes the samples of the data set that should be included in the analysis. Blocks of samples can be defined by dashes (e.g. 2-6); single samples have to be separated by a semicolon (e.g. 1,3,5). [Image: GeoInterface]
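    As a rough sketch of how such a sample specification could be expanded into individual sample numbers: the function name is hypothetical, and because the text names a semicolon while its example uses commas, the sketch accepts both separators as an assumption.

```python
import re


def parse_samples(spec):
    """Expand a specification such as '2-6' or '1,3,5' into sorted sample numbers."""
    samples = set()
    # Assumption: both ';' and ',' are accepted as separators.
    for part in re.split(r"[;,]", spec):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid block: {part!r}")
            samples.update(range(start, end + 1))  # blocks are inclusive
        else:
            samples.add(int(part))
    return sorted(samples)


group_one = parse_samples("1-5")
group_two = parse_samples("6-10")
```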
    Regardless of the chosen data source, you are presented with two box plots showing the distribution of all data in each of the two classes. This image may help you decide whether to normalize your data or to carry out the statistical analysis on the raw data. [Image: 03_boxplot.png]
    Next, you can decide whether to perform a normalization and, if so, which type. Currently, mean value normalization (scale data to a mean of 0), median normalization (scale data median to 1), and variance normalization (scale data to a mean of zero and unit variance, corresponding to z-scores) are available. [Image: 04_normalization.png]
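    The three normalizations can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not GTXP's implementation: it assumes that the mean is removed by subtraction, that the median is brought to 1 by division, and that the z-score uses the population standard deviation. The function names are hypothetical.

```python
from statistics import mean, median, pstdev


def mean_normalize(values):
    """Shift the values so that their mean is 0."""
    m = mean(values)
    return [v - m for v in values]


def median_normalize(values):
    """Rescale the values so that their median is 1 (assumes division)."""
    med = median(values)
    if med == 0:
        raise ValueError("median is zero; cannot rescale")
    return [v / med for v in values]


def variance_normalize(values):
    """Convert the values to z-scores: mean 0 and unit variance."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("all values are identical; variance is zero")
    return [(v - m) / sd for v in values]


sample = [1.0, 2.0, 3.0, 4.0, 10.0]  # made-up expression values
centered = mean_normalize(sample)
rescaled = median_normalize(sample)
z_scores = variance_normalize(sample)
```

    Note that median normalization by division only makes sense for data with a non-zero median, which is why the sketch raises an error in that case.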
    In the next step of the GTXP analysis pipeline, you may specify the scoring criterion that is used to compare each gene between the two classes. Please note that this step is omitted if you have uploaded already scored data in the first step. GTXP offers paired and unpaired options to score genes. Paired tests should be applied only if there is a one-to-one matching between the samples in both groups. For example, the first column of the first class may be measured from a tissue sample of a cancer patient and the first column of the second class from control tissue of the same patient. A list of scoring methods is presented alongside; further explanations of the respective methods can be obtained by moving the mouse cursor over the question marks. [Image: 05_scoring.png]
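    To illustrate why the paired/unpaired distinction matters, the sketch below scores one gene with a paired and an unpaired t-statistic. The scoring methods that GTXP actually offers are not listed in this text, so t-statistics (Welch's form for the unpaired case) are used here only as a familiar example; the function names and the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group1, group2):
    """Welch's t-statistic for two independent groups."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


def paired_t(group1, group2):
    """Paired t-statistic; sample i of group1 must match sample i of group2."""
    if len(group1) != len(group2):
        raise ValueError("paired scoring needs a one-to-one matching of samples")
    diffs = [a - b for a, b in zip(group1, group2)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


# Hypothetical expression values of one gene in four patients:
# tumor tissue and control tissue taken from the same patient.
tumor = [5.0, 6.0, 7.0, 8.0]
control = [4.0, 5.5, 6.0, 7.5]

t_paired = paired_t(tumor, control)
t_unpaired = unpaired_t(tumor, control)
```

    In this example the patients differ strongly from each other, but the tumor value is consistently higher than the matched control. The paired statistic works on the per-patient differences and is therefore much larger than the unpaired one, which has to treat the variation between patients as noise.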
    In the last preprocessing step, you can specify whether a gene set enrichment analysis (GSEA) or an over-representation analysis (ORA) should be carried out using GeneTrail. If GSEA is the method of choice, the genes are ranked according to their previously computed scores and this list is used as input for GeneTrail. If you decide to carry out an ORA, the uploaded and scored genes or proteins have to be split into a training and a test set. To this end, either the top X percent of genes may be used as test set, or all genes having a score below or above a certain threshold can be used as test set.
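    The ranking for GSEA and the two ways of choosing an ORA test set can be sketched as follows. This is an illustration under assumptions, not GeneTrail's code: the function names are hypothetical, genes are sorted with the highest score first, and the percentage is rounded down with a minimum of one gene.

```python
def rank_genes(scores):
    """Return the identifiers sorted by score, highest first (assumed order)."""
    return sorted(scores, key=scores.get, reverse=True)


def top_percent(scores, percent):
    """Select the top X percent of the ranked genes as test set."""
    ranked = rank_genes(scores)
    count = max(1, int(len(ranked) * percent / 100))  # assumption: round down
    return ranked[:count]


def by_threshold(scores, threshold, above=True):
    """Select all genes scoring above (or below) the threshold as test set."""
    ranked = rank_genes(scores)
    if above:
        return [g for g in ranked if scores[g] > threshold]
    return [g for g in ranked if scores[g] < threshold]


# Made-up scores for five genes.
scores = {"A": 2.5, "B": -1.0, "C": 0.7, "D": 3.1, "E": -2.2}

gsea_input = rank_genes(scores)                     # full ranked list
ora_top = top_percent(scores, 40)                   # top 40 percent
ora_high = by_threshold(scores, 1.0)                # score above 1.0
ora_low = by_threshold(scores, -0.5, above=False)   # score below -0.5
```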
    In either case, you are taken to the GeneTrail parameter selection page, where you can select the biological categories to be analyzed and optionally change GeneTrail's standard parameters. If you want to learn more about GeneTrail and its parameters, please have a look at our publications. An online tutorial is also available here.
    The image below shows one entry of the GeneTrail result list computed for the selected parameters and the uploaded data set (in this case, the example shown on the GTXP homepage: GDS1312, samples 1-5 vs. samples 6-10, median quotient without normalization). The Cell Cycle category contains many genes that are more highly expressed in lung cancer, indicated by an accumulation of these genes at the bottom of the list sorted by the quotient of normal expression divided by lung cancer expression. [Image: 06_test.png]
    Clicking the BiNA link, which is provided for many biochemical pathways, starts the visualization tool BiNA. This Java Web Start application stores pathway information on the local drive if BiNA is used for the first time or if newer data are available. Thereafter, BiNA draws the selected biochemical pathway, and the expression quotients or gene scores are mapped directly onto the respective nodes. [Image: 06_test.png]
