GeneTrail 3.2
Advanced high-throughput enrichment analysis
Differentiation of human embryonic stem cells
Embryonic stem cells (ESCs) are pluripotent stem cells derived from the blastocyst stage of early mammalian embryos [Wikipedia]. They can be distinguished by their ability to differentiate into any embryonic cell type as well as by their ability to self-renew [Wikipedia]. During the differentiation process they pass through a variety of cell states that are caused by dynamic changes in gene expression as well as in the activity of signaling pathways. Here, we analyze a time-resolved RNA-Seq data set of human ESCs that are differentiated in vitro and in vivo into pancreatic endoderm and endocrine cells (E-MTAB-1086). The data set contains expression profiles of several distinct cell types in this process. We use this data set to investigate biological processes that are active in the first four in vitro differentiated developmental stages.
Analysis
The data set contains RNA-Seq experiments from distinct cell states in the differentiation process. For our analysis, we only consider four different time points (0 days, 2 days, 5 days, and 7 days), which correspond to the four different cell states. For each time point, 2 replicates were created, and we used the mean value to aggregate them. Finally, we used the time series data to study which biological pathways are associated with distinct cell states.
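The replicate aggregation described above can be sketched as follows. This is a minimal illustration, not the GeneTrail implementation: the gene names and expression values are made up, and the nested-dictionary layout is an assumption chosen for readability. Only the column names are taken from the data set.

```python
from statistics import mean

# Hypothetical log-expression values: gene -> time point -> two replicates.
# Gene names and numbers are invented for illustration only.
replicates = {
    "GENE_A": {"ES_Day0": [8.0, 8.4], "DE_Day2": [5.0, 5.2],
               "GT_Day5": [3.1, 2.9], "FG_Day7": [2.0, 2.2]},
    "GENE_B": {"ES_Day0": [1.0, 1.2], "DE_Day2": [4.0, 4.4],
               "GT_Day5": [4.1, 4.3], "FG_Day7": [4.0, 4.2]},
}


def aggregate_replicates(data):
    """Collapse the replicates of every time point into their mean value."""
    return {
        gene: {time_point: mean(values) for time_point, values in points.items()}
        for gene, points in data.items()
    }


profiles = aggregate_replicates(replicates)
print(profiles["GENE_A"])
```

The result holds one value per gene and time point, which is the shape of the expression matrix used as input for the workflow.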
Data set
The input for the time series workflow is a gene expression matrix, where columns represent measurements for the investigated time points.
The normalized and logarithmized gene expression matrix can be found here.
Row names represent gene names.
Column names (ES_Day0, DE_Day2, GT_Day5, FG_Day7) represent the 4 measured time points/cell states.
Technical Background
We conduct the first clustering step using a strict threshold in order to find clusters with high similarity. In the second clustering step, where we aim to identify super-clusters, we use a less strict threshold. Furthermore, in this second step we do not want to consider height differences of the curves. Therefore, we use the Euclidean distance for gradients as the distance measure there.
Parameters
Filtering
- Difference between minimal and maximal time point: 2.0
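One plausible reading of this filter is that a gene is kept only if its largest and smallest expression values across the time points differ by at least 2.0. The sketch below follows that reading. The function name, the inclusive comparison, and the example profiles are assumptions, not part of GeneTrail.

```python
def passes_range_filter(profile, min_difference=2.0):
    """Return True if the profile varies by at least min_difference.

    Assumption: the filter compares the maximal and minimal value of a
    gene across all time points (here on the log scale).
    """
    return max(profile) - min(profile) >= min_difference


# Hypothetical profiles in the column order ES_Day0, DE_Day2, GT_Day5, FG_Day7.
dynamic_gene = [8.2, 5.1, 3.0, 2.1]   # range 6.1, kept
flat_gene = [4.0, 4.5, 4.2, 4.9]      # range 0.9, removed

kept = [p for p in (dynamic_gene, flat_gene) if passes_range_filter(p)]
print(kept)
```

Filtering out flat profiles before clustering keeps genes without a clear temporal pattern from forming clusters.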
Clustering step 1
- Distance measure: Angle distance
- Linkage method: Complete Linkage
- Threshold for cluster: 0.8
- Minimum number of genes for each cluster: 1
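To make the first clustering step more concrete, the following sketch combines an angle-based distance with a naive complete-linkage clustering. Several points are assumptions: the angle distance is taken to be the angle between two profiles scaled to [0, 1], the cutoff is applied directly to that distance, and profiles are assumed to be non-zero vectors. How GeneTrail maps its threshold of 0.8 onto the cluster tree is not specified here, so the cutoff used below is purely illustrative.

```python
import math


def angle_distance(u, v):
    """Angle between two profiles, scaled to [0, 1] (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.acos(cosine) / math.pi


def complete_linkage(profiles, distance, cutoff):
    """Naive agglomerative clustering with complete linkage.

    Two clusters are merged as long as the largest pairwise distance
    between their members does not exceed the cutoff.
    """
    clusters = [[name] for name in profiles]

    def link(a, b):
        return max(distance(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical profiles: g1 and g2 point in the same direction, g3 does not.
example = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
clusters = complete_linkage(example, angle_distance, cutoff=0.1)
print(clusters)
```

Complete linkage only merges two clusters when all of their members are close to each other, which fits the goal of finding clusters with high similarity.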
Clustering step 2
- Distance measure: Euclidean distance for gradients
- Linkage method: Complete Linkage
- Threshold for cluster: 0.975
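The idea behind the distance measure of the second step can be illustrated with a short sketch. It assumes that the gradient of a profile is the difference between consecutive time points, ignoring the uneven spacing of the sampling days. The exact definition used by GeneTrail may differ.

```python
import math


def gradients(profile):
    """Differences between consecutive time points of a profile."""
    return [b - a for a, b in zip(profile, profile[1:])]


def gradient_euclidean_distance(u, v):
    """Euclidean distance between the gradient vectors of two profiles."""
    return math.dist(gradients(u), gradients(v))


# Hypothetical profiles: same shape at different expression levels.
low = [1.0, 2.0, 3.0, 4.0]
high = [5.0, 6.0, 7.0, 8.0]
steep = [1.0, 3.0, 5.0, 7.0]

print(gradient_euclidean_distance(low, high))   # identical shape
print(gradient_euclidean_distance(low, steep))  # different slope
```

Because a constant offset cancels out when consecutive values are subtracted, two curves with the same shape but different heights have a distance of zero under this measure.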
ORA
- P-value strategy: Upper tailed
- P-value adjustment method: Benjamini-Hochberg
- Significance level: 0.05
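The ORA settings above can be illustrated with a small sketch. It assumes that the over-representation test is a hypergeometric test, where the upper-tailed p-value is the probability of observing at least as many category genes in a cluster as were found. The function names and all numbers are hypothetical.

```python
from math import comb


def ora_upper_tail(population, category_size, sample_size, hits):
    """Upper-tailed hypergeometric p-value P(X >= hits).

    population:    number of genes in the reference set
    category_size: number of reference genes that belong to the category
    sample_size:   number of genes in the cluster under test
    hits:          number of cluster genes that belong to the category
    """
    upper = min(category_size, sample_size)
    favourable = sum(
        comb(category_size, k) * comb(population - category_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_value = ora_upper_tail(population=10, category_size=4, sample_size=3, hits=2)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q <= 0.05 for q in adjusted]
print(p_value, adjusted, significant)
```

A category is reported as enriched when its adjusted p-value does not exceed the significance level of 0.05.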
Step-by-step slideshow
The following slideshow depicts the different analysis steps of the GeneTrail3 time series workflow.
Results
In the following, we show selected results of our analysis. In particular, we present the enrichment results of super-cluster SC1. These enrichment results highlight processes associated with the differentiation from embryonic stem cells to definitive endoderm cells (cf. SC1).