GeneTrail 3.2
Advanced high-throughput enrichment analysis
Regulator-target interactions (RTIs)
For our web server, we built an extensive collection of regulator-target gene interactions (RTIs) based on external databases: ChEA [1], ChIP-Atlas, ChipBase [2], ENCODE [3], Jaspar [4], SignaLink [5] and TRANSFAC [6]. For each analysis, users can decide which RTIs are used in their computations.
Database content
The databases we used to build our collection of RTIs provide different levels of information on regulators and their target genes. Some provide manually curated RTIs extracted from primary literature, while others provide binding information derived from ChIP-Seq experiments. This binding information is available either as raw BED files containing ChIP-Seq peaks or as processed data indicating whether a regulator binds within a predefined interval around the transcription start site (TSS) of a given gene.
An overview of the RTIs provided by the different databases is shown in the following tables:
Database | #RTIs | #Regulators | #Targets | Retrieval date (YYYY/MM/DD) |
---|---|---|---|---|
ChEA [1] | 221,627 | 126 | 22,000 | 2016/06/28 |
ChIP-Atlas | 3,235,564 | 659 | 18,976 | 2016/04/12 |
ChipBase [2] | 1,772,088 | 94 | 22,845 | 2016/04/12 |
ENCODE [3] | 7,873,710 | 150 | 22,615 | 2016/04/12 |
Jaspar [4] | 328,396 | 69 | 22,374 | 2016/04/12 |
SignaLink [5] | 26,760 | 293 | 1,905 | 2016/04/12 |
TRANSFAC [6] | 10,517,489 | 714 | 33,895 | 2016/04/28 |
Total | 23,975,679 | 1,076 | 36,242 | - |
- GENCODE GRCh38
- GENCODE GRCh37
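In the Total row, the regulator and target counts are smaller than the sums of their columns, which is what one would expect if regulators and targets shared between databases are counted only once. The Python sketch below illustrates that kind of counting on invented toy data. The database names, regulator names and gene names are placeholders, and the helper `count_distinct` is hypothetical; this is not a description of how the numbers in the table were actually computed.

```python
def count_distinct(pairs):
    """Return (#regulators, #targets) for an iterable of (regulator, target) pairs."""
    regulators, targets = set(), set()
    for regulator, target in pairs:
        regulators.add(regulator)
        targets.add(target)
    return len(regulators), len(targets)


# Toy data, invented for illustration; all names are placeholders.
collections = {
    "DB_A": {("REG1", "GENE_A"), ("REG1", "GENE_B"), ("REG2", "GENE_A")},
    "DB_B": {("REG1", "GENE_A"), ("REG3", "GENE_C")},
}

# Counts for each database on its own.
per_database = {name: count_distinct(pairs) for name, pairs in collections.items()}

# Counts for the databases selected for one analysis, merged into one set.
selected = ["DB_A", "DB_B"]
combined = set().union(*(collections[name] for name in selected))
total = count_distinct(combined)

print(per_database)
print(total)
```

Because REG1 and GENE_A occur in both toy databases, the combined counts are lower than the per-database counts added together.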
Database | Predefined | TSS -/+ 1000 | TSS -/+ 5000 | TSS -/+ 10000 | TSS -10000 / +1000 | Retrieval date (YYYY/MM/DD) |
---|---|---|---|---|---|---|
ChEA [1] | | | | | | 2016/06/28 |
ChIP-Atlas | | | | | | 2017/03/20 |
ChipBase [2] | | | | | | 2017/03/20 |
ENCODE [3] | | | | | | 2017/03/20 |
Jaspar [4] | | | | | | 2017/03/29 |
SignaLink [5] | | | | | | 2017/03/20 |
TRANSFAC [6] | | | | | | 2017/03/20 |
- GENCODE Release 26 (GRCh38.p10)
- GENCODE Release 26 (mapped to GRCh37)
- 1,468
- 36,731
Database | Predefined | TSS -/+ 1000 | TSS -/+ 5000 | TSS -/+ 10000 | TSS -10000 / +1000 | Retrieval date (YYYY/MM/DD) |
---|---|---|---|---|---|---|
ChEA [1] | | | | | | 2016/06/28 |
ChIP-Atlas | | | | | | 2017/03/20 |
ChipBase [2] | | | | | | 2017/03/20 |
ENCODE [3] | | | | | | 2017/03/20 |
Jaspar [4] | | | | | | 2017/03/29 |
TRANSFAC [6] | | | | | | 2017/03/20 |
- GENCODE Release M13 (GRCm38.p5)
- 782
- 35,286
Database | Predefined | TSS -/+ 1000 | TSS -/+ 5000 | TSS -/+ 10000 | TSS -10000 / +1000 | Retrieval date (YYYY/MM/DD) |
---|---|---|---|---|---|---|
ChipBase [2] | | | | | | 2017/03/20 |
TRANSFAC [6] | | | | | | 2017/03/20 |
- UCSC RGSC 6.0/rn6 with RefSeq tracks
- 231
- 14,245
Data processing
In this section we describe the processing steps used to obtain RTIs from the different databases.
ChEA
The ChEA database provides a predefined collection of RTIs extracted from ChIP-Seq, ChIP-ChIP, ChIP-PET and DamID experiments. The entire collection was downloaded and converted to the RTI file format (see below).
ChIP-Atlas
The ChIP-Atlas database contains processed ChIP-Seq data indicating whether a regulator binds within predefined intervals around the TSS of its target genes. We downloaded the complete sets for the following intervals and converted them to the RTI file format (see below):
- TSS -/+ 1000
- TSS -/+ 5000
- TSS -/+ 10000
ChipBase
ChipBase provides BED files that are already annotated with the distance to neighboring genes. We downloaded all BED files and extracted RTI sets for the following intervals:
- TSS -/+ 1000
- TSS -/+ 5000
- TSS -/+ 10000
- TSS -10000 / +1000
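The interval extraction can be pictured as a filter on the signed distance between a peak and the TSS. The following Python sketch is an illustration only. It assumes that each annotated peak has already been reduced to a (regulator, gene, distance) triple with negative distances upstream of the TSS, which is not necessarily how the ChipBase files are laid out. The names `INTERVALS` and `extract_rti_sets` are hypothetical, and the example peaks are invented placeholders.

```python
# Interval definitions as (bases upstream, bases downstream) of the TSS.
INTERVALS = {
    "TSS -/+ 1000": (1000, 1000),
    "TSS -/+ 5000": (5000, 5000),
    "TSS -/+ 10000": (10000, 10000),
    "TSS -10000 / +1000": (10000, 1000),
}


def extract_rti_sets(annotated_peaks):
    """Group (regulator, gene) pairs by the intervals their peak falls into.

    `annotated_peaks` is an iterable of (regulator, gene, distance) triples,
    where `distance` is the signed offset of the peak from the TSS
    (assumed convention: negative = upstream, positive = downstream).
    """
    rti_sets = {name: set() for name in INTERVALS}
    for regulator, gene, distance in annotated_peaks:
        for name, (upstream, downstream) in INTERVALS.items():
            if -upstream <= distance <= downstream:
                rti_sets[name].add((regulator, gene))
    return rti_sets


# Invented placeholder peaks, not real binding data.
peaks = [
    ("REG1", "GENE_A", -800),   # 800 bp upstream
    ("REG1", "GENE_B", 3000),   # 3 kb downstream
    ("REG1", "GENE_C", -9000),  # 9 kb upstream
]
rti_sets = extract_rti_sets(peaks)
for name, pairs in rti_sets.items():
    print(name, sorted(pairs))
```

A single peak can contribute to several interval sets, so the sets for wider intervals contain those for narrower ones, except for the asymmetric interval, which drops peaks more than 1000 bp downstream.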
ENCODE and Jaspar
ENCODE and Jaspar provide raw BED files for a large number of ChIP-Seq experiments. All BED files were downloaded and processed using BEDTools [7] [8] to find regulator binding sites around the transcription start sites of all genes (protein-coding and non-coding). The gene annotations used are listed in the overview tables. Binding sites were searched in the following intervals:
- TSS -/+ 1000
- TSS -/+ 5000
- TSS -/+ 10000
- TSS -10000 / +1000
    bedtools window -a hg19TSS.bed -b peaks.bed -l 1000 -r 1000 -sw > annotated_peaks_tss_1000_1000.bed
    bedtools window -a hg19TSS.bed -b peaks.bed -l 5000 -r 5000 -sw > annotated_peaks_tss_5000_5000.bed
    bedtools window -a hg19TSS.bed -b peaks.bed -l 10000 -r 10000 -sw > annotated_peaks_tss_10000_10000.bed
    bedtools window -a hg19TSS.bed -b peaks.bed -l 10000 -r 1000 -sw > annotated_peaks_tss_10000_1000.bed
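To make the roles of `-l`, `-r` and `-sw` in these commands more concrete, the Python sketch below mimics a strand-aware window test for a single TSS and a single peak. It is an approximation written for illustration, not BEDTools code. The function names are hypothetical, coordinates are treated as 0-based half-open BED intervals, and the sketch assumes that with `-sw` the `-l` value extends upstream relative to the strand of the gene. The coordinates in the example are invented.

```python
def tss_window(tss_start, tss_end, strand, left, right):
    """Return the (start, end) search window around a TSS.

    On the minus strand the two extensions are swapped, so that `left`
    always extends upstream with respect to the direction of transcription.
    """
    if strand == "-":
        left, right = right, left
    return max(0, tss_start - left), tss_end + right


def overlaps(window, peak_start, peak_end):
    """True if a half-open peak interval overlaps the half-open window."""
    win_start, win_end = window
    return peak_start < win_end and peak_end > win_start


# Invented coordinates: the same TSS position on either strand,
# searched with the asymmetric interval TSS -10000 / +1000.
plus_window = tss_window(50000, 50001, "+", left=10000, right=1000)
minus_window = tss_window(50000, 50001, "-", left=10000, right=1000)

peak = (58000, 58200)
print(plus_window, overlaps(plus_window, *peak))
print(minus_window, overlaps(minus_window, *peak))
```

For the symmetric intervals the strand makes no difference, so the swap only matters for the TSS -10000 / +1000 case.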
SignaLink 2.0
SignaLink contains a predefined collection of transcription factor binding information. All interactions were downloaded and converted to the RTI file format (see below).
TRANSFAC PRO / TRANSPRO
The TRANSFAC database contains manually curated RTIs extracted from primary literature as well as processed data from ChIP and ChIP-ChIP experiments, with regulator binding information for predefined promoter regions around the TSS of the target genes.
RTI file format
While GeneTrail already offers a large collection of RTIs, it can be desirable to upload custom data that is not yet included. For this purpose users can upload their own RTIs in a tab-delimited format. In this format every line represents a single RTI. The first column corresponds to the name of the regulator and the second column to the respective target gene.
    E2F5	PSMA2P1
    E2F5	ZNF879
    E2F5	OSMR-AS1
    E2F5	CAMK1
    E2F5	SPR
    E2F5	ZNF700
    E2F5	ZNF707
    E2F5	CAMK4
    E2F5	OR8A3P
    E2F5	EDEM2
    E2F5	ZC3H10
    E2F5	RNF114
    E2F5	ZC3H15
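Before uploading a custom file, it can be useful to check that every line really has a regulator and a target separated by a tab. The following Python sketch reads such a file into a mapping from regulator to target genes. The function name `read_rtis` is hypothetical, and skipping blank lines and ignoring columns beyond the second are choices made for this example; how GeneTrail itself treats such lines is not stated here.

```python
import csv
import io
from collections import defaultdict


def read_rtis(handle):
    """Parse tab-delimited RTI lines into {regulator: set of target genes}."""
    rtis = defaultdict(set)
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            raise ValueError(f"line {line_number}: expected regulator<TAB>target")
        # Columns after the second one are ignored in this sketch.
        rtis[row[0].strip()].add(row[1].strip())
    return dict(rtis)


# In practice `handle` would be an open file; a string is used here
# so that the example is self-contained.
example = io.StringIO("E2F5\tPSMA2P1\nE2F5\tZNF879\nE2F5\tCAMK1\n")
rtis = read_rtis(example)
print(sorted(rtis["E2F5"]))
```

A line in which the two names are separated by spaces instead of a tab is reported with its line number rather than silently accepted.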
Bibliography
- [1] ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics, Oxford Univ Press.
- [2] ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic acids research, Oxford Univ Press.
- [3] The ENCODE (ENCyclopedia of DNA elements) project. Science, American Association for the Advancement of Science.
- [4] JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic acids research, Oxford Univ Press.
- [5] SignaLink 2--a signaling pathway resource with multi-layered regulatory networks. BMC systems biology, BioMed Central.
- [6] TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic acids research, Oxford Univ Press.
- [7] BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, Oxford Univ Press.
- [8] BEDTools: the Swiss-army tool for genome feature analysis. Current protocols in bioinformatics, Wiley Online Library.