Regulator-target interactions (RTIs)

For our web server, we built an extensive collection of regulator-target gene interactions (RTIs) based on external databases: ChEA [1], ChIP-Atlas, ChipBase [2], ENCODE [3], Jaspar [4], Signalink [5] and TRANSFAC [6]. For each analysis, users can decide which of these RTIs are used in the computation.

Database content

The databases we used to build our collection of RTIs provide different levels of information on regulators and their target genes. Some databases provide manually curated RTIs extracted from primary literature, while others provide binding information extracted from ChIP-Seq experiments. The binding sites are provided either as raw BED files containing ChIP-Seq peaks or as processed data indicating whether a regulator binds within a predefined interval around the transcription start site (TSS) of a given gene.

An overview of the RTIs provided by the different databases is shown in the following tables:


Database        #RTIs        #Regulators  #Targets  Retrieval date (YYYY/MM/DD)
ChEA [1]        221,627      126          22,000    2016/06/28
ChIP-Atlas      3,235,564    659          18,976    2016/04/12
ChipBase [2]    1,772,088    94           22,845    2016/04/12
ENCODE [3]      7,873,710    150          22,615    2016/04/12
Jaspar [4]      328,396      69           22,374    2016/04/12
Signalink [5]   26,760       293          1,905     2016/04/12
TRANSFAC [6]    10,517,489   714          33,895    2016/04/28
Total           23,975,679   1,076        36,242    -

  • GENCODE GRCh38
  • GENCODE GRCh37
Database Predefined TSS -/+ 1000 TSS -/+ 5000 TSS -/+ 10000 TSS -10000 / +1000 Retrieval date (YYYY/MM/DD)
ChEA [1] 2016/06/28
ChIP-Atlas 2017/03/20
ChipBase [2] 2017/03/20
ENCODE [3] 2017/03/20
Jaspar [4] 2017/03/29
Signalink [5] 2017/03/20
TRANSFAC [6] 2017/03/20

  • GENCODE Release 26 (GRCh38.p10)
  • GENCODE Release 26 (mapped to GRCh37)

  • 1,468
  • 36,731
Database Predefined TSS -/+ 1000 TSS -/+ 5000 TSS -/+ 10000 TSS -10000 / +1000 Retrieval date (YYYY/MM/DD)
ChEA [1] 2016/06/28
ChIP-Atlas 2017/03/20
ChipBase [2] 2017/03/20
ENCODE [3] 2017/03/20
Jaspar [4] 2017/03/29
TRANSFAC [6] 2017/03/20

  • GENCODE Release M13 (GRCm38.p5)

  • 782
  • 35,286
Database Predefined TSS -/+ 1000 TSS -/+ 5000 TSS -/+ 10000 TSS -10000 / +1000 Retrieval date (YYYY/MM/DD)
ChipBase [2] 2017/03/20
TRANSFAC [6] 2017/03/20

  • UCSC RGSC 6.0/rn6 with RefSeq tracks

  • 231
  • 14,245
Database Predefined TSS -/+ 1000 TSS -/+ 5000 TSS -/+ 10000 TSS -10000 / +1000 Retrieval date (YYYY/MM/DD)
ChIP-Atlas 2017/03/20
ENCODE [3] 2017/03/20
Jaspar [4] 2017/03/29
TRANSFAC [6] 2017/03/20

  • UCSC WBcel235/ce11 with RefSeq tracks

  • 242
  • 20,195
Database Predefined TSS -/+ 1000 TSS -/+ 5000 TSS -/+ 10000 TSS -10000 / +1000 Retrieval date (YYYY/MM/DD)
ChIP-Atlas 2017/03/20
ENCODE [3] 2017/03/20
Jaspar [4] 2017/03/29
Signalink [5] 2017/03/20
TRANSFAC [6] 2017/03/20

  • UCSC BDGP Release 6 + ISO1 MT/dm6 with RefSeq tracks

  • 491
  • 13,909

Data processing

In this section, we describe the processing steps used to derive RTIs from each database.

ChEA

The ChEA database provides a predefined collection of RTIs extracted from ChIP-Seq, ChIP-chip, ChIP-PET and DamID experiments. The entire collection was downloaded and converted to the RTI file format (see below).

ChIP-Atlas

The ChIP-Atlas database contains processed ChIP-Seq data that indicate whether a regulator binds within predefined intervals around the TSS of its target genes. We downloaded the complete sets for the following intervals and converted them to the RTI file format (see below):

  • TSS -/+ 1000
  • TSS -/+ 5000
  • TSS -/+ 10000

ChipBase

ChipBase provides BED files that are already annotated with the distance to neighboring genes. We downloaded all BED files and extracted RTI sets for the following intervals:

  • TSS -/+ 1000
  • TSS -/+ 5000
  • TSS -/+ 10000
  • TSS -10000 / +1000
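
The interval extraction described above can be sketched as follows. This is a minimal illustration, not the pipeline that was actually used: the record layout (regulator, gene, signed distance of the peak to the TSS, with negative values meaning upstream), the function name extract_rtis and the toy peaks are all assumptions made for this example, and the real ChipBase column layout may differ.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class AnnotatedPeak(NamedTuple):
    """Hypothetical record: one peak annotated with a neighboring gene."""
    regulator: str
    gene: str
    distance_to_tss: int  # assumed convention: negative = upstream of the TSS


def extract_rtis(peaks: Iterable[AnnotatedPeak],
                 upstream: int,
                 downstream: int) -> Set[Tuple[str, str]]:
    """Return unique (regulator, target) pairs whose peak lies within
    [-upstream, +downstream] base pairs of the target's TSS."""
    return {
        (p.regulator, p.gene)
        for p in peaks
        if -upstream <= p.distance_to_tss <= downstream
    }


# Toy peaks with made-up distances, for illustration only.
peaks = [
    AnnotatedPeak("E2F5", "CAMK1", -800),
    AnnotatedPeak("E2F5", "SPR", 4200),
    AnnotatedPeak("E2F5", "EDEM2", -9500),
    AnnotatedPeak("E2F5", "CAMK1", 300),  # second peak for the same pair
]

# The four intervals listed above, as (upstream, downstream) in bp.
intervals = {
    "TSS -/+ 1000": (1000, 1000),
    "TSS -/+ 5000": (5000, 5000),
    "TSS -/+ 10000": (10000, 10000),
    "TSS -10000 / +1000": (10000, 1000),
}
rti_sets = {name: extract_rtis(peaks, up, down)
            for name, (up, down) in intervals.items()}
```

Collecting the pairs in a set means that several peaks of the same regulator near the same gene yield a single RTI, which matches the one-line-per-interaction RTI file format.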

ENCODE and Jaspar

ENCODE and Jaspar provide raw BED files for a large number of ChIP-Seq experiments. All BED files were downloaded and processed using BEDTools [7] [8] to find regulator binding sites around the transcription start sites of all genes (protein coding and non-coding). The gene annotations used can be found in the overview table. RTI sets were extracted for the following intervals:

  • TSS -/+ 1000
  • TSS -/+ 5000
  • TSS -/+ 10000
  • TSS -10000 / +1000

bedtools window -a hg19TSS.bed -b peaks.bed -l 1000 -r 1000 -sw > annotated_peaks_tss_1000_1000.bed
bedtools window -a hg19TSS.bed -b peaks.bed -l 5000 -r 5000 -sw > annotated_peaks_tss_5000_5000.bed
bedtools window -a hg19TSS.bed -b peaks.bed -l 10000 -r 10000 -sw > annotated_peaks_tss_10000_10000.bed
bedtools window -a hg19TSS.bed -b peaks.bed -l 10000 -r 1000 -sw > annotated_peaks_tss_10000_1000.bed
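
To make the effect of the -l, -r and -sw options in the commands above more concrete, the following sketch reimplements the strand-aware window test in plain Python. It reflects our reading of the bedtools window documentation (with -sw, -l extends upstream and -r downstream relative to the strand of the entry in -a) and is not the bedtools implementation itself. The Tss record, the function names and the example coordinates are hypothetical.

```python
from typing import NamedTuple, Tuple


class Tss(NamedTuple):
    """One TSS record in BED coordinates (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    gene: str
    strand: str  # "+" or "-"


def strand_aware_window(tss: Tss, upstream: int, downstream: int) -> Tuple[int, int]:
    """Extend a TSS by upstream/downstream bp relative to its strand."""
    if tss.strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss.start - downstream, tss.end + upstream
    else:
        start, end = tss.start - upstream, tss.end + downstream
    return max(start, 0), end  # clamp at the chromosome start


def peak_in_window(tss: Tss, peak_chrom: str, peak_start: int, peak_end: int,
                   upstream: int, downstream: int) -> bool:
    """True if the peak overlaps the strand-aware window by at least 1 bp."""
    win_start, win_end = strand_aware_window(tss, upstream, downstream)
    return (peak_chrom == tss.chrom
            and peak_start < win_end
            and peak_end > win_start)


# Two hypothetical genes sharing a TSS position but on opposite strands.
plus_gene = Tss("chr1", 20000, 20001, "GENE_A", "+")
minus_gene = Tss("chr1", 20000, 20001, "GENE_B", "-")

# A peak about 6 kb to the left of the TSS: upstream for the plus-strand
# gene, downstream for the minus-strand gene.
peak = ("chr1", 14000, 14200)

hit_plus = peak_in_window(plus_gene, *peak, upstream=10000, downstream=1000)
hit_minus = peak_in_window(minus_gene, *peak, upstream=10000, downstream=1000)
```

With the asymmetric TSS -10000 / +1000 interval, the peak is assigned to the plus-strand gene only, which is why strand awareness matters for this interval but makes no difference for the symmetric ones.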

SignaLink 2.0

SignaLink contains a predefined collection of transcription factor binding information. All interactions were downloaded and converted to the RTI file format (see below).

TRANSFAC PRO / TRANSPRO

The TRANSFAC database contains manually curated RTIs extracted from primary literature as well as processed data from ChIP and ChIP-chip experiments, which provide regulator binding information in predefined promoter regions around the TSS of the target genes.

RTI file format

While GeneTrail already offers a large collection of RTIs, it can be desirable to use custom data that is not yet included. For this purpose, users can upload their own RTIs in a tab-delimited format. In this format, every line represents a single RTI: the first column contains the name of the regulator and the second column the respective target gene.

E2F5    PSMA2P1
E2F5    ZNF879
E2F5    OSMR-AS1
E2F5    CAMK1
E2F5    SPR
E2F5    ZNF700
E2F5    ZNF707
E2F5    CAMK4
E2F5    OR8A3P
E2F5    EDEM2
E2F5    ZC3H10
E2F5    RNF114
E2F5    ZC3H15
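
A file in this format can be read and written with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names read_rtis and write_rtis are hypothetical, blank lines are skipped, and any columns after the second are ignored, which may not match how GeneTrail itself validates uploads.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Set, TextIO


def read_rtis(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse tab-delimited RTI lines into {regulator: set of target genes}."""
    targets_by_regulator = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
            raise ValueError(f"line {line_no}: expected regulator<TAB>target")
        targets_by_regulator[fields[0].strip()].add(fields[1].strip())
    return dict(targets_by_regulator)


def write_rtis(targets_by_regulator: Dict[str, Set[str]], handle: TextIO) -> None:
    """Write one 'regulator<TAB>target' line per interaction, sorted."""
    for regulator in sorted(targets_by_regulator):
        for target in sorted(targets_by_regulator[regulator]):
            handle.write(f"{regulator}\t{target}\n")


# Round trip using three of the example interactions shown above.
example = "E2F5\tPSMA2P1\nE2F5\tZNF879\nE2F5\tCAMK1\n"
rtis = read_rtis(io.StringIO(example))

out = io.StringIO()
write_rtis(rtis, out)
```

Note that the separator must be a real tab character; a line in which regulator and target are separated by spaces is rejected by this sketch.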

Bibliography

  1. Lachmann, Alexander; Xu, Huilei; Krishnan, Jayanth; Berger, Seth I; Mazloom, Amin R; Ma'ayan, Avi. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics. Oxford Univ Press.
  2. Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic acids research. Oxford Univ Press.
  3. ENCODE Project Consortium and others. The ENCODE (ENCyclopedia of DNA elements) project. Science. American Association for the Advancement of Science.
  4. Sandelin, Albin; Alkema, Wynand; Engstrom, Par; Wasserman, Wyeth W; Lenhard, Boris. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic acids research. Oxford Univ Press.
  6. Matys, Vea; Fricke, Ellen; Geffers, R; Gößling, Ellen; Haubrock, Martin; Hehl, R; Hornischer, Klaus; Karas, Dagmar; Kel, Alexander E; Kel-Margoulis, Olga V; and others. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic acids research. Oxford Univ Press.
  7. Quinlan, Aaron R; Hall, Ira M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. Oxford Univ Press.
  8. Quinlan, Aaron R. BEDTools: the Swiss-army tool for genome feature analysis. Current protocols in bioinformatics. Wiley Online Library.