Commit 65d17ab

Updated Readme and documention.tex to consider the chromatin conformation capture data
1 parent a6501c7 commit 65d17ab

3 files changed

+20
-3
lines changed

README.md

Lines changed: 8 additions & 1 deletion
@@ -1,4 +1,4 @@
1-
# TEPIC (version 2.1)
1+
# TEPIC (version 2.2)
22
-------
33
TEPIC offers workflows for the prediction and analysis of Transcription Factor (TF) binding sites including:
44
* TF affinity computation in user provided regions
@@ -10,6 +10,8 @@ A graphical overview on the workflows of TEPIC is shown below. Blue font indicat
1010
![](docs/TEPIC_Workflow.png)
1111

1212
## News
13+
08.10.2019: We present a novel feature to include TFBS in regulatory sites determined by chromatin conformation capture data. Using an extended feature space representation, the INVOKE model can investigate the regulatory influence of TFs bound to promoters and enhancers separately.
14+
1315
10.10.2018: TEPIC 2.0 is now published in [Bioinformatics](https://doi.org/10.1093/bioinformatics/bty856).
1416

1517
13.08.2018: In addition to the gene-centric annotation, the functionality for transcript based annotation has been added.
@@ -156,6 +158,11 @@ Here, thresholded TF affinities are used for the computation.
156158
GENEID TF1 TF2 ... TFn peak length peak count peak signal
157159
ENSG00000044612 0 0 ... 4.2 23 3 19.2
158160

161+
The *Prefix_Conformation_Data_Affinity_Three_Peak_Based_Features_Gene_View.txt* files follow the previous structure and extend it with the same features, that is, TF gene scores and peak features, computed for DHSs residing in chromatin loops (columns prefixed with LR_):
162+
163+
GENEID TF1 TF2 ... TFn peak length peak count peak signal LR_TF1 ... LR_TFn LR_peak length LR_peak count LR_peak signal
164+
ENSG00000044612 0 0 ... 4.2 23 3 19.2 3.4 ... 0.9 14 4 63.3
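The layout above can be read by separating the promoter-window columns from the loop-based columns. The following is a minimal sketch, not part of TEPIC: it assumes the file is tab separated with a single header line, that loop-based columns carry the `LR_` prefix as shown above, and it uses a hypothetical two-TF example row. The function name `split_gene_view` is made up for illustration.

```python
import csv
import io


def split_gene_view(handle):
    """Map each gene ID to (promoter features, loop features).

    Assumes a tab-separated file whose first column is the gene ID and
    whose loop-based columns are prefixed with "LR_" (an assumption
    based on the header shown in the README).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Column 0 is the gene ID; every other column is a numeric feature.
    lr_idx = [i for i, name in enumerate(header) if i > 0 and name.startswith("LR_")]
    pr_idx = [i for i, name in enumerate(header) if i > 0 and not name.startswith("LR_")]
    result = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        promoter = {header[i]: float(row[i]) for i in pr_idx}
        # Strip the prefix so both dictionaries share the same keys.
        loops = {header[i][len("LR_"):]: float(row[i]) for i in lr_idx}
        result[row[0]] = (promoter, loops)
    return result


# Hypothetical example with two TFs; the values are illustrative only.
example = (
    "GENEID\tTF1\tTF2\tpeak length\tpeak count\tpeak signal\t"
    "LR_TF1\tLR_TF2\tLR_peak length\tLR_peak count\tLR_peak signal\n"
    "ENSG00000044612\t0\t4.2\t23\t3\t19.2\t3.4\t0.9\t14\t4\t63.3\n"
)
features = split_gene_view(io.StringIO(example))
promoter, loops = features["ENSG00000044612"]
print(promoter["TF2"], loops["TF2"])
```

Keeping the two feature sets under identical keys makes it straightforward to compare the promoter and long-range contribution of the same TF for a gene.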
165+
159166
The *Prefix_Thresholded_Sparse_Affinity_Gene_View.txt* files are tab separated files listing the Ensembl GeneID in the first column, and the name of the TF associated with this gene in the second column.
160167
Here, thresholded TF affinities are used for the computation. The third column of this file is required by DREM and does not carry any specific meaning.
161168

docs/Description.pdf

2.2 KB
Binary file not shown.

docs/Description.tex

Lines changed: 12 additions & 2 deletions
@@ -21,6 +21,7 @@
2121
\item Annotation of user defined regions with TF affinities using TRAP and a variety of provided TF-motifs,
2222
\item Aggregation of TF affinities to TF-gene scores,
2323
\item Computation of statistical scores such as peak-length, peak-count or peak-signal per gene,
24+
\item Inclusion of long-range chromatin contacts,
2425
\item Discretization of continuous TF affinities using a background distribution into a binary measure for TF-binding,
2526
\item Linear regression analysis to infer key transcriptional regulators within one sample,
2627
\item Logistic regression classifier to suggest key transcriptional regulators between samples,
@@ -198,7 +199,6 @@ \subsection{Computing TF gene scores}
198199

199200
Furthermore, TEPIC can compute a TF-specific affinity cut-off derived from either user-defined or randomly generated sequences to distinguish likely bound sites from unbound sites. These scores
200201
can be used to derive a binary TF-gene assignment. Further details on this mode are provided in Section \ref{EPIC-DREM}.
201-
202202
\begin{figure}[h!]
203203
\begin{center}
204204
\includegraphics[width=\textwidth]{Workflow.png}
@@ -211,8 +211,18 @@ \subsection{Computing TF gene scores}
211211
\label{workflowFig}
212212
\end{figure}
213213

214+
With version $2.2$ of TEPIC, we introduced support for the inclusion of long-range chromatin conformation capture data. In addition to the promoter-centric windows used before, we
215+
calculate TF affinities $a_{g,i}*$ and peak scores $pl_g*, pc_g*, ps_g*$ for all DHSs residing in genomic loci looping into the promoter region of a gene, summarized in $P_{g,V_g}$, where $V_g$ is the set of all regions looped into the promoter region of gene $g$:
216+
\begin{align}
217+
a_{g,i}*&=\sum_{p \in P_{g,V_g}} a_{p,i},\\
218+
pl_g*&=\sum_{p \in P_{g,V_g}}|p|, \\
219+
pc_g*&=\sum_{p \in P_{g,V_g}} 1, \\
220+
ps_g*&=\sum_{p \in P_{g,V_g}}s_{p}.
221+
\end{align}
222+
Note that scores computed for $p \in P_{g,V_g}$ never include the exponential decay, because a direct interaction of the respective sites with the promoter region of gene $g$ has been determined by chromatin conformation capture experiments.
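
A small worked example may make these definitions concrete. The numbers are hypothetical and chosen only for illustration: suppose $P_{g,V_g}$ contains two DHSs $p_1$ and $p_2$ with lengths $|p_1|=150$ and $|p_2|=200$, signals $s_{p_1}=3.5$ and $s_{p_2}=1.5$, and affinities $a_{p_1,i}=0.4$ and $a_{p_2,i}=0.6$ for TF $i$. Assuming that the peak count adds one per DHS, the scores are
\begin{align*}
a_{g,i}* &= 0.4 + 0.6 = 1.0,\\
pl_g* &= 150 + 200 = 350,\\
pc_g* &= 1 + 1 = 2,\\
ps_g* &= 3.5 + 1.5 = 5.0.
\end{align*}
No distance-dependent weighting enters any of these sums.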
223+
214224
\subsection{Required input}
215-
To compute TF gene scores a user needs to specify:
225+
To compute TF gene scores, a user needs to specify:
216226
\begin{itemize}
217227
\item a reference genome (-g option),
218228
\item a set of \textit{PSEMs} (-p option),
