Modelling gene expression in terms of DNA sequence
Date Issued
2023
Author(s)
Relić, Đorđe
Abstract
Understanding the gene regulatory networks that control gene expression remains one of the most important questions in molecular biology. Much of gene expression is controlled through transcription initiation, whose regulation is ultimately encoded in constellations of small sequence motifs in the DNA that are bound by transcription factors (TFs) in a sequence-specific manner. In this thesis, we addressed the task of understanding gene regulation on two levels. Firstly, we present a computational pipeline for inferring a set of gene regulatory elements in a given organism. The pipeline includes identifying genes that encode DNA-binding domains (DBDs), mapping them to known binding motifs by leveraging similarity in DBDs between species, annotating promoter regions genome-wide, aligning promoters with orthologous regions from related genomes, and predicting genome-wide transcription factor binding sites (TFBSs). We demonstrated the use of our pipeline by applying it to zebrafish. Furthermore, we integrated these results into our previously developed Integrated System for Motif Activity Response Analysis (ISMARA), which models gene expression data in terms of predicted regulatory sites. Using ISMARA, we predicted known and novel key regulatory TFs in zebrafish using a number of RNA-seq datasets. Secondly, we zoom in to the scale of a single TF regulating a set of constitutive promoters in \textit{Escherichia coli}. We analyzed an artificially evolved set of synthetic promoter sequences that were selected for expression as constitutive promoters regulated by the $\sigma^{70}$ transcription factor. We looked closely into promoter sequences and TF binding dynamics and investigated the predictive power of TF binding affinity on gene expression.
File(s)
Name
DordeRELIC_PhD_thesis_edoc_version.pdf
Size
32.52 MB
Format
Adobe PDF
Checksum
(MD5):aaaf81198358ec967e972c5adc54a174