PhD Thesis Defense: Rule-based Models of Transcriptional Regulation and Complex Diseases: Applications and Development

by Susanne Bornelöv

Abstract [en]

As our understanding of genetic disorders and gene regulation has grown, more focus has turned towards complex interactions. Combinations of genes, or of genes and environmental factors, have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors important for asthma and allergy.
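Monte Carlo feature selection ranks features by building many classifiers on randomly drawn feature subsets and randomly drawn training/test splits. Each feature is credited in proportion to how useful it was to the classifiers that performed well. The sketch below is a simplified illustration under stated assumptions, not the implementation used in the thesis. Each base classifier is a one-level decision stump on discrete features rather than a full decision tree, and plain test accuracy is used. A feature's relative importance is accumulated as that accuracy, raised to an exponent `u`, times the stump's information gain. The function names and the parameters `s`, `m`, `t`, `u`, and `train_frac` are hypothetical, illustrative choices.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_stump(X, y, features):
    """Return (feature, info_gain, rule) for the single-feature split with the
    highest information gain; `rule` maps each feature value to its majority class."""
    base = entropy(y)
    best = None
    for f in features:
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[row[f]].append(label)
        cond = sum(len(g) / len(y) * entropy(g) for g in groups.values())
        gain = base - cond
        if best is None or gain > best[1]:
            rule = {v: Counter(g).most_common(1)[0][0] for v, g in groups.items()}
            best = (f, gain, rule)
    return best


def mcfs_sketch(X, y, n_features, s=200, m=2, t=3, u=1.0, train_frac=0.66, seed=0):
    """Simplified Monte Carlo feature selection (hypothetical parameters):
    s random feature subsets of size m, t random train/test splits per subset,
    and a decision stump instead of a full tree as the base classifier."""
    rng = random.Random(seed)
    ri = [0.0] * n_features  # accumulated relative importance per feature
    idx = list(range(len(y)))
    for _ in range(s):
        subset = rng.sample(range(n_features), m)
        for _ in range(t):
            rng.shuffle(idx)
            cut = int(train_frac * len(idx))
            train, test = idx[:cut], idx[cut:]
            train_y = [y[i] for i in train]
            f, gain, rule = best_stump([X[i] for i in train], train_y, subset)
            default = Counter(train_y).most_common(1)[0][0]  # fallback for unseen values
            acc = sum(rule.get(X[i][f], default) == y[i] for i in test) / len(test)
            # Credit the chosen feature: test accuracy (to the power u) times information gain.
            ri[f] += (acc ** u) * gain
    return ri


# Toy usage: the class label is a copy of feature 0; features 1-3 are random noise.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(4)] for _ in range(60)]
y = [row[0] for row in X]
ri = mcfs_sketch(X, y, n_features=4)
ranking = sorted(range(4), key=lambda k: ri[k], reverse=True)
```

In the toy data the label simply copies feature 0, so that feature should dominate the ranking. A stump scores features one at a time, so on its own it cannot reveal interactions. In the workflow described above, that role falls to the rule-based classifiers.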

Firstly, publicly available ChIP-seq data for 38 HMs were studied (Paper I). We trained a classifier to predict exon inclusion levels from the HM signals. We identified HMs important for splicing and illustrated that splicing could be predicted from HM patterns. Next, we applied a similar methodology to data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors important for allergic diseases, which confirmed earlier results, and found candidate gene-gene and gene-environment interactions.
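A rule-based classifier of the kind mentioned above represents its model as IF-THEN rules over discretized features, for example IF mark_A=high AND mark_B=low THEN inclusion=high. The minimal sketch below makes three assumptions. Each rule is a conjunction of (feature, value) conditions. A rule is measured by its support (the number of objects matching its conditions) and its accuracy (the fraction of those objects with the predicted class). A new object is classified by support-weighted voting among the rules that match it. The mark names, levels, and toy table are hypothetical and are not data from Paper I, and the thesis's actual rule-induction and voting procedures may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF all (feature, value) conditions hold THEN predict `decision`."""
    conditions: tuple
    decision: str


def matches(rule, obj):
    """True if the object (a dict of discretized feature values) satisfies every condition."""
    return all(obj.get(feature) == value for feature, value in rule.conditions)


def support_and_accuracy(rule, objects, labels):
    """Support = number of objects matching the rule's conditions;
    accuracy = fraction of those objects whose label equals the rule's decision."""
    covered = [label for obj, label in zip(objects, labels) if matches(rule, obj)]
    support = len(covered)
    accuracy = sum(label == rule.decision for label in covered) / support if support else 0.0
    return support, accuracy


def classify(scored_rules, obj, default=None):
    """Support-weighted voting among all (rule, support) pairs whose rule matches `obj`."""
    votes = Counter()
    for rule, support in scored_rules:
        if matches(rule, obj):
            votes[rule.decision] += support
    return votes.most_common(1)[0][0] if votes else default


# Hypothetical toy table: two discretized marks and a discretized inclusion level per exon.
objects = [
    {"mark_A": "high", "mark_B": "low"},
    {"mark_A": "high", "mark_B": "low"},
    {"mark_A": "high", "mark_B": "high"},
    {"mark_A": "low", "mark_B": "low"},
]
labels = ["high", "high", "low", "low"]
rules = [
    Rule((("mark_A", "high"), ("mark_B", "low")), "high"),
    Rule((("mark_A", "low"),), "low"),
]
scored = [(r, support_and_accuracy(r, objects, labels)[0]) for r in rules]
```

Objects matched by no rule fall back to `default`. Real rule sets usually contain many overlapping rules, which is why a voting step is needed at all.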

To interpret and present the classifiers, we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis to classifiers trained on both simulated and real data and compared our tool to another methodology for interaction detection using classification. Finally, we continued the earlier epigenetics study by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs whose signals differed between unidirectional and bidirectional genes. Among these, the TF CTCF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.
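Ciruvis displays classification rules as a network. The sketch below shows one plausible way to derive such a network. It is an assumption, not the tool's actual algorithm. Each condition (feature=value) is treated as a node, and every pair of conditions that occur together in a rule is connected by an edge. Each edge is scored by the summed support of the rules containing both conditions, with a separate network kept per decision class. Strongly weighted edges then point to candidate interactions. The function names, the 'feature=value' string encoding, and the toy rules are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rule_network(rules):
    """Build a weighted co-occurrence network from classification rules.

    rules: iterable of (conditions, decision, support), where conditions is a tuple
    of 'feature=value' strings. Returns {decision: {(cond_a, cond_b): score}}, where an
    edge's score is the summed support of all rules for that decision containing both
    conditions. This scoring scheme is an illustrative assumption."""
    network = defaultdict(lambda: defaultdict(float))
    for conditions, decision, support in rules:
        # Every unordered pair of conditions in the same rule becomes an edge.
        for a, b in combinations(sorted(set(conditions)), 2):
            network[decision][(a, b)] += support
    return {decision: dict(edges) for decision, edges in network.items()}


def strongest_edges(edges, k=3):
    """Return the k highest-scoring edges, strongest first."""
    return sorted(edges.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy usage with hypothetical rules given as (conditions, decision, support).
toy_rules = [
    (("A=high", "B=low"), "yes", 10),
    (("A=high", "B=low", "C=high"), "yes", 4),
    (("C=high",), "no", 7),
]
net = rule_network(toy_rules)
```

Single-condition rules contribute no edges, so the "no" class above produces no network. Sorting the conditions before pairing keeps each undirected edge under one key.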

Place, publisher, year, pages: Uppsala: Acta Universitatis Upsaliensis, 2014. 69 p.
Series: Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1167
Keywords [en]: Histone modification, Transcription factor, Transcriptional regulation, Next-generation sequencing, Feature selection, Machine learning, Rule-based classification, Asthma, Allergy
National Category: Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Research subject: Bioinformatics
Identifiers: urn:nbn:se:uu:diva-230159 (URN); 978-91-554-9005-8 (ISBN); oai:DiVA.org:uu-230159 (OAI)
Public defence: 2014-10-03, BMC C8:301, Husargatan 3, Uppsala, 13:15 (English)
Opponent: Polanska, Joanna
Supervisors: Jan Komorowski and Claes Wadelius
Available from 2014-09-12; Created: 2014-08-19; Last updated: 2014-09-12
List of papers
  1. Combinations of histone modifications mark exon inclusion levels
  2. Rule-Based Models of the Interplay between Genetic and Environmental Factors in Childhood Allergy
  3. Ciruvis
  4. Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription