Project

An automated workflow for ortholog-informed generation of gene trees

We are developing an open-source, automated workflow that generates high-quality gene trees, so that non-experts can study the evolution of genes of interest (for example, genes relevant to the evolution of a trait such as a specific metabolite, antibiotic resistance, or secondary product synthesis).

Background

Metagenomic sequencing has given us access to a previously unthinkable abundance and diversity of genomic and protein sequences. This wealth of sequence information from previously unsampled lineages should let us examine protein distribution and evolution in more detail than ever. However, because of the sheer amount of data, it has become difficult to harness this benefit without expert knowledge and access to high-performance computing.

Aims

The aim is an open-source, automated workflow that generates high-quality gene trees for genes of interest and that offers:

  1. Easy-to-use Snakemake workflow
  2. Robust phylogenetic analysis of target genes
  3. Efficient handling of large sequence datasets
  4. Reproducible treatment of datasets
  5. Fast execution on a personal computer
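
As an illustration of aim 3, large sequence files can be processed one record at a time instead of being loaded into memory in full. The following is a minimal Python sketch, not part of the project's actual code: the function names read_fasta and filter_by_length and the example sequences are hypothetical.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield FASTA records lazily, one at a time (hypothetical helper)."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep only records whose sequence has at least min_len residues."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Made-up example input; a real run would pass an open file handle instead.
example = io.StringIO(">a desc\nMKT\nLLV\n\n>b\nMK\n")
kept = list(filter_by_length(read_fasta(example), min_len=3))
```

Because both functions are generators, only one record is held in memory at a time, which is one way a workflow could stay usable on a personal computer.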

Techniques

  • Sequence similarity searches (BLAST, MMseqs2, InterProScan, etc.)
  • Multiple sequence alignment and phylogenetics
  • Python and Snakemake workflow programming
  • Collaborative version control with Git and GitHub
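
To give a flavour of the first technique, the sketch below parses BLAST tabular output (the 12 default columns of -outfmt 6) and keeps, for each database sequence, its best-scoring hit below an e-value cutoff. It is a hedged illustration only: the Hit type, the function names, the cutoff of 1e-5, and the example rows are assumptions, not the project's method.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse the 12 default columns of BLAST -outfmt 6 (tab-separated)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        yield Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per subject among hits passing the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.subject)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.subject] = hit
    return best


# Made-up example rows; real input would come from a BLAST result file.
example = (
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "q2\ts1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t250\n"
    "q1\ts2\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20\n"
)
kept = best_hits(parse_blast_tabular(io.StringIO(example)))
```

In this example the weak hit to s2 is discarded by the e-value cutoff, and s1 is retained once, with the stronger of its two matching queries.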

BSc/MSc theses

Thesis projects are available for BSc or MSc students with an interest in evolutionary microbiology and bioinformatics. Experience with the Linux command line and with programming (e.g. Python, R) is recommended.

Contact

Are you interested in working on this project? Please contact Stephan Köstlbacher (stephan.kostlbacher@wur.nl).