Embrace by Anargyros Sarafopoulos   EVOWORKSHOPS:   EVOBIO2004

2nd European Workshop on Evolutionary Bioinformatics


8 co-located events:


previous events

travel tips
local information
invited speakers
AV-poster info
best papers
social events
live experiment

evoBIO chairs
Dave W. Corne
Elena Marchiori
local chair
Ernesto Costa



21 November 2003
19 December 2003
Camera ready
16 January 2004
5-7 April 2004

EvoBIO is the second workshop organized by the EvoNet working group on bioinformatics. EvoBIO covers research in all aspects of computational intelligence in bioinformatics and computational biology. The emphasis is on algorithms based on evolutionary computation, on neural networks and on other novel optimisation and machine learning methods, that address important problems in molecular biology, genomics and genetics, that are computationally efficient, and that have been implemented and tested in simulations and on real datasets. Workshop paper will be presented orally and will be published by Springer as part of EvoWorkshops2004 in the Lecture Notes in Computer Science series. Springer Lecture Notes in Computer Science series
Accepted papers of the first edition of EvoBIO were published in the Springer Verlag LNCS 2611.

LNCS 3005, the EvoWorkshops2004 proceedings, is now available online


Workshop PAPERS:

A Memetic Algorithm for Protein Structure Prediction in a 3D-Lattice HP Model
Andrea Bazzoli, Andrea G. B. Tettamanzi

This paper presents a memetic algorithm with self-adaptive local search, applied to protein structure prediction in an HP, cubic-lattice model. Besides describing in detail how the algorithm works, we report experimental results that justify important implementation choices, such as the introduction of speciation mechanisms and the extensive application of local search. Test runs on 48-mer chains show that the proposed algorithm has promising search capabilities.

An Improved Genetic Algorithm for the Sequencing by Hybridization Problem
Carlos A.Brizuela, Luis C.\ González, Heidi J. Romero

This paper presents a genetic algorithm for a computational biology problem. The problem appears in the computational part of a new deoxyribonucleic acid (DNA) sequencing procedure denominated sequencing by hybridization (SBH). The proposed genetic algorithm is an improvement over a recently proposed algorithm in the literature. The improvement is achieved by modifying the crossover operator towards an almost deterministic greedy crossover which makes the algorithm both more effective and more efficient. Experimental results on real DNA data are presented to show the advantages of using the proposed algorithm.

Evolutionary Search of Thresholds for Robust Feature Set Selection: Application to the Analysis of Microarray Data
Carlos Cotta, Christian Sloper, Pablo Moscato

We deal with two important problems in pattern recognition that arise in the analysis of large datasets. While most feature subset selection methods use statistical techniques to preprocess the labeled datasets, these methods are generally not linked with the combinatorial properties of the final solutions. We prove that it is $NP-$hard to obtain an appropriate set of thresholds that will transform a given dataset into a binary instance of a robust feature subset selection problem. We address this problem using an evolutionary algorithm that learns the appropriate value of the thresholds. The empirical evaluation shows that robust subset of genes can be obtained. This evaluation is done using real data corresponding to the gene expression of lymphomas.

Evolving Regular Expression-based Sequence Classifiers for Protein Nuclear Localisation
Amine Heddad, Markus Brameier, Robert MacCallum

A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already-known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually "NLS-like" and work is underway to analyse these for novelty.

Analysis of Proteomic Pattern Data for Cancer Detection
Kees Jong, Elena Marchiori, Aad van der Vaart

In this paper we analyze two proteomic pattern datasets containing measurement s from ovarian and prostate cancer samples. In particular, a linear and a quadratic support vector machine (SVM) are applied to the data for distinguishing between cancer and benign status. On the ovarian dataset SVM gives excellent results, while the prostate dataset seems to be a harder classification problem for SVM. The prostate dataset is futher analyzed by means of an evolutionary algorithm for feature selection (EAFS) that searches for small subsets of features in order to optimize the SVM performance. In general, the subsets of features generated by EAFS vary over different runs and over different data splitting in training and hold-out sets. Nevertheless, particular features occur more frequently over all the runs. The role of these ``core'' features as potential tumor biomarkers deserves further study.

Self-adaptive Scouting---Autonomous Experimentation for Systems Biology
Naoki Matsumaru, Florian Centler, Klaus-Peter Zauner, Peter Dittrich

We introduce a new algorithm for autonomous experimentation. This algorithm uses evolution to drive exploration during scientific discovery. Population size and mutation strength are self-adaptive. The only variables remaining to be set are the limits and maximum resolution of the parameters in the experiment. In practice, these are determined by instrumentation. Aside from conducting physical experiments, the algorithm is a valuable tool for investigating simulation models of biological systems. We illustrate the operation of the algorithm on a model of HIV-immune system interaction. Finally, the difference between scouting and optimization is discussed.

An Improved Grammatical Evolution Strategy for Hierarchical Petri Net Modeling of Complex Genetic Systems
Jason Moore, Lance Hahn

DNA sequence variations impact human health through a hierarchy of biochemical and physiological systems. Understanding the hierarchical rela-tionships in the genotype-phenotype mapping is expected to improve the diag-nosis, prevention, and treatment of common, complex human diseases. We pre-viously developed a hierarchical dynamic systems approach based on Petri nets for generating biochemical network models that are consistent with genetic models of disease susceptibility. This strategy uses an evolutionary computation approach called grammatical evolution for symbolic manipulation and optimi-zation of Petri net models. We previously demonstrated that this approach rou-tinely identifies biochemical network models that are consistent with a variety of complex genetic models in which disease susceptibility is determined by nonlinear interactions between two DNA sequence variations. However, the modeling strategy was generally not successful when extended to modeling nonlinear interactions between three DNA sequence variations. In the present study, we evaluate a modified grammar for building Petri net models of bio-chemical systems that are consistent with high-order genetic models of disease susceptibility. The results indicate that our hierarchical model-building ap-proach is capable of identifying perfect Petri net models when an appropriate grammar is used.

Two-Step Genetic Programming for Optimization of RNA Common-Structure
Jin-Wu Nam, Je-Gun Joung, Young-Sirk Ahn, Byoung-Tak Zhang

We present an algorithm for identifying putative ncRNAs using an RCSG (RNA Common-Structural Grammar) and show the effectiveness of the algorithm. The algorithm consists of two steps: structure learning step and se-quence learning step. Both steps are based on genetic programming. Generally genetic programming has been applied to learn programs automatically, recon-struct networks, and predict protein secondary structures. In this study, we have applied genetic programming to optimize structural grammars. The structural grammars can be formulated by some rules as tree structure including function variables. They can be learned by genetic programming. We have defined the rules on how structure definition grammars can be encoded into function trees. The results we obtained from the experiments with RCSG of tRNA and 5S small rRNA and demonstrate the efficiency of our algorithm.

Evolutionary Algorithms for Optimal Control in Fed-batch Fermentation Processes
Miguel Rocha, JoséNeves, Isabel Rocha, Eugenio Ferreira

In this work, Evolutionary Algorithms (EAs) are used to achieve optimal feedforward control in a recombinant bacterial fed-batch fermentation process, that aims at producing a bio-pharmaceutical product. Three different aspects are the target of the optimization procedure: the feeding trajectory (the amount of substrate introduced in a bioreactor per time unit), the duration of the fermentation and the initial conditions of the process. A novel EA with variable size chromosomes and using real-valued representations is proposed that is capable of simultaneously optimizing the aforementioned aspects. Outstanding productivity levels were achieved and the results are validated by practice.

Discrete Branch Length Representations for Genetic Algorithms in Phylogenetic Search
Jian Shen, Robert Heckendorn

Likelihood analysis for phylogenetics is a well suited domain for optimization by genetic algorithm. A major factor in the performance of genetic algorithms is representation and operators. In this paper we present promising preliminary results on using a discrete set of edge lengths in the phylogenetic trees. We find that discretizing the edge lengths changes the fundamental character of the search and can produce higher quality trees. Our results suggest a search that is more robust to traps in local optima and an opportunity to better control the balance between topology search and edge length search.

Iteratively Inferring Gene Regulatory Networks with Virtual Knockout Experiments
Christian Spieth, Felix Streichert, Nora Speer, Andreas Zell

In this paper we address the problem of finding gene regulatory networks from experimental DNA microarray data. We introduce enhancements to an Evolutionary Algorithm optimization process to infer the parameters of the non-linear system given by the observed data more reliably and precisely. Due to the limited number of available data the inferring problem is under-determined and ambiguous. Further on, the problem often is multi-modal and therefore appropriate optimization strategies become necessary. Therefore, we propose a new method, which will suggest necessary additional biological experiments to remove the ambiguities.

Multiple Sequence Alignment Using SAGA: Investigating the Effects of Operator Scheduling, Population Seeding, and Crossover Operators
Rene Thomsen, Wouter Boomsma

Multiple sequence alignment (MSA) is a fundamental problem of great importance in molecular biology. In this study, we investigated several aspects of SAGA, a well-known evolutionary algorithm (EA) for solving MSA problems. The SAGA algorithm is important because it represents a successful attempt at applying EAs to MSA and since it is the first EA to use operator scheduling on this problem. However, it is largely undocumented which elements of SAGA are vital to its performance. An important finding in this study is that operator scheduling does not improve the performance of SAGA compared to a uniform selection of operators. Furthermore, the experiments show that seeding SAGA with a ClustalW-derived alignment allows the algorithm to discover alignments of higher quality compared to the traditional initialization scheme with randomly generated alignments. Finally, the experimental results indicate that SAGA's performance is largely unaffected when the crossover operators are disabled. Thus, the major determinant of SAGA's success seems to be the mutation operators and the scoring functions used.

Constructing Microbial Consortia with Minimal Growth Using a Genetic Algorithm
Frederik P. J. Vandecasteele, Thomas F. Hess, Ronald L. Crawford

The processes occurring in microbial ecosystems are typically governed by the actions of many different microorganisms that can all interact with each other in a highly nonlinear way. Historically, most work in microbiology has been focused on pure cultures of single organisms, while the study of groups of organisms (consortia) still forms a major challenge. Although enetic algorithms are capable of optimising noisy and nonlinear systems and they should therefore be very well suited for studying microbial ecology, they have only rarely been used in this field. In this work, a genetic algorithm was successfully used to construct microbial consortia exhibiting minimal growth from separate isolated fast growing strains. The technique developed here may open the way to new ecological insights.

EvoBIO programme committee:

Co-chair: Dave W. Corne, University of Exeter <d.w.corne@ex.ac.uk>
Co-chair: Elena Marchiori, Free University Amsterdam <elena@cs.vu.nl>
Jesus S. Aguilar-Ruiz, University of Seville (Spain)
Wolfgang Banzhaf, University of Dortmund (Germany)
Jacek Blazewicz, Institute of Computing Science, Poznan (Poland)
Carlos Cotta-Porras, University of Malaga (Spain)
Bogdan Filipic, Jozef Stefan Institute, Ljubljana (Slovenia)
David Fogel, Natural Selection, Inc. (USA)
Gary B. Fogel, Natural Selection, Inc. (USA)
James Foster, University of Idaho (USA)
Steven A. Frank, University of California, Irvine (USA)
Jin-Kao Hao, LERIA, Universite d'Angers, (France)
William Hart, Sandia National Labs (USA)
Jaap Heringa, Free University Amsterdam (The Netherlands)
Francisco Herrera, University of Granada (Spain)
Daniel Howard, QinetiQ (UK)
Antoine van Kampen, AMC University of Amsterdam (The Netherlands)
Douglas B. Kell, University of Wales (UK)
W.B. Langdon, UCL (UK)
Bob MacCallum, Stockholm University (Sweden)
Brian Mayoh, Aarhus University (Denmark)
Andrew C.R. Martin, University of Reading (UK)
Peter Merz, Eberhard-Karls-Universität Tübingen (Germany)
Martin Middendorf, Leipzig University (Germany)
Jason H. Moore, Vanderbilt University Medical Center (USA)
Pablo Moscato, The University of Newcastle (Australia)
Martin Oates, British Telecom Plc (UK)
Jon Rowe, University of Birmingham (UK)
Jem Rowland, Aberystwyth, The University of Wales (UK)
Vic J. Rayward-Smith, University of East Anglia (England)
El-ghazali Talbi, Laboratoire d'Informatique Fondamentalede Lille (France)
Eckart Zitzler, Swiss Federal Institute of Technology (Switzerland)

EvoWorkshops chairs:

Günther Raidl, Vienna University of Technology <raidl@ads.tuwien.ac.at>
Stefano Cagnoni, Universita' di Parma <cagnoni@ce.unipr.it>
Local chair : Ernesto Costa, University of Coimbra <ernesto@dei.uc.pt>

Workshop Background:

The goal of the workshop is to present recent research results, including significant work-in-progress, and to identify and explore directions of future research, besides stimulating closer interaction between members of this scientific community working on Bioinformatics. Each accepted paper will be presented orally at the workshop and printed in the proceedings published by Springer in the LNCS series.

Submission procedure (NOW CLOSED)

To submit, send your manuscript (postscript or PDF, maximum length: 10 A4 pages) via email to evobio@cs.vu.nl no later than 21 November (extended deadline) 2003. Formatting instructions can be found at http://www.springer.de/comp/lncs/authors.html. The papers will be peer reviewed by three members of the program committee. Authors will be notified via email on the results of the review by 19 December 2003.

The authors of accepted papers will have to improve their paper on the basis of the reviewers' comments and will be asked to send a camera ready version of their manuscripts by 16 January 2004. By submitting a camera-ready paper, the author(s) agree that at least one author will attend and present each accepted paper at the workshop.

Springer Lecture Notes in Computer Science series The EvoWorkshops2004 proceedings will be
published by Spinger as part of their
Lecture Notes in Computer Science series.