| EvoBIO
is the second workshop organized by the EvoNet working group on bioinformatics.
EvoBIO covers research in all aspects of computational intelligence in
bioinformatics and computational biology. The emphasis is on algorithms
based on evolutionary computation, on neural networks and on other novel
optimisation and machine learning methods, that address important problems
in molecular biology, genomics and genetics, that are computationally
efficient, and that have been implemented and tested in simulations and
on real datasets. Workshop paper will be presented orally and will be
published by Springer as part of EvoWorkshops2004 in the Lecture
Notes in Computer Science series. 
Accepted papers of the first edition of EvoBIO were published in the Springer
Verlag LNCS 2611.
LNCS 3005, the EvoWorkshops2004 proceedings, is now available online
Workshop
PAPERS:
A Memetic Algorithm for Protein Structure Prediction in a 3D-Lattice
HP Model
Andrea Bazzoli, Andrea G. B. Tettamanzi
Abstract:
This paper presents a memetic algorithm with self-adaptive local search,
applied to protein structure prediction in an HP, cubic-lattice model.
Besides describing in detail how the algorithm works, we report experimental
results that justify important implementation choices, such as the introduction
of speciation mechanisms and the extensive application of local search.
Test runs on 48-mer chains show that the proposed algorithm has promising
search capabilities.
An Improved Genetic Algorithm for the Sequencing by Hybridization
Problem
Carlos A.Brizuela, Luis C.\ González, Heidi J. Romero
Abstract:
This paper presents a genetic algorithm for a computational biology
problem. The problem appears in the computational part of a new deoxyribonucleic
acid (DNA) sequencing procedure denominated sequencing by hybridization
(SBH). The proposed genetic algorithm is an improvement over a recently
proposed algorithm in the literature. The improvement is achieved by
modifying the crossover operator towards an almost deterministic greedy
crossover which makes the algorithm both more effective and more efficient.
Experimental results on real DNA data are presented to show the advantages
of using the proposed algorithm.
Evolutionary Search of Thresholds for Robust Feature Set Selection:
Application to the Analysis of Microarray Data
Carlos Cotta, Christian Sloper, Pablo Moscato
Abstract:
We deal with two important problems in pattern recognition that arise
in the analysis of large datasets. While most feature subset selection
methods use statistical techniques to preprocess the labeled datasets,
these methods are generally not linked with the combinatorial properties
of the final solutions. We prove that it is $NP-$hard to obtain an appropriate
set of thresholds that will transform a given dataset into a binary
instance of a robust feature subset selection problem. We address this
problem using an evolutionary algorithm that learns the appropriate
value of the thresholds. The empirical evaluation shows that robust
subset of genes can be obtained. This evaluation is done using real
data corresponding to the gene expression of lymphomas.
Evolving Regular Expression-based Sequence Classifiers for Protein
Nuclear Localisation
Amine Heddad, Markus Brameier, Robert MacCallum
Abstract:
A number of bioinformatics tools use regular expression (RE) matching
to locate protein or DNA sequence motifs that have been discovered by
researchers in the laboratory. For example, patterns representing nuclear
localisation signals (NLSs) are used to predict nuclear localisation.
NLSs are not yet well understood, and so the set of currently known
NLSs may be incomplete. Here we use genetic programming (GP) to generate
RE-based classifiers for nuclear localisation. While the approach is
a supervised one (with respect to protein location), it is unsupervised
with respect to already-known NLSs. It therefore has the potential to
discover new NLS motifs. We apply both tree-based and linear GP to the
problem. The inclusion of predicted secondary structure in the input
does not improve performance. Benchmarking shows that our majority classifiers
are competitive with existing tools. The evolved REs are usually "NLS-like"
and work is underway to analyse these for novelty.
Analysis of Proteomic Pattern Data for Cancer Detection
Kees Jong, Elena Marchiori, Aad van der Vaart
Abstract:
In this paper we analyze two proteomic pattern datasets containing measurement
s from ovarian and prostate cancer samples. In particular, a linear
and a quadratic support vector machine (SVM) are applied to the data
for distinguishing between cancer and benign status. On the ovarian
dataset SVM gives excellent results, while the prostate dataset seems
to be a harder classification problem for SVM. The prostate dataset
is futher analyzed by means of an evolutionary algorithm for feature
selection (EAFS) that searches for small subsets of features in order
to optimize the SVM performance. In general, the subsets of features
generated by EAFS vary over different runs and over different data splitting
in training and hold-out sets. Nevertheless, particular features occur
more frequently over all the runs. The role of these ``core'' features
as potential tumor biomarkers deserves further study.
Self-adaptive Scouting---Autonomous Experimentation for Systems
Biology
Naoki Matsumaru, Florian Centler, Klaus-Peter Zauner, Peter Dittrich
Abstract:
We introduce a new algorithm for autonomous experimentation. This algorithm
uses evolution to drive exploration during scientific discovery. Population
size and mutation strength are self-adaptive. The only variables remaining
to be set are the limits and maximum resolution of the parameters in
the experiment. In practice, these are determined by instrumentation.
Aside from conducting physical experiments, the algorithm is a valuable
tool for investigating simulation models of biological systems. We illustrate
the operation of the algorithm on a model of HIV-immune system interaction.
Finally, the difference between scouting and optimization is discussed.
An Improved Grammatical Evolution Strategy for Hierarchical Petri
Net Modeling of Complex Genetic Systems
Jason Moore, Lance Hahn
Abstract:
DNA sequence variations impact human health through a hierarchy of biochemical
and physiological systems. Understanding the hierarchical rela-tionships
in the genotype-phenotype mapping is expected to improve the diag-nosis,
prevention, and treatment of common, complex human diseases. We pre-viously
developed a hierarchical dynamic systems approach based on Petri nets
for generating biochemical network models that are consistent with genetic
models of disease susceptibility. This strategy uses an evolutionary
computation approach called grammatical evolution for symbolic manipulation
and optimi-zation of Petri net models. We previously demonstrated that
this approach rou-tinely identifies biochemical network models that
are consistent with a variety of complex genetic models in which disease
susceptibility is determined by nonlinear interactions between two DNA
sequence variations. However, the modeling strategy was generally not
successful when extended to modeling nonlinear interactions between
three DNA sequence variations. In the present study, we evaluate a modified
grammar for building Petri net models of bio-chemical systems that are
consistent with high-order genetic models of disease susceptibility.
The results indicate that our hierarchical model-building ap-proach
is capable of identifying perfect Petri net models when an appropriate
grammar is used.
Two-Step Genetic Programming for Optimization of RNA Common-Structure
Jin-Wu Nam, Je-Gun Joung, Young-Sirk Ahn, Byoung-Tak Zhang
Abstract:
We present an algorithm for identifying putative ncRNAs using an RCSG
(RNA Common-Structural Grammar) and show the effectiveness of the algorithm.
The algorithm consists of two steps: structure learning step and se-quence
learning step. Both steps are based on genetic programming. Generally
genetic programming has been applied to learn programs automatically,
recon-struct networks, and predict protein secondary structures. In
this study, we have applied genetic programming to optimize structural
grammars. The structural grammars can be formulated by some rules as
tree structure including function variables. They can be learned by
genetic programming. We have defined the rules on how structure definition
grammars can be encoded into function trees. The results we obtained
from the experiments with RCSG of tRNA and 5S small rRNA and demonstrate
the efficiency of our algorithm.
Evolutionary Algorithms for Optimal Control in Fed-batch Fermentation
Processes
Miguel Rocha, JoséNeves, Isabel Rocha, Eugenio Ferreira
Abstract:
In this work, Evolutionary Algorithms (EAs) are used to achieve optimal
feedforward control in a recombinant bacterial fed-batch fermentation
process, that aims at producing a bio-pharmaceutical product. Three
different aspects are the target of the optimization procedure: the
feeding trajectory (the amount of substrate introduced in a bioreactor
per time unit), the duration of the fermentation and the initial conditions
of the process. A novel EA with variable size chromosomes and using
real-valued representations is proposed that is capable of simultaneously
optimizing the aforementioned aspects. Outstanding productivity levels
were achieved and the results are validated by practice.
Discrete Branch Length Representations for Genetic Algorithms
in Phylogenetic Search
Jian Shen, Robert Heckendorn
Abstract:
Likelihood analysis for phylogenetics is a well suited domain for optimization
by genetic algorithm. A major factor in the performance of genetic algorithms
is representation and operators. In this paper we present promising
preliminary results on using a discrete set of edge lengths in the phylogenetic
trees. We find that discretizing the edge lengths changes the fundamental
character of the search and can produce higher quality trees. Our results
suggest a search that is more robust to traps in local optima and an
opportunity to better control the balance between topology search and
edge length search.
Iteratively Inferring Gene Regulatory Networks with Virtual Knockout
Experiments
Christian Spieth, Felix Streichert, Nora Speer, Andreas Zell
Abstract:
In this paper we address the problem of finding gene regulatory networks
from experimental DNA microarray data. We introduce enhancements to
an Evolutionary Algorithm optimization process to infer the parameters
of the non-linear system given by the observed data more reliably and
precisely. Due to the limited number of available data the inferring
problem is under-determined and ambiguous. Further on, the problem often
is multi-modal and therefore appropriate optimization strategies become
necessary. Therefore, we propose a new method, which will suggest necessary
additional biological experiments to remove the ambiguities.
Multiple Sequence Alignment Using SAGA: Investigating the Effects
of Operator Scheduling, Population Seeding, and Crossover Operators
Rene Thomsen, Wouter Boomsma
Abstract:
Multiple sequence alignment (MSA) is a fundamental problem of great
importance in molecular biology. In this study, we investigated several
aspects of SAGA, a well-known evolutionary algorithm (EA) for solving
MSA problems. The SAGA algorithm is important because it represents
a successful attempt at applying EAs to MSA and since it is the first
EA to use operator scheduling on this problem. However, it is largely
undocumented which elements of SAGA are vital to its performance. An
important finding in this study is that operator scheduling does not
improve the performance of SAGA compared to a uniform selection of operators.
Furthermore, the experiments show that seeding SAGA with a ClustalW-derived
alignment allows the algorithm to discover alignments of higher quality
compared to the traditional initialization scheme with randomly generated
alignments. Finally, the experimental results indicate that SAGA's performance
is largely unaffected when the crossover operators are disabled. Thus,
the major determinant of SAGA's success seems to be the mutation operators
and the scoring functions used.
Constructing Microbial Consortia with Minimal Growth Using a
Genetic Algorithm
Frederik P. J. Vandecasteele, Thomas F. Hess, Ronald L. Crawford
Abstract:
The processes occurring in microbial ecosystems are typically governed
by the actions of many different microorganisms that can all interact
with each other in a highly nonlinear way. Historically, most work in
microbiology has been focused on pure cultures of single organisms,
while the study of groups of organisms (consortia) still forms a major
challenge. Although enetic algorithms are capable of optimising noisy
and nonlinear systems and they should therefore be very well suited
for studying microbial ecology, they have only rarely been used in this
field. In this work, a genetic algorithm was successfully used to construct
microbial consortia exhibiting minimal growth from separate isolated
fast growing strains. The technique developed here may open the way
to new ecological insights.
EvoBIO programme committee: Co-chair:
Dave W. Corne, University of Exeter <d.w.corne@ex.ac.uk>
Co-chair: Elena Marchiori, Free University Amsterdam
<elena@cs.vu.nl>
Jesus S. Aguilar-Ruiz, University of Seville (Spain)
Wolfgang Banzhaf, University of Dortmund (Germany)
Jacek Blazewicz, Institute of Computing Science, Poznan (Poland)
Carlos Cotta-Porras, University of Malaga (Spain)
Bogdan Filipic, Jozef Stefan Institute, Ljubljana (Slovenia)
David Fogel, Natural Selection, Inc. (USA)
Gary B. Fogel, Natural Selection, Inc. (USA)
James Foster, University of Idaho (USA)
Steven A. Frank, University of California, Irvine (USA)
Jin-Kao Hao, LERIA, Universite d'Angers, (France)
William Hart, Sandia National Labs (USA)
Jaap Heringa, Free University Amsterdam (The Netherlands)
Francisco Herrera, University of Granada (Spain)
Daniel Howard, QinetiQ (UK)
Antoine van Kampen, AMC University of Amsterdam (The Netherlands)
Douglas B. Kell, University of Wales (UK)
W.B. Langdon, UCL (UK)
Bob MacCallum, Stockholm University (Sweden)
Brian Mayoh, Aarhus University (Denmark)
Andrew C.R. Martin, University of Reading (UK)
Peter Merz, Eberhard-Karls-Universität Tübingen (Germany)
Martin Middendorf, Leipzig University (Germany)
Jason H. Moore, Vanderbilt University Medical Center (USA)
Pablo Moscato, The University of Newcastle (Australia)
Martin Oates, British Telecom Plc (UK)
Jon Rowe, University of Birmingham (UK)
Jem Rowland, Aberystwyth, The University of Wales (UK)
Vic J. Rayward-Smith, University of East Anglia (England)
El-ghazali Talbi, Laboratoire d'Informatique Fondamentalede Lille (France)
Eckart Zitzler, Swiss Federal Institute of Technology (Switzerland)
EvoWorkshops chairs: Günther
Raidl, Vienna University of Technology <raidl@ads.tuwien.ac.at>
Stefano Cagnoni, Universita' di Parma <cagnoni@ce.unipr.it>
Local chair : Ernesto Costa, University of Coimbra <ernesto@dei.uc.pt>
Workshop Background: The goal
of the workshop is to present recent research results, including significant
work-in-progress, and to identify and explore directions of future research,
besides stimulating closer interaction between members of this scientific
community working on Bioinformatics. Each accepted paper will be presented
orally at the workshop and printed in the proceedings published by Springer
in the LNCS series.
Submission procedure (NOW CLOSED)
To submit, send your manuscript (postscript or PDF, maximum length: 10
A4 pages) via email to evobio@cs.vu.nl
no later than 21 November (extended deadline) 2003. Formatting
instructions can be found at http://www.springer.de/comp/lncs/authors.html.
The papers will be peer reviewed by three members of the program committee.
Authors will be notified via email on the results of the review by 19
December 2003.
The authors of accepted papers will have to improve their paper on the
basis of the reviewers' comments and will be asked to send a camera ready
version of their manuscripts by 16 January 2004. By submitting a camera-ready
paper, the author(s) agree that at least one author will attend and present
each accepted paper at the workshop.
The EvoWorkshops2004 proceedings will be
published by Spinger as part of their
Lecture
Notes in Computer Science series.
|