EvoWorkshops2003: 6th European Evolutionary Computing Workshops, 14-16 April 2003. Main contacts: EvoWorkshops chair Günther Raidl; local co-chairs Edward Tsang and Riccardo Poli.

# EvoBIO2003: 1st European Workshop on Evolutionary Bioinformatics

## Introduction

EvoBIO2003 is the first workshop of the newly formed EvoNet working group on bioinformatics.

EvoBIO2003 covers research in all aspects of computational intelligence in bioinformatics and computational biology. The emphasis is on algorithms based on evolutionary computation that address important problems in molecular biology, genomics and genetics, that are computationally efficient, and that have been implemented and tested in simulations and on real datasets. The goal of the workshop is to present recent research results, including significant work-in-progress, to identify and explore directions of future research, and to stimulate closer interaction between members of the scientific community working on bioinformatics.

Each accepted paper will be presented orally at the workshop and printed in the proceedings published by Springer in the LNCS series.

## Programme

Draft: subject to change

### Monday 14 April

0900-1000 Registration

1000-1115 EuroGP Session 1: Conference opening and invited speaker: David Goldberg
Session chair: Terry Soule

1115-1130 Coffee break

1145-1300 Session 1: Microarray analysis 1
Session chair: Jason Moore

- Inferring Gene Networks from Microarray Data using a Hybrid GA (Levine J, Cumiskey M, Armstrong D)
- Applying Memetic Algorithms to the Analysis of Microarray Data (Cotta C, Mendes A, Garcia V, Franca P, Moscato P)

1300-1400 Lunch

1400-1530 Session 2: Microarray analysis 2
Session chair: Jem Rowland

- Chromosomal breakpoint detection in Human Cancer (Jong K, Marchiori E, van der Vaart A, Ylstra B, Weiss M, Meijer G)
- Artificial Immune System for Classification of Cancer (Ando S, Hitoshi I)
- Genetic algorithms for gene expression analysis (Keedwell E, Narayanan A)

1530-1600 Tea break

1600-1700 Session 3: Methodology
Session chair: Bill Langdon

- Cross validation consistency for the assessment of genetic programming results in microarray studies (Moore J)
- Generalisation and model selection in supervised learning with evolutionary computation (Rowland J)

1730-1930 EuroGP Session 5: Posters
Session chair: James Foster

- Assembling Strategies in Extrinsic Evolvable Hardware with Bidirectional Incremental Evolution (Baradavkaigor I, Kalganova T)
- Neutral Variations Cause Bloat in Linear GP (Brameier M, Banzhaf W)
- Experimental design based multi-parent crossover operator (Chan K, Fogarty T)
- An Enhanced Framework for Microprocessor (Corno F, Squillero G)
- The Effect of Plagues in Genetic Programming: A Study of Variable-Size Populations (Fernandez F, Vanneschi L, Tomassini M)
- Multi Niche Parallel GP with a Junk-code Migration Model (Garcia S, Levine J, Gonzalez F)
- Tree Adjoining Grammars, Language Bias, and Genetic Programming (Xuan N, McKay R, Abbass H)
- Artificial Immune System Programming for Symbolic Regression (Johnson C)
- Grammatical Evolution with Bidirectional Representation (Kubalik J, Koutni J, Rothkrantz L)
- Introducing a Perl Genetic Programming System: and Can Meta-evolution Solve the Bloat Problem? (MacCallum R)
- Evolutionary Optimized Mold Temperature Control Strategies using a Multi-Polyline Approach (Mehnen J, Michelitsch T, Weinert K)
- Genetic Programming for Attribute Construction in Data Mining (Otero F, Silva M, Freitas A, Nievola J)
- Sensible Initialisation in Chorus (Ryan C, Atif R)
- An Analysis of Diversity of Constants of Genetic Programming (Ryan C, Keijzer M)
- Research of a cellular automaton simulating logic gates by evolutionary algorithms (Sapin E, Bailleux O, Chabrier J)
- From Implementations to a General Concept of Evolvable Machines (Sekanina L)
- Cooperative Evolution on the Intertwined Spirals Problem (Soule T)
- The Root Causes of Code Growth in Genetic Programming (Streeter M)
- Fitness Distance Correlation in Structural Mutation Genetic Programming (Vanneschi L, Tomassini M, Collard P, Clergue M)
- Disease modeling using Evolved Discriminate Function (Werner J, Kalganova T)
- No Free Lunch, Program Induction and Combinatorial Problems (Woodward J, Neil J)

### Tuesday 15 April

1000-1100 Session 4: Pattern discovery
Session chair: Carlos Cotta

- Promoter recognition with a GP-automaton (Howard D, Benson K)
- Pattern Search in Molecules with FANS: Preliminary Results (Blanco A, Pelta D, Verdegay J)

1100-1120 Coffee break

1120-1250 Session 5: Optimisation in Bioinformatics
Session chair: Dave Corne

- Algorithms for identification key generation and optimization (Reynolds A, Rayward-Smith V, de la Iglesia B, Wesselink J, Robert V, Boekhout T)
- Comparison of AdaBoost and Genetic Programming for combining Neural Networks for Drug Discovery (Langdon W, Barrett S, Buxton B)
- Discovering haplotypes in linkage disequilibrium mapping with an adaptive genetic algorithm (Jourdan L, Dhaenens C, Talbi G)

Workshop close

## Accepted papers

The EvoWorkshops2003 proceedings will be published by Springer in the
Lecture Notes in Computer Science series.

Algorithms for identification key generation and optimization.
Reynolds A, Rayward-Smith V, de la Iglesia B, Wesselink J, Robert V, Boekhout T
Algorithms for the automated creation of low cost identification keys are described and theoretical and empirical justifications are provided. The algorithms are shown to handle differing test costs, prior probabilities for each potential diagnosis and tests that produce uncertain results. The approach is then extended to cover situations where more than one measure of cost is of importance, by allowing tests to be performed in batches. Experiments are performed on a real-world case study involving the identification of yeasts.
EvoBIO Session 5: Optimisation in Bioinformatics: April 15, 1120-1250

Applying Memetic Algorithms to the Analysis of Microarray Data
Cotta C, Mendes A, Garcia V, Franca P, Moscato P
This work deals with the application of Memetic Algorithms (MAs) to the Microarray Gene Ordering problem, an NP-hard problem with strong implications in Medicine and Biology. It consists of ordering a set of genes, grouping together the ones with similar behavior. We propose an MA and evaluate the influence of several features, such as the intensity of local searches and the use of multiple populations, on the performance of the MA. We also analyze the impact of different objective functions on the general aspect of the solutions. The instances used for experimentation are extracted from the literature and represent real biological systems.
EvoBIO Session 1: Microarray analysis 1: April 14, 1145-1300

Artificial Immune System for Classification of Cancer
Ando S, Hitoshi I
This paper presents a method for cancer type classification based on microarray-monitored data. The method is based on an artificial immune system (AIS), which utilizes immunological recognition for classification. The system evolutionarily selects important genes and optimizes their weights to derive classification rules. This system was applied to gene expression data of acute leukemia patients to classify their cancer class. The primary result found a few classification rules which correctly classified all the test samples and gave some interesting implications for feature selection principles.
EvoBIO Session 2: Microarray analysis 2: April 14, 1400-1530

Chromosomal breakpoint detection in Human Cancer
Jong K, Marchiori E, van der Vaart A, Ylstra B, Weiss M, Meijer G
Chromosomal aberrations are differences in DNA sequence copy number of chromosome regions. (Aberrations can occur without change of copy number, but these aberrations are not the subject of this paper.) These differences may be crucial genetic events in the development and progression of human cancers. Array Comparative Genomic Hybridization is a laboratory method used in cancer research for the measurement of chromosomal aberrations in tumor genomes. A recurrent aberration at a particular genome location may indicate the presence of a tumor suppressor gene or an oncogene. The goals of the analysis of this type of data include detecting the locations of copy number changes, called breakpoints, and estimating the copy number value before and after a change. Knowing the exact location of a breakpoint is important for identifying possibly damaged genes. This paper introduces genetic local search algorithms to perform this task.
EvoBIO Session 2: Microarray analysis 2: April 14, 1400-1530

Comparison of AdaBoost and Genetic Programming for combining Neural Networks for Drug Discovery.
Langdon W, Barrett S, Buxton B
Genetic programming (GP) based data fusion and AdaBoost can both improve *in vitro* prediction of Cytochrome P450 activity by combining artificial neural networks (ANN). Pharmaceutical drug design data provided by high throughput screening (HTS) is used to train many base ANN classifiers. In data mining (KDD) we must avoid overfitting. The ensembles do extrapolate from the training data to other unseen molecules, i.e. they predict inhibition of a P450 enzyme by compounds unlike the chemicals used to train them. Thus the models might provide *in silico* screens of virtual chemicals as well as physical ones from GlaxoSmithKline (GSK)'s cheminformatics database. The receiver operating characteristics (ROC) of boosted and evolved ensembles are given.
EvoBIO Session 5: Optimisation in Bioinformatics: April 15, 1120-1250

Cross validation consistency for the assessment of genetic programming results in microarray studies
Moore J
DNA microarray technology has made it possible to measure the expression levels of thousands of genes simultaneously in a particular cell or tissue. The challenge for computational biologists and bioinformaticists will be to develop methods that are able to identify subsets of gene expression variables and features that classify cells and tissues into meaningful biological and clinical groups. Genetic programming (GP) has emerged as a machine learning tool for variable and feature selection in microarray data analysis. However, a limitation of GP is a lack of cross validation strategies for the assessment of GP results. This is partly due to the inherent complexity of GP due to its stochastic properties. Here, we introduce and review cross validation consistency (CVC) as a new modeling strategy for use with GP. We review the application of CVC to symbolic discriminant analysis (SDA), a GP-based analytical strategy for mining gene expression patterns in DNA microarray data.
EvoBIO Session 3: Methodology: April 14, 1600-1700

Discovering haplotypes in linkage disequilibrium mapping with an adaptive genetic algorithm.
Jourdan L, Dhaenens C, Talbi G
In this paper, we present an evolutionary approach to discover candidate haplotypes in a linkage disequilibrium study. This work is part of the study of factors involved in multi-factorial diseases such as diabetes and obesity. A first study of the linkage disequilibrium problem structure led us to use a genetic algorithm to solve it. Due to the particular, but classical, evaluation function given by the biologists, we design our genetic algorithm with several populations. This model led us to implement different cooperative operators such as mutation and crossover. Probabilities of application of those mechanisms are set adaptively. In order to introduce some diversity, we also implement a random immigrant strategy, and to cover the cost of the evaluation computation we parallelize it in a master / slave model. Different combinations of the presented mechanisms are tested on real data and compared in terms of robustness and computation cost. We show that the most complete strategy is able to find the best solutions and is the most robust.
EvoBIO Session 5: Optimisation in Bioinformatics: April 15, 1120-1250

Generalisation and model selection in supervised learning with evolutionary computation
Rowland J
EC-based supervised learning has been demonstrated to be an effective approach to forming predictive models in genomics, spectral interpretation, and other problems in modern biology. Longer-established methods such as PLS and ANN are also often successful. In supervised learning, overtraining is always a potential problem. The literature reports numerous methods of validating predictive models in order to avoid overtraining. Some of these approaches can be applied to EC-based methods of supervised learning, though the characteristics of EC learning are different from those obtained with PLS and ANN and selecting a suitably general model can be more difficult. This paper reviews the issues and various approaches, illustrating salient points with examples taken from applications in bioinformatics.
EvoBIO Session 3: Methodology: April 14, 1600-1700

Genetic algorithms for gene expression analysis
Keedwell E, Narayanan A
The major problem for current gene expression analysis techniques is how to identify the handful of genes which contribute to a disease from the thousands of genes measured on gene chips (microarrays). The use of a novel neural-genetic hybrid algorithm for gene expression analysis is described here. The genetic algorithm identifies possible gene combinations for classification and then uses the output from a neural network to determine the fitness of these combinations. Normal mutation and crossover operations are used to find increasingly fit combinations. Experiments on artificial and real-world gene expression databases are reported. The results from the algorithm are also explored for biological plausibility and confirm that the algorithm is a powerful alternative to standard data mining techniques in this domain.
EvoBIO Session 2: Microarray analysis 2: April 14, 1400-1530
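
The loop described in the abstract above (propose gene combinations, score each with a classifier, recombine and mutate the fittest) can be illustrated with a minimal sketch. It is not the authors' implementation: a leave-one-out nearest-centroid classifier stands in for the neural network, and the function names, population size, mutation rate and number of selected genes are all hypothetical choices made for illustration.

```python
import random

def centroid_accuracy(data, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier that sees only
    the columns listed in `genes`. This classifier is an assumed stand-in for
    the neural network mentioned in the abstract."""
    if not genes:
        return 0.0
    correct = 0
    for held in range(len(data)):
        centroids = {}
        for cls in sorted(set(labels)):
            rows = [data[i] for i in range(len(data))
                    if i != held and labels[i] == cls]
            if rows:
                centroids[cls] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        sample = [data[held][g] for g in genes]
        pred = min(centroids, key=lambda c: sum(
            (s - m) ** 2 for s, m in zip(sample, centroids[c])))
        correct += pred == labels[held]
    return correct / len(data)

def evolve(data, labels, n_select=2, pop_size=20, generations=30, seed=0):
    """Evolve gene subsets of size `n_select`; all defaults are illustrative."""
    rng = random.Random(seed)
    n_genes = len(data[0])

    def fitness(individual):
        return centroid_accuracy(data, labels, individual)

    pop = [tuple(sorted(rng.sample(range(n_genes), n_select)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))       # crossover: draw genes from both parents
            child = set(rng.sample(pool, n_select))
            if rng.random() < 0.2:             # mutation: replace one gene at random
                child.discard(rng.choice(sorted(child)))
                while len(child) < n_select:
                    child.add(rng.randrange(n_genes))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)
```

Calling `evolve(data, labels)` on a list of expression rows and class labels returns a tuple of gene indices. Because parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.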

Inferring Gene Networks from Microarray Data using a Hybrid GA
Levine J, Cumiskey M, Armstrong D
With the first draft completion of multiple organism genome sequencing programmes the emphasis is now moving toward a functional understanding of these genes and their network interactions. Microarray technology allows for large-scale gene experimentation. Using this technology it is possible to find the expression levels of genes across different conditions. The use of a genetic algorithm with a backpropagation local searching mechanism to reconstruct gene networks was investigated. This study demonstrates that the distributed genetic algorithm approach shows promise in that the method can infer gene networks that fit test data closely. Evaluating the biological accuracy of predicted networks from currently available test data is not possible. The best that can be achieved is to produce a set of possible networks to pass to a biologist for experimental verification.
EvoBIO Session 1: Microarray analysis 1: April 14, 1145-1300

Pattern Search in Molecules with FANS: Preliminary Results
Blanco A, Pelta D, Verdegay J
We show here how FANS, a fuzzy sets-based heuristic, is applied to a particular case of the Molecular Structure Matching problem: given two molecules $A$ (the pattern) and $B$ (the target) we want to find a subset of points of $B$ whose set of intra-atomic distances is the most similar to that of $A$. This is a hard combinatorial problem because first we have to determine a subset of atoms of $B$, and then some order for them has to be established. We analyze how the size of the pattern affects the performance of the heuristic, thus obtaining guidelines to approach the solution of real problems in the near future.
EvoBIO Session 4: Pattern discovery: April 15, 1000-1100
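
To make the matching problem stated in the abstract above concrete, here is a minimal sketch of one possible objective and an exhaustive search over ordered subsets of the target. The similarity measure (sum of absolute differences between corresponding pairwise distances) and the function names are assumptions made for illustration; the abstract does not specify them, and the brute-force search merely stands in for the FANS heuristic, which is not reproduced here.

```python
import math
from itertools import permutations

def pairwise_distances(points):
    """Distances d(i, j) for all i < j, listed in a fixed index order."""
    return [math.dist(points[i], points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]

def mismatch(pattern, target, assignment):
    """Assumed cost: sum of absolute differences between the pattern's
    pairwise distances and those of the chosen target points, where
    assignment[k] is the target index matched to pattern point k."""
    chosen = [target[i] for i in assignment]
    return sum(abs(a - b) for a, b in
               zip(pairwise_distances(pattern), pairwise_distances(chosen)))

def best_match_bruteforce(pattern, target):
    """Try every ordered subset of the target; feasible only for tiny inputs,
    which is why a heuristic is needed for real molecules."""
    return min(permutations(range(len(target)), len(pattern)),
               key=lambda assignment: mismatch(pattern, target, assignment))
```

The number of ordered subsets grows as |B|! / (|B| - |A|)!, which illustrates why both the choice of atoms and their order make the problem hard.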

Promoter recognition with a GP-automaton
Howard D, Benson K
A GP-automaton evolves motif sequences for its states; it moves the point of motif application at transition time using an integer that is stored and evolved in the transition; and it combines motif matches via logical functions that it also stores and evolves in each transition. This scheme learns to predict promoters in the human genome. The experiments reported use 5-fold cross validation.
EvoBIO Session 4: Pattern discovery: April 15, 1000-1100

## Chairs

Dave W. Corne
School of Computer Science, Cybernetics
http://www.personal.rdg.ac.uk/~ssscorne/
phone: +44 118 931 8983
fax: +44 118 975 1994

Elena Marchiori
Department of Mathematics and Computer Science
Free University Amsterdam
de Boelelaan 1081a
1081 HV, Amsterdam, The Netherlands
elena@cs.vu.nl
http://www.cs.vu.nl/~elena
phone: +31(0)20 444 7738
fax: +31(0)20 444 7653

## Programme committee

Jesus S. Aguilar-Ruiz, University of Seville (Spain)
Wolfgang Banzhaf, University of Dortmund (Germany)
Jacek Blazewicz, Institute of Computing Science, Poznan (Poland)
Carlos Cotta-Porras, University of Malaga (Spain)
Bogdan Filipic, Jozef Stefan Institute, Ljubljana (Slovenia)
Gary B. Fogel, Natural Selection, Inc. (USA)
James Foster, University of Idaho (USA)
Steven A. Frank, University of California, Irvine (USA)
Jin-Kao Hao, LERIA, Universite d'Angers (France)
William Hart, Sandia National Labs (USA)
Jaap Heringa, Free University Amsterdam (NL)
Francisco Herrera, University of Granada (Spain)
Daniel Howard, QinetiQ (UK)
Antoine van Kampen, AMC University of Amsterdam (The Netherlands)
Maarten Keijzer, Free University Amsterdam (The Netherlands)
Douglas B. Kell, University of Wales (UK)
W.B. Langdon, UCL (UK)
Bob MacCallum, Stockholm University (Sweden)
Brian Mayoh, Aarhus University (Denmark)
Andrew C.R. Martin, University of Reading (UK)
Peter Merz, Eberhard-Karls-Universität Tübingen (Germany)
Martin Middendorf, Catholic University Eichstatt-Ingolstadt (Germany)
Jason H. Moore, Vanderbilt University Medical Center (USA)
Pablo Moscato, The University of Newcastle (Australia)
Martin Oates, British Telecom Plc (UK)
Jon Rowe, University of Birmingham (UK)
Jem Rowland, Aberystwyth, The University of Wales (UK)
Vic J. Rayward-Smith, University of East Anglia (England)
El-ghazali Talbi, Laboratoire d'Informatique Fondamentale de Lille (France)
Eckart Zitzler, Swiss Federal Institute of Technology (Switzerland)