EvoWorkshops2005: EvoBIO

3rd European Workshop on Evolutionary Bioinformatics

In Applications of Evolutionary Computing.

EvoBIO is the third workshop organized by the EvoNet working group on bioinformatics. EvoBIO covers research in all aspects of computational intelligence in bioinformatics and computational biology. The emphasis is on algorithms based on evolutionary computation, on neural networks and on other novel optimisation and machine learning methods, that address important problems in molecular biology, genomics and genetics, that are computationally efficient, and that have been implemented and tested in simulations and on real datasets.

Organising Committee

Program Chairs
Dave W. Corne
d.w.corne AT ex DOT ac DOT uk
University of Exeter
 
Elena Marchiori
elena AT cs DOT vu DOT nl
Free University Amsterdam
 
EvoWorkshops2005 Chair
Franz Rothlauf
rothlauf AT uni-mannheim DOT de
University of Mannheim, Germany
 
Local Chair
Marco Tomassini
Marco.Tomassini AT hec DOT unil DOT ch
University of Lausanne, Switzerland
 
Publicity Chair
Jano van Hemert
jvhemert AT cwi DOT nl
Napier University, Edinburgh, Scotland, UK

Note: the e-mail addresses are masked for spam protection.

Programme Committee

Jesus S. Aguilar-Ruiz, University of Seville (Spain)
F.J. Azuaje, University of Ulster (Ireland)
Wolfgang Banzhaf, University of Dortmund (Germany)
Jacek Blazewicz, Institute of Computing Science, Poznan (Poland)
Carlos Cotta-Porras, University of Malaga (Spain)
Alfredo Ferro, University of Catania (Italy)
Bogdan Filipic, Jozef Stefan Institute, Ljubljana (Slovenia)
David Fogel, Natural Selection, Inc. (USA)
Gary B. Fogel, Natural Selection, Inc. (USA)
James Foster, University of Idaho (USA)
Steven A. Frank, University of California, Irvine (USA)
Alex Freitas, University of Kent (UK)
R. Giugno, University of Catania (Italy)
Jin-Kao Hao, LERIA, Université d'Angers (France)
William Hart, Sandia National Labs (USA)
Jaap Heringa, Free University Amsterdam (The Netherlands)
Francisco Herrera, University of Granada (Spain)
Daniel Howard, QinetiQ (UK)
V. Kadirkamanathan, University of Sheffield (UK)
Antoine van Kampen, AMC University of Amsterdam (The Netherlands)
Douglas B. Kell, University of Wales (UK)
W.B. Langdon, UCL (UK)
Bob MacCallum, Stockholm University (Sweden)
Brian Mayoh, Aarhus University (Denmark)
Andrew C.R. Martin, University of Reading (UK)
Peter Merz, Eberhard-Karls-Universität Tübingen (Germany)
Martin Middendorf, Leipzig University (Germany)
Jason H. Moore, Vanderbilt University Medical Center (USA)
Pablo Moscato, The University of Newcastle (Australia)
A. Narayanan, University of Exeter (UK)
Martin Oates, British Telecom Plc (UK)
Jon Rowe, University of Birmingham (UK)
Jem Rowland, Aberystwyth, The University of Wales (UK)
G.C. Rajapakse, Nanyang Technological University (Singapore)
Vic J. Rayward-Smith, University of East Anglia (England)
El-ghazali Talbi, Laboratoire d'Informatique Fondamentale de Lille (France)
Eckart Zitzler, Swiss Federal Institute of Technology (Switzerland)


EvoBIO Programme

Wednesday, 30 March 2005

Session 1: Protein Analysis: alignment, assembling and structure prediction (1120-1250)

Chair: James Foster

A Class of Pareto Archived Evolution Strategy Algorithms using Immune inspired Operators for Ab-Initio Protein Structure Prediction
Vincenzo Cutello
Giuseppe Narzisi
Giuseppe Nicosia
(Best Paper Award Candidate)

A Fuzzy Viterbi Algorithm for Improved Sequence Alignment and Searching of Proteins
Niranjan P. Bidargaddi
Madhu Chetty
Joarder Kamruzzaman

Tabu search method for determining sequences of amino acids in long polypeptides
Jacek Blazewicz
Marcin Borowski
Piotr Formanowicz
Maciej Stobiecki

Session 2: Protein Data Analysis: prediction, metabolic networks (1415-1545)

Chair: Ajit Narayanan

Syntactic Approach to Predict Membrane Spanning Regions of Transmembrane Proteins
Koliya Pulasinghe
Jagath Rajapakse

An evolutionary approach for motif discovery and transmembrane protein classification
Denise Fukumi Tsunoda
Heitor Silverio Lopes
Alex Alves Freitas

Differential Evolution and Its Application to Metabolic Flux Analysis
Jing Yang
Sarawan Wongsa
Visakan Kadirkamanathan
Stephen A. Billings
Phillip C. Wright
(Best Paper Award Candidate)

Session 3: Gene Expression Data Analysis: classification and biomarker detection (1600-1730)

Chair: Carlos Cotta

Neural networks and temporal gene expression data
Abhay Krishna
Ajit Narayanan
Ed Keedwell

Can neural network constraints in GP provide power to detect genes associated with human disease?
William S. Bush
Alison A. Motsinger
Scott M. Dudek
Marylyn D. Ritchie

Bayesian learning with local support vector machines for cancer classification with gene expression data
Elena Marchiori
Michèle Sebag

Thursday, 31 March 2005

Session 4: Microarray Data Analysis: clustering and search (0930-1100)

Chair: Marylyn Ritchie

Evolutionary Biclustering of Microarray Data
Jesus S. Aguilar-Ruiz
Federico Divina

Order Preserving Clustering over Multiple Time Course Experiments
Stefan Bleuler
Eckart Zitzler
(Best Paper Award Candidate)

Genes Related with Alzheimer's Disease: A Comparison of Evolutionary Search, Statistical and Integer Programming Approaches
Pablo Moscato
Regina Berretta
Mou'ath Hourani
Alexandre Mendes
Carlos Cotta

Session 5: Structure Activity Relationship, Panel Discussion (1120-1250)

Chair: Jason Moore

GEMPLS: A New QSAR Method Combining Generic Evolutionary Method and Partial Least Squares
Yen-Chih Chen
Jinn-Moon Yang
Chi-Hung Tsai
Cheng-Yan Kao

PANEL DISCUSSION:
Advanced computational methods in bioinformatics: salient issues


EvoBIO: Titles and abstracts of accepted papers

Jesus S. Aguilar-Ruiz
Federico Divina

Evolutionary Biclustering of Microarray Data

In this work, we address the biclustering of gene expression data with evolutionary computation, which has been proven to have excellent performance on complex problems. In expression data analysis, the most important goal may not be finding the maximum bicluster, as it might be more interesting to find a set of genes showing similar behavior under a set of conditions. Our approach is based on evolutionary algorithms and searches for biclusters following a sequential covering strategy. In addition, we pay special attention to finding high-quality biclusters with large variation. The quality of the biclusters found by our approach is discussed through an analysis of yeast and colon cancer datasets.


Niranjan P. Bidargaddi
Madhu Chetty
Joarder Kamruzzaman

A Fuzzy Viterbi Algorithm for Improved Sequence Alignment and Searching of Proteins

Profile HMMs based on classical hidden Markov models have been widely studied for identifying members of protein sequence families. The classical Viterbi search algorithm, traditionally used to calculate log-odds scores for the alignment of a new sequence to a profile model, is based on probability theory. To overcome the limitations of the classical HMM and to achieve improved alignment and better log-odds scores for sequences belonging to a given family, we propose a fuzzy Viterbi search algorithm based on Choquet integrals and Sugeno fuzzy measures. The proposed search algorithm incorporates ascending values of the scores of the neighboring states while calculating the scores for a given state, hence providing better alignment and improved log-odds scores. The proposed fuzzy Viterbi algorithm for profiles, along with the classical Viterbi search algorithm, has been tested on globin and kinase families. The results obtained in terms of log-odds scores, Z-scores and other statistical analyses establish the superiority of the fuzzy Viterbi search algorithm.
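
For orientation, the classical Viterbi baseline mentioned in this abstract can be sketched as follows. This is a minimal log-space sketch on a hypothetical two-state HMM: the state names, the probabilities, the uniform null model and the function names viterbi_log and log_odds are illustrative assumptions, not the authors' profile HMM. The fuzzy scoring based on Choquet integrals is not reproduced here.

```python
import math

def viterbi_log(obs, states, log_start, log_trans, log_emit):
    """Return (best log-probability, best state path) for an observation sequence."""
    # score[s] = best log-probability of any state path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per later position: state -> best predecessor
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][symbol]
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

def logs(probabilities):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probabilities.items()}

# Hypothetical toy model (assumption): two states emitting the symbols A and G.
STATES = ("match", "insert")
LOG_START = logs({"match": 0.8, "insert": 0.2})
LOG_TRANS = {"match": logs({"match": 0.9, "insert": 0.1}),
             "insert": logs({"match": 0.5, "insert": 0.5})}
LOG_EMIT = {"match": logs({"A": 0.7, "G": 0.3}),
            "insert": logs({"A": 0.5, "G": 0.5})}

def log_odds(obs):
    """Viterbi log-probability minus the log-probability under a uniform null model."""
    best, path = viterbi_log(obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    null = len(obs) * math.log(0.5)  # assumed null model: A and G equally likely
    return best - null, path
```

Calling log_odds("AAG") returns the log-odds score together with the most probable state path for the toy model.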


Jacek Blazewicz
Marcin Borowski
Piotr Formanowicz
Maciej Stobiecki

Tabu search method for determining sequences of amino acids in long polypeptides

The amino acid sequences of proteins determine their structure and functionality, hence methods for reading such sequences are crucial for many areas of the biological sciences. Since direct methods for reading amino acid sequences can determine only very short fragments, methods for assembling these fragments are required. In this paper, a tabu search algorithm for solving this problem is proposed. Computational tests show its usefulness in the process of determining sequences of amino acids in long polypeptides.


Stefan Bleuler
Eckart Zitzler

Order Preserving Clustering over Multiple Time Course Experiments

(Best Paper Award Candidate)
Clustering still represents the most commonly used technique to analyze gene expression data, be it classical clustering approaches that aim at finding biologically relevant gene groups or biclustering methods that focus on identifying subsets of genes that behave similarly over a subset of conditions. Usually, the measurements of different experiments are mixed together in a single gene expression matrix, where the information about which experiments belong together, e.g., in the context of a time course, is lost. This paper investigates the question of how to exploit the information about related experiments and to effectively use it in the clustering process. To this end, the idea of order preserving clusters that has been presented in [BCKY2002a] is extended and integrated in an evolutionary algorithm framework that allows simultaneous clustering over multiple time course experiments while keeping the distinct time series data separate.
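
The order preserving property can be illustrated with a short check. This is a minimal sketch, assuming that an order preserving cluster means a set of genes whose expression values rank the selected conditions in the same order, and that the property is tested separately within each time course. The function names and the list-of-rows data layout are hypothetical, and the evolutionary search itself is not shown.

```python
def induced_order(values):
    """Return the positions of values sorted by increasing expression level.

    Ties are broken by position because Python's sort is stable.
    """
    return tuple(sorted(range(len(values)), key=lambda j: values[j]))

def is_order_preserving(rows, columns):
    """True if every row ranks the selected columns in the same order."""
    orders = {induced_order([row[j] for j in columns]) for row in rows}
    return len(orders) <= 1

def preserved_in_all(experiments, genes, columns_per_experiment):
    """Check the property in each time course separately (matrices are kept apart)."""
    return all(
        is_order_preserving([matrix[g] for g in genes], columns)
        for matrix, columns in zip(experiments, columns_per_experiment)
    )

# Hypothetical toy data (assumption): two time courses measured on the same three genes.
COURSE_1 = [[1, 5, 3], [2, 9, 4], [7, 1, 3]]
COURSE_2 = [[0.1, 0.2], [3.0, 4.0], [5.0, 1.0]]
```

Here genes 0 and 1 rise and fall together in both toy time courses, while gene 2 breaks the shared ordering.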


William S. Bush
Alison A. Motsinger
Scott M. Dudek
Marylyn D. Ritchie

Can neural network constraints in GP provide power to detect genes associated with human disease?

A major goal of human genetics is the identification of susceptibility genes associated with common, complex diseases. Identifying the gene-gene and gene-environment interactions that comprise the genetic architecture of a majority of common diseases is a difficult challenge. To this end, novel computational approaches have been applied to studies of human disease. Previously, a genetic programming neural network (GPNN) approach was employed. Although the GPNN method has been quite successful, a clear comparison of GPNN and genetic programming (GP) alone for detecting genetic effects has not been made. In this paper, we demonstrate that using neural networks (NN) evolved by GP can be more powerful than GP alone. This is most likely due to the confined search space of the GPNN approach, in comparison to a free-form GP. This study demonstrates the utility of using GP to evolve NN in studies of the genetics of common, complex human disease.


Vincenzo Cutello
Giuseppe Narzisi
Giuseppe Nicosia

A Class of Pareto Archived Evolution Strategy Algorithms using Immune inspired Operators for Ab-Initio Protein Structure Prediction

(Best Paper Award Candidate)
In this work we investigate the applicability of a multiobjective formulation of the Ab-Initio Protein Structure Prediction (PSP) to medium size protein sequences (46-70 residues). In particular, we introduce a modified version of Pareto Archived Evolution Strategy (PAES) which makes use of immune inspired computing principles and which we will denote by “I-PAES”. Experimental results on the test bed of five proteins from PDB show that PAES, (1+1)-PAES and its modified version I-PAES are optimal multiobjective optimization algorithms and that the introduced mutation operators are effective for the PSP problem. The proposed I-PAES is comparable with other evolutionary algorithms proposed in the literature, both in terms of best solution found and computational cost.
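
The archive idea behind a Pareto archived strategy can be sketched in a few lines. This is a minimal sketch, assuming that all objectives are minimised. The function names are hypothetical, and the adaptive crowding grid of PAES, the immune inspired operators of I-PAES and the protein energy functions are all omitted.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def update_archive(archive, candidate):
    """Return the archive after offering it a candidate objective vector.

    The candidate is rejected if an archived vector dominates or equals it;
    otherwise it is added and every vector it dominates is removed.
    """
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

# Hypothetical two-objective values (assumption), offered one at a time.
archive = []
for point in [(3, 3), (2, 4), (4, 4), (1, 5), (2, 2)]:
    archive = update_archive(archive, point)
```

After the loop the archive holds only mutually non-dominated points, which is the invariant an archive-based multiobjective strategy maintains.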


Abhay Krishna
Ajit Narayanan
E.C. Keedwell

Neural networks and temporal gene expression data

Temporal gene expression data is of particular interest to systems biology researchers. Such data can be used to create gene networks, which represent the regulatory interactions between genes over time. Reverse engineering gene networks from temporal gene expression data is one of the most important steps in the study of complex biological systems. This paper introduces sensitivity analysis of systematically perturbed trained neural networks, both to select a smaller and more influential subset of genes from a temporal gene expression dataset and to reverse engineer a gene network from the reduced temporal gene expression data. The methodology was applied to the rat cervical spinal cord development time-course data, and it is demonstrated that the method not only identifies important genes involved in regulatory relationships but also generates candidate gene networks for further experimental study.


Elena Marchiori
Michèle Sebag

Bayesian learning with local support vector machines for cancer classification with gene expression data

This paper describes a novel method for (binary) classification based on support vector machines (SVM) and its application to cancer classification with gene expression data. The method employs pairs of support vectors of a (linear) SVM classifier for generating a sequence of new SVM classifiers, called local support classifiers. This sequence is used in two Bayesian learning techniques: as an ensemble of classifiers in Optimal Bayes, and as attributes in Naive Bayes. The resulting classifiers are applied to four publicly available gene expression datasets from leukemia, ovarian, lymphoma, and colon cancer data, respectively. The results indicate that the proposed approach significantly improves the predictive performance of the baseline SVM classifier, as well as its stability and robustness, with excellent results on all datasets. In particular, perfect classification is achieved on the colon cancer dataset.


Pablo Moscato
Regina Berretta
Mou'ath Hourani
Alexandre Mendes
Carlos Cotta

Genes Related with Alzheimer’s Disease: A Comparison of Evolutionary Search, Statistical and Integer Programming Approaches

Three different methodologies have been applied to microarray data from the brains of patients diagnosed with Alzheimer's disease and of healthy patients taken as controls. A clear pattern of differential gene expression results, which can be regarded as a molecular signature of the disease. The results show the complementarity of the different methodologies, suggesting that a unified approach may help to uncover complex genetic risk factors not currently discovered with a single method. We also compare the set of genes in these differential patterns with those already reported in the literature.


Koliya Pulasinghe
Jagath C. Rajapakse

Syntactic Approach to Predict Membrane Spanning Regions of Transmembrane Proteins

This paper exploits the “biological grammar” of transmembrane proteins to predict their membrane spanning regions using hidden Markov models, and elaborates a set of syntactic rules to model the distinct features of transmembrane proteins. This paves the way to identifying the characteristics of membrane proteins in a manner analogous to the way hidden Markov models are used to identify the language content of speech utterances. The proposed method correctly predicts 95.24


Denise Fukumi Tsunoda
Heitor Silverio Lopes
Alex Alves Freitas

An evolutionary approach for motif discovery and transmembrane protein classification

Proteins can be grouped into families according to their biological functions. This paper presents a system, named GAMBIT, which discovers motifs (particular sequences of amino acids) that occur very often in proteins of a given family but rarely occur in proteins of other families. These motifs are used to classify unknown proteins, that is, to predict their function by analyzing the primary structure. To search for motifs in proteins, we developed a GA with specially tailored operators for the problem. GAMBIT was compared with MEME, a web tool for finding motifs in the TransMembrane Protein DataBase. Motifs found by both methods were used to build a decision tree and classification rules, using, respectively, C4.5 and Prism algorithms. Motifs found by GAMBIT led to significantly better results, when compared with those found by MEME, using both classification algorithms.


Jing Yang
Sarawan Wongsa
Visakan Kadirkamanathan
Stephen A. Billings
Phillip C. Wright

Differential Evolution and Its Application to Metabolic Flux Analysis

(Best Paper Award Candidate)
Metabolic flux analysis with measurement data from 13C tracer experiments has been an important approach for exploring metabolic networks. Though various methods have been developed for 13C positional enrichment or isotopomer modelling, few researchers have investigated the flux estimation problem in detail. In this paper, flux estimation is formulated as a global optimization problem based on carbon enrichment balances. Differential evolution, which is a simple and robust evolutionary algorithm, is applied to flux estimation. The performance of the algorithm is illustrated and compared with ordinary least squares estimation through simulation of the cyclic pentose phosphate metabolic network in a noisy environment. It is shown that differential evolution is an efficient approach for flux quantification.
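
A generic differential evolution loop of the kind this abstract refers to can be sketched as follows. This is a minimal sketch of the common DE/rand/1/bin scheme with in-place replacement, not the authors' implementation. The control parameters, the toy squared-error objective and the TRUE_FLUXES values are assumptions chosen for illustration, and no carbon enrichment balance model is included.

```python
import random

def differential_evolution(cost, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=300, seed=0):
    """Minimise cost over box bounds with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # this coordinate always comes from the mutant
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < cr:
                    value = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    value = min(max(value, lo), hi)  # clip to the box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]

# Hypothetical toy problem (assumption): recover two flux fractions
# from noise-free "measurements" by minimising the squared error.
TRUE_FLUXES = (0.3, 0.7)

def squared_error(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TRUE_FLUXES))

best, best_cost = differential_evolution(squared_error, [(0.0, 1.0), (0.0, 1.0)])
```

In a real flux estimation setting the cost function would compare simulated and measured enrichments; here it is replaced by a simple quadratic so that the sketch stays self-contained.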


Yen-Chih Chen
Jinn-Moon Yang
Chi-Hung Tsai
Cheng-Yan Kao

GEMPLS: A New QSAR Method Combining Generic Evolutionary Method and Partial Least Squares

We have proposed a new method for quantitative structure-activity relationship (QSAR) analysis. This tool, termed GEMPLS, combines a genetic evolutionary method with partial least squares (PLS). We designed a new genetic operator and used the Mahalanobis distance to improve predictive accuracy and to speed up finding a QSAR solution. The number of latent variables (lv) was encoded into the chromosome of the GA, instead of scanning for the best lv for PLS. We applied GEMPLS to a comparative binding energy (COMBINE) analysis system of 48 inhibitors of the HIV-1 protease. Using GEMPLS, the cross-validated correlation coefficient (q2) is 0.9053 and the external SDEP (SDEPex) is 0.61. The results indicate that GEMPLS is highly comparable to GAPLS and faster than GAPLS on this data set. GEMPLS yielded QSAR models in which the selected residues are consistent with experimental evidence.
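
The two statistics quoted in this abstract can be computed as shown below. This is a minimal sketch, assuming the commonly used definitions q2 = 1 - PRESS / SS and SDEP = sqrt(PRESS / N), where PRESS is the sum of squared prediction errors and SS is the sum of squared deviations of the observed values from their mean. The abstract does not state the exact formulas used, and the function name and sample numbers are hypothetical.

```python
import math

def q2_and_sdep(observed, predicted):
    """Return (q2, SDEP) for observed activities and their predictions.

    For a cross-validated q2 the predictions must come from models that did
    not see the corresponding observation; for an external SDEP they must
    come from a held-out test set.
    """
    n = len(observed)
    mean = sum(observed) / n
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss = sum((y - mean) ** 2 for y in observed)
    return 1.0 - press / ss, math.sqrt(press / n)

# Hypothetical activities and predictions (assumption), not data from the paper.
q2, sdep = q2_and_sdep([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A q2 close to 1 and a small SDEP both indicate that the predictions track the observed activities closely.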