dc.contributor.author |
Valavanis, IK |
en |
dc.contributor.author |
Spyrou, GM |
en |
dc.contributor.author |
Nikita, KS |
en |
dc.date.accessioned |
2014-03-01T02:45:45Z |
|
dc.date.available |
2014-03-01T02:45:45Z |
|
dc.date.issued |
2008 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/32367 |
|
dc.subject |
Cross Validation |
en |
dc.subject |
Feature Selection |
en |
dc.subject |
Fold Recognition |
en |
dc.subject |
Genetic Algorithm |
en |
dc.subject |
Machine Learning |
en |
dc.subject |
Protein Sequence |
en |
dc.subject |
Protein Structure |
en |
dc.subject |
PSN |
en |
dc.subject |
Random Walk |
en |
dc.subject.other |
Cross validation |
en |
dc.subject.other |
Feature selection |
en |
dc.subject.other |
Fold recognition |
en |
dc.subject.other |
Machine learning techniques |
en |
dc.subject.other |
Protein sequences |
en |
dc.subject.other |
Protein structures |
en |
dc.subject.other |
Query proteins |
en |
dc.subject.other |
Random Walk |
en |
dc.subject.other |
Similarity network |
en |
dc.subject.other |
Testing sets |
en |
dc.subject.other |
Bioinformatics |
en |
dc.subject.other |
Genetic algorithms |
en |
dc.subject.other |
Learning algorithms |
en |
dc.subject.other |
Optimization |
en |
dc.subject.other |
Statistical tests |
en |
dc.subject.other |
Set theory |
en |
dc.title |
Protein similarity networks and genetic algorithm driven feature selection for fold recognition |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1109/BIBE.2008.4696704 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/BIBE.2008.4696704 |
en |
heal.identifier.secondary |
4696704 |
en |
heal.publicationDate |
2008 |
en |
heal.abstract |
Fold recognition based on sequence-derived features is a complex classification problem, and sequence-derived features are usually exploited using suitable machine learning techniques. Here we address the task of fold recognition on a protein similarity network (PSN) basis. We construct a protein sequence similarity network (PSeSN) using a set of 125 sequence-derived features for an available set of 311 proteins. The PSeSN is optimized by using a Genetic Algorithm (GA) to select the features that construct a PSeSN which is as similar as possible to the corresponding protein structure similarity network (PStSN). A random walk based algorithm is then utilized to recognize the fold of a query protein sequence by calculating its affinities to sequence vertices both in the initial and the optimized PSeSN. Total accuracy (TA) measurements obtained using 10-fold cross validation show that the use of 48 out of 125 sequence-derived features (optimized PSeSN) yielded better results (mean TA: 0.35 in testing sets) than the initial PSeSN (mean TA: 0.316 in testing sets). |
en |
heal.journalName |
8th IEEE International Conference on BioInformatics and BioEngineering, BIBE 2008 |
en |
dc.identifier.doi |
10.1109/BIBE.2008.4696704 |
en |