dc.contributor.author |
Belsis, P |
en |
dc.contributor.author |
Fragos, K |
en |
dc.contributor.author |
Gritzalis, S |
en |
dc.contributor.author |
Skourlas, C |
en |
dc.date.accessioned |
2014-03-01T02:44:11Z |
|
dc.date.available |
2014-03-01T02:44:11Z |
|
dc.date.issued |
2006 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/31740 |
|
dc.subject |
Hierarchical systems of experts |
en |
dc.subject |
Machine learning |
en |
dc.subject |
Spam mail |
en |
dc.subject.other |
Adaptive filtering |
en |
dc.subject.other |
Algorithms |
en |
dc.subject.other |
Classification (of information) |
en |
dc.subject.other |
Electronic mail |
en |
dc.subject.other |
Learning systems |
en |
dc.subject.other |
Text processing |
en |
dc.subject.other |
Hierarchical Mixture of Experts system |
en |
dc.subject.other |
Hierarchical systems of experts |
en |
dc.subject.other |
Nonlinear relationships |
en |
dc.subject.other |
Spam mail |
en |
dc.subject.other |
Hierarchical systems |
en |
dc.title |
SF-HME system: A hierarchical mixtures-of-experts classification system for spam filtering |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1145/1141277.1141360 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1145/1141277.1141360 |
en |
heal.publicationDate |
2006 |
en |
heal.abstract |
Many linear statistical models have recently been proposed in the text classification literature and evaluated on the problem of filtering Unsolicited Bulk Email. Despite their popularity, which stems from their simplicity and relative ease of interpretation, the linearity assumption they make about data samples is inappropriate in practice, because it cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose SF-HME, a Hierarchical Mixture-of-Experts system that attempts to overcome limitations common to other machine-learning-based approaches when applied to spam mail classification. After reducing the dimensionality of the data with the Simba algorithm for feature selection, we evaluated our SF-HME system on a publicly available corpus of emails with very high similarity between legitimate and bulk email, and thus low discriminative potential, on which traditional rule-based filtering approaches achieve considerably lower precision. The results confirm that our SF-HME method outperforms other machine learning approaches, which exhibited lower recall. Copyright 2006 ACM. |
en |
heal.journalName |
Proceedings of the ACM Symposium on Applied Computing |
en |
dc.identifier.doi |
10.1145/1141277.1141360 |
en |
dc.identifier.volume |
1 |
en |
dc.identifier.spage |
354 |
en |
dc.identifier.epage |
360 |
en |