HEAL DSpace

Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification

dc.contributor.author Belsis, P en
dc.contributor.author Fragos, K en
dc.contributor.author Gritzalis, S en
dc.contributor.author Skourlas, C en
dc.date.accessioned 2014-03-01T01:29:52Z
dc.date.available 2014-03-01T01:29:52Z
dc.date.issued 2009 en
dc.identifier.issn 0926227X en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/19388
dc.subject Hierarchical mixtures of experts en
dc.subject Machine learning based processing en
dc.subject Spam mail en
dc.subject.other Cost efficiency en
dc.subject.other Data dimensionality en
dc.subject.other Data sample en
dc.subject.other E-mail abuse en
dc.subject.other Effective algorithms en
dc.subject.other Feature selection en
dc.subject.other Hierarchical mixtures en
dc.subject.other Hierarchical mixtures of experts en
dc.subject.other Machine learning based processing en
dc.subject.other Machine-learning en
dc.subject.other Massive quantities en
dc.subject.other Non-linear relationships en
dc.subject.other Non-Linearity en
dc.subject.other Offensive languages en
dc.subject.other Perceptron en
dc.subject.other Rule-based approach en
dc.subject.other Spam classification en
dc.subject.other Spam filtering en
dc.subject.other Spam mail en
dc.subject.other Statistical models en
dc.subject.other Storage resources en
dc.subject.other Text filtering en
dc.subject.other Unsolicited bulk e-mail en
dc.subject.other Education en
dc.subject.other Internet service providers en
dc.subject.other Mixtures en
dc.subject.other Robot learning en
dc.subject.other Spamming en
dc.subject.other Internet en
dc.title Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification en
heal.type journalArticle en
heal.identifier.primary 10.3233/JCS-2009-0377 en
heal.identifier.secondary http://dx.doi.org/10.3233/JCS-2009-0377 en
heal.publicationDate 2009 en
heal.abstract E-mail abuse has been steadily increasing during the last decade. E-mail users find themselves targeted by massive quantities of unsolicited bulk e-mail, which often contains offensive language or has fraudulent intent. Internet Service Providers (ISPs), on the other hand, face considerable system overload, as the incoming mail consumes network and storage resources. Among the plethora of solutions, the most prominent in terms of cost efficiency and complexity are the text filtering approaches. Most of these approaches model the problem using linear statistical models. Despite the popularity of such models (due both to their simplicity and their relative ease of interpretation), the linearity assumption about the data samples is inappropriate in practice, mainly because linear models cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose a margin-based feature selection approach integrated with a Hierarchical Mixtures of Experts (HME) system, which attempts to overcome limitations common to other machine-learning based approaches. After reducing the data dimensionality with effective feature selection algorithms, we evaluated our system on publicly available corpora of e-mails, characterized by very high similarity between legitimate and bulk e-mail (and thus low discriminative potential). We experimented with two different architectures, a hierarchical HME and a perceptron HME. The results confirm that our Spam Filtering (SF)-HME method outperforms other machine learning approaches, which achieve lower recall, as well as traditional rule-based approaches, which achieve considerably lower precision. © 2009 - IOS Press. en
heal.journalName Journal of Computer Security en
dc.identifier.doi 10.3233/JCS-2009-0377 en
dc.identifier.volume 17 en
dc.identifier.issue 3 en
dc.identifier.spage 239 en
dc.identifier.epage 268 en