HEAL DSpace

Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification

DSpace/Manakin Repository

dc.contributor.author Belsis, P en
dc.contributor.author Fragos, K en
dc.contributor.author Gritzalis, S en
dc.contributor.author Skourlas, C en
dc.date.accessioned 2014-03-01T01:27:56Z
dc.date.available 2014-03-01T01:27:56Z
dc.date.issued 2008 en
dc.identifier.issn 0926227X en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/18648
dc.subject Hierarchical mixtures of experts en
dc.subject Machine learning based processing en
dc.subject Spam mail en
dc.subject.other Cost effectiveness en
dc.subject.other Data storage equipment en
dc.subject.other Electronic mail en
dc.subject.other Internet service providers en
dc.subject.other Learning systems en
dc.subject.other Mixtures en
dc.subject.other Random processes en
dc.subject.other Robot learning en
dc.subject.other Spamming en
dc.subject.other Cost efficiencies en
dc.subject.other Data dimensionalities en
dc.subject.other Data samples en
dc.subject.other Effective algorithms en
dc.subject.other Feature selections en
dc.subject.other Hierarchical mixtures en
dc.subject.other Hierarchical mixtures of experts en
dc.subject.other Internet services en
dc.subject.other Linear relationships en
dc.subject.other Machine learning based processing en
dc.subject.other Machine learnings en
dc.subject.other Massive quantities en
dc.subject.other Non linearities en
dc.subject.other Offensive languages en
dc.subject.other Perceptron en
dc.subject.other Spam classifications en
dc.subject.other Spam filtering en
dc.subject.other Spam mail en
dc.subject.other Statistical models en
dc.subject.other Storage resources en
dc.subject.other Text filtering en
dc.subject.other Unsolicited bulk e-mails en
dc.subject.other Internet en
dc.title Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification en
heal.type journalArticle en
heal.identifier.primary 10.3233/JCS-2008-0319 en
heal.identifier.secondary http://dx.doi.org/10.3233/JCS-2008-0319 en
heal.publicationDate 2008 en
heal.abstract E-mail abuse has been steadily increasing during the last decade. E-mail users find themselves targeted by massive quantities of unsolicited bulk e-mail, which often contains offensive language or is sent with fraudulent intent. Internet Service Providers (ISPs), on the other hand, have to face considerable system overloading, as the incoming mail consumes network and storage resources. Among the plethora of solutions, the most prominent in terms of cost efficiency and complexity are the text filtering approaches. Most of these approaches model the problem using linear statistical models. Despite the popularity of such models, due both to their simplicity and relative ease of interpretation, the linearity assumption about data samples is inappropriate in practice, because linear models cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose a margin-based feature selection approach integrated with a Hierarchical Mixtures of Experts (HME) system, which attempts to overcome limitations common to other machine-learning based approaches. After reducing the data dimensionality using effective algorithms for feature selection, we evaluated our system with publicly available corpora of e-mails, characterized by very high similarity between legitimate and bulk e-mail (and thus low discriminative potential). We experimented with two different architectures, a hierarchical HME and a perceptron HME. As a result, we confirm the superiority of our Spam Filtering (SF) - HME method over other machine learning approaches, which present a lower degree of recall, as well as over traditional rule-based approaches, which lack considerably in the achieved degrees of precision. © 2008 - IOS Press and the authors. All rights reserved. en
heal.journalName Journal of Computer Security en
dc.identifier.doi 10.3233/JCS-2008-0319 en
dc.identifier.volume 16 en
dc.identifier.issue 6 en
dc.identifier.spage 761 en
dc.identifier.epage 790 en


Files in this item

Files Size Format View

There are no files associated with this item.
