An N-gram based distributional test for authorship identification

dc.contributor.author Fragos, K en
dc.contributor.author Skourlas, C en
dc.date.accessioned 2014-03-01T02:43:55Z
dc.date.available 2014-03-01T02:43:55Z
dc.date.issued 2006 en
dc.identifier.uri http://hdl.handle.net/123456789/31553
dc.relation.uri http://www.scopus.com/inward/record.url?eid=2-s2.0-77954117086&partnerID=40&md5=42520e935fd87ae59fef7c8180b74056 en
dc.relation.uri http://www.informatik.uni-trier.de/~ley/db/conf/nlucs/nlucs2006.html#FragosS06a en
dc.subject.other Authorship attribution en
dc.subject.other Authorship identification en
dc.subject.other Character level en
dc.subject.other Feature selection en
dc.subject.other Kolmogorov-Smirnov test en
dc.subject.other N-grams en
dc.subject.other NLP tools en
dc.subject.other Novel methods en
dc.subject.other Test sets en
dc.subject.other Text collection en
dc.subject.other Text preprocessing en
dc.subject.other Text segmentation en
dc.subject.other Training data en
dc.subject.other Competition en
dc.subject.other Feature extraction en
dc.subject.other Query languages en
dc.subject.other Linguistics en
dc.title An N-gram based distributional test for authorship identification en
heal.type conferenceItem en
heal.publicationDate 2006 en
heal.abstract In this paper, a novel method for the authorship identification problem is presented. Based on character-level text segmentation, we study the distributions of the disputed text's N-grams within the authors' text collections. The distribution that behaves most abnormally is identified using the Kolmogorov-Smirnov test, and the corresponding author is selected as the correct one. Our method is evaluated using the test sets of the 2004 ALLC/ACH Ad-hoc Authorship Attribution Competition, and its performance is comparable with the best performances of the participants in the competition. The main advantage of our method is that it is a simple, non-parametric approach to authorship attribution that does not require building author profiles from training data. Moreover, the method is language independent and does not require word segmentation for languages such as Chinese or Thai. There is also no need for any text preprocessing or higher-level processing, thus avoiding the use of taggers, parsers, feature selection strategies, or other language-dependent NLP tools. en
heal.journalName Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science, NLUCS 2006, in Conjunction with ICEIS 2006 en
dc.identifier.spage 139 en
dc.identifier.epage 148 en
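The abstract names two ingredients: character-level N-gram segmentation (no tokeniser, so it works unchanged on Chinese or Thai) and the two-sample Kolmogorov-Smirnov test. It does not say exactly which two samples are compared per author or how "behaves most abnormally" is turned into a decision. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' procedure. The function names `char_ngrams`, `relative_freqs`, `ks_statistic` and `ks_per_author` are hypothetical. The pairing of samples (the disputed text's N-gram frequencies measured in the disputed text versus in each author's collection) is our own assumption. The sketch stops at one KS statistic per author and leaves the paper's selection rule open.

```python
from bisect import bisect_right
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character N-grams of `text`; no tokenisation or other preprocessing."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_freqs(text: str, n: int) -> dict[str, float]:
    """Relative frequency of each character N-gram type in `text`."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"text is shorter than n={n} characters")
    return {gram: c / total for gram, c in counts.items()}


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    # Empirical CDFs only change at sample points, so checking those points suffices.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_per_author(disputed: str, collections: dict[str, str], n: int = 3) -> dict[str, float]:
    """ASSUMPTION (not from the paper): for each author, compare how often each N-gram of
    the disputed text occurs in the disputed text versus in that author's collection
    (0.0 for N-grams the author never used)."""
    disputed_freqs = relative_freqs(disputed, n)
    in_disputed = list(disputed_freqs.values())
    scores = {}
    for author, corpus in collections.items():
        author_freqs = relative_freqs(corpus, n)
        in_author = [author_freqs.get(g, 0.0) for g in disputed_freqs]
        scores[author] = ks_statistic(in_disputed, in_author)
    return scores
```

For example, `ks_per_author("abab", {"A": "abab", "B": "xyzxyz"}, n=2)` returns `{'A': 0.0, 'B': 1.0}`. `char_ngrams("中文文本", 2)` yields `['中文', '文文', '文本']` without any word segmentation. In practice `scipy.stats.ks_2samp` computes the same statistic (plus a p-value) and could replace the hand-written `ks_statistic`.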