dc.contributor.author |
Fragos, K |
en |
dc.contributor.author |
Skourlas, C |
en |
dc.date.accessioned |
2014-03-01T02:43:55Z |
|
dc.date.available |
2014-03-01T02:43:55Z |
|
dc.date.issued |
2006 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/31553 |
|
dc.relation.uri |
http://www.scopus.com/inward/record.url?eid=2-s2.0-77954117086&partnerID=40&md5=42520e935fd87ae59fef7c8180b74056 |
en |
dc.relation.uri |
http://www.informatik.uni-trier.de/~ley/db/conf/nlucs/nlucs2006.html#FragosS06a |
en |
dc.subject.other |
Authorship attribution |
en |
dc.subject.other |
Authorship identification |
en |
dc.subject.other |
Character level |
en |
dc.subject.other |
Feature selection |
en |
dc.subject.other |
Kolmogorov-Smirnov test |
en |
dc.subject.other |
N-grams |
en |
dc.subject.other |
NLP tools |
en |
dc.subject.other |
Novel methods |
en |
dc.subject.other |
Test sets |
en |
dc.subject.other |
Text collection |
en |
dc.subject.other |
Text preprocessing |
en |
dc.subject.other |
Text segmentation |
en |
dc.subject.other |
Training data |
en |
dc.subject.other |
Competition |
en |
dc.subject.other |
Feature extraction |
en |
dc.subject.other |
Query languages |
en |
dc.subject.other |
Linguistics |
en |
dc.title |
An N-gram based distributional test for authorship identification |
en |
heal.type |
conferenceItem |
en |
heal.publicationDate |
2006 |
en |
heal.abstract |
In this paper, a novel method for the authorship identification problem is presented. Based on character-level text segmentation, we study the distributions of the disputed text's N-grams within the authors' text collections. The distribution that behaves most abnormally is identified using the Kolmogorov-Smirnov test, and the corresponding author is selected as the correct one. Our method is evaluated using the test sets of the 2004 ALLC/ACH Ad-hoc Authorship Attribution Competition, and its performance is comparable with the best performances of the participants in the competition. The main advantage of our method is that it is a simple, nonparametric approach to authorship attribution that does not require building authors' profiles from training data. Moreover, the method is language independent and does not require segmentation for languages such as Chinese or Thai. There is also no need for any text preprocessing or higher-level processing, thus avoiding the use of taggers, parsers, feature selection strategies, or other language-dependent NLP tools. |
en |
heal.journalName |
Proceedings of the 3rd International Workshop on Natural Language Understanding and Cognitive Science, NLUCS 2006, in Conjunction with ICEIS 2006 |
en |
dc.identifier.spage |
139 |
en |
dc.identifier.epage |
148 |
en |