Content vs. context for sentiment analysis: A comparative analysis over microblogs

Aisopos, F; Papadakis, G; Tserpes, K; Varvarigou, T

dc.contributor.author	Aisopos, F	en
dc.contributor.author	Papadakis, G	en
dc.contributor.author	Tserpes, K	en
dc.contributor.author	Varvarigou, T	en
dc.date.accessioned	2014-03-01T02:53:35Z
dc.date.available	2014-03-01T02:53:35Z
dc.date.issued	2012	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/36434
dc.subject	N-gram graphs	en
dc.subject	Sentiment analysis	en
dc.subject	Social context	en
dc.subject.other	Classification methods	en
dc.subject.other	Comparative analysis	en
dc.subject.other	Content-based features	en
dc.subject.other	Context-based	en
dc.subject.other	Dimensionality reduction	en
dc.subject.other	Discretizations	en
dc.subject.other	Extraction costs	en
dc.subject.other	Inherent characteristics	en
dc.subject.other	Micro-blog	en
dc.subject.other	Multiple Classification	en
dc.subject.other	N-gram graphs	en
dc.subject.other	Noise-Tolerant	en
dc.subject.other	Real world data	en
dc.subject.other	Sentiment analysis	en
dc.subject.other	Social context	en
dc.subject.other	Time efficiencies	en
dc.subject.other	Traditional techniques	en
dc.subject.other	Hypertext systems	en
dc.subject.other	Virtual reality	en
dc.subject.other	Data mining	en
dc.title	Content vs. context for sentiment analysis: A comparative analysis over microblogs	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1145/2309996.2310028	en
heal.identifier.secondary	http://dx.doi.org/10.1145/2309996.2310028	en
heal.publicationDate	2012	en
heal.abstract	Microblog content poses serious challenges to the applicability of traditional sentiment analysis and classification methods, due to its inherent characteristics. To tackle them, we introduce a method that relies on two orthogonal, but complementary sources of evidence: content-based features captured by n-gram graphs and context-based ones captured by polarity ratio. Both are language-neutral and noise-tolerant, guaranteeing high effectiveness and robustness in the settings we are considering. To ensure our approach can be integrated into practical applications with large volumes of data, we also aim at enhancing its time efficiency: we propose alternative sets of features with low extraction cost, explore dimensionality reduction and discretization techniques and experiment with multiple classification algorithms. We then evaluate our methods over a large, real-world data set extracted from Twitter, with the outcomes indicating significant improvements over the traditional techniques. Copyright 2012 ACM.	en
heal.journalName	HT'12 - Proceedings of the 23rd ACM Conference on Hypertext and Social Media	en
dc.identifier.doi	10.1145/2309996.2310028	en
dc.identifier.spage	187	en
dc.identifier.epage	196	en