Audio-assisted movie dialogue detection

Kotti, M; Ververidis, D; Evangelopoulos, G; Panagakis, I; Kotropoulos, C; Maragos, P; Pitas, I

dc.contributor.author	Kotti, M	en
dc.contributor.author	Ververidis, D	en
dc.contributor.author	Evangelopoulos, G	en
dc.contributor.author	Panagakis, I	en
dc.contributor.author	Kotropoulos, C	en
dc.contributor.author	Maragos, P	en
dc.contributor.author	Pitas, I	en
dc.date.accessioned	2014-03-01T01:27:57Z
dc.date.available	2014-03-01T01:27:57Z
dc.date.issued	2008	en
dc.identifier.issn	1051-8215	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/18655
dc.subject	Audio activity detection	en
dc.subject	Cross-correlation	en
dc.subject	Cross-power spectral density	en
dc.subject	Dialogue detection	en
dc.subject	Indicator functions	en
dc.subject	Speaker clustering	en
dc.subject.classification	Engineering, Electrical & Electronic	en
dc.subject.other	Classifiers	en
dc.subject.other	Feedforward neural networks	en
dc.subject.other	Flow of solids	en
dc.subject.other	Ketones	en
dc.subject.other	Learning systems	en
dc.subject.other	Power spectral density	en
dc.subject.other	Radial basis function networks	en
dc.subject.other	Reactor cores	en
dc.subject.other	Support vector machines	en
dc.subject.other	Audio activity detection	en
dc.subject.other	Cross-correlation	en
dc.subject.other	Cross-power spectral density	en
dc.subject.other	Dialogue detection	en
dc.subject.other	Indicator functions	en
dc.subject.other	Speaker clustering	en
dc.subject.other	Probability density function	en
dc.title	Audio-assisted movie dialogue detection	en
heal.type	journalArticle	en
heal.identifier.primary	10.1109/TCSVT.2008.2005613	en
heal.identifier.secondary	http://dx.doi.org/10.1109/TCSVT.2008.2005613	en
heal.identifier.secondary	4630764	en
heal.language	English	en
heal.publicationDate	2008	en
heal.abstract	An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The classifiers are trained using ground-truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE.	en
heal.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	en
heal.journalName	IEEE Transactions on Circuits and Systems for Video Technology	en
dc.identifier.doi	10.1109/TCSVT.2008.2005613	en
dc.identifier.isi	ISI:000260867100015	en
dc.identifier.volume	18	en
dc.identifier.issue	11	en
dc.identifier.spage	1618	en
dc.identifier.epage	1627	en
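
The abstract describes feeding the cross-correlation of two actor indicator functions, and the magnitude of the corresponding cross-power spectral density, to a classifier. The following is a minimal sketch of how such features might be computed with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `indicator_features`, the binary 0/1 sampling of the indicator functions, and the zero-padded FFT length are all hypothetical choices that the record does not specify.

```python
import numpy as np


def indicator_features(ind_a, ind_b):
    """Return (cross-correlation, |cross-power spectral density|).

    ind_a, ind_b: equal-length 1-D sequences, assumed here to be 1 where the
    actor speaks and 0 otherwise (an assumption; the record gives no format).
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("indicator functions must be 1-D and of equal length")

    n = a.size
    # Linear cross-correlation over all 2n-1 lags; zero lag sits at index n-1.
    xcorr = np.correlate(a, b, mode="full")

    # The cross-power spectral density is the DFT of the cross-correlation.
    # Zero-padding both signals to 2n-1 samples avoids circular wrap-around.
    nfft = 2 * n - 1
    cpsd = np.fft.fft(a, nfft) * np.conj(np.fft.fft(b, nfft))
    return xcorr, np.abs(cpsd)


# Hypothetical example: two actors taking turns, never speaking together.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]
xcorr, cpsd_mag = indicator_features(actor_a, actor_b)
```

In this toy turn-taking case the zero-lag cross-correlation is zero, since the two actors never speak at the same sample, while nonzero lags carry the alternation pattern. The two returned arrays could be concatenated into a feature vector for any of the classifiers the abstract lists.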