HEAL DSpace

Audio-assisted movie dialogue detection

DSpace/Manakin Repository

dc.contributor.author Kotti, M en
dc.contributor.author Ververidis, D en
dc.contributor.author Evangelopoulos, G en
dc.contributor.author Panagakis, I en
dc.contributor.author Kotropoulos, C en
dc.contributor.author Maragos, P en
dc.contributor.author Pitas, I en
dc.date.accessioned 2014-03-01T01:27:57Z
dc.date.available 2014-03-01T01:27:57Z
dc.date.issued 2008 en
dc.identifier.issn 1051-8215 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/18655
dc.subject Audio activity detection en
dc.subject Cross-correlation en
dc.subject Cross-power spectral density en
dc.subject Dialogue detection en
dc.subject Indicator functions en
dc.subject Speaker clustering en
dc.subject.classification Engineering, Electrical & Electronic en
dc.subject.other Classifiers en
dc.subject.other Feedforward neural networks en
dc.subject.other Flow of solids en
dc.subject.other Ketones en
dc.subject.other Learning systems en
dc.subject.other Power spectral density en
dc.subject.other Radial basis function networks en
dc.subject.other Reactor cores en
dc.subject.other Support vector machines en
dc.subject.other Audio activity detection en
dc.subject.other Cross-correlation en
dc.subject.other Cross-power spectral density en
dc.subject.other Dialogue detection en
dc.subject.other Indicator functions en
dc.subject.other Speaker clustering en
dc.subject.other Probability density function en
dc.title Audio-assisted movie dialogue detection en
heal.type journalArticle en
heal.identifier.primary 10.1109/TCSVT.2008.2005613 en
heal.identifier.secondary http://dx.doi.org/10.1109/TCSVT.2008.2005613 en
heal.identifier.secondary 4630764 en
heal.language English en
heal.publicationDate 2008 en
heal.abstract An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue and 20 non-dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE. en
heal.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC en
heal.journalName IEEE Transactions on Circuits and Systems for Video Technology en
dc.identifier.doi 10.1109/TCSVT.2008.2005613 en
dc.identifier.isi ISI:000260867100015 en
dc.identifier.volume 18 en
dc.identifier.issue 11 en
dc.identifier.spage 1618 en
dc.identifier.epage 1627 en
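
The abstract names two features computed from a pair of actor indicator functions: their cross-correlation and the magnitude of the corresponding cross-power spectral density. The following is a minimal sketch of that idea and not the authors' implementation. It assumes binary indicator sequences sampled on a common time grid, and it estimates the cross-power spectral density as the discrete Fourier transform of the full linear cross-correlation. The function names, the example sequences, and the lack of normalization are all illustrative assumptions.

```python
import cmath


def cross_correlation(a, b):
    """Full linear cross-correlation c[lag] = sum_n a[n + lag] * b[n].

    Lags run from -(len(b) - 1) to len(a) - 1, in that order.
    """
    n_a, n_b = len(a), len(b)
    return [
        sum(a[n + lag] * b[n] for n in range(n_b) if 0 <= n + lag < n_a)
        for lag in range(-(n_b - 1), n_a)
    ]


def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform of x (direct O(N^2) sum)."""
    size = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size)
                for n in range(size)))
        for k in range(size)
    ]


# Hypothetical indicator functions: 1 where the actor speaks, 0 otherwise.
# Here the two actors alternate turns, as they might in a dialogue.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]

xcorr = cross_correlation(actor_a, actor_b)

# Assumed estimator: spectrum of the cross-correlation sequence.
cpsd_magnitude = dft_magnitude(xcorr)

# Concatenated feature vector that could be passed to a classifier.
features = xcorr + cpsd_magnitude
```

For strictly alternating speakers, the cross-correlation is zero at lag 0 and peaks at lags equal to the turn length. This is the kind of structure a dialogue/non-dialogue classifier could pick up from these features.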


Files in this item

Files Size Format View

There are no files associated with this item.
