dc.contributor.author |
Kotti, M |
en |
dc.contributor.author |
Ververidis, D |
en |
dc.contributor.author |
Evangelopoulos, G |
en |
dc.contributor.author |
Panagakis, I |
en |
dc.contributor.author |
Kotropoulos, C |
en |
dc.contributor.author |
Maragos, P |
en |
dc.contributor.author |
Pitas, I |
en |
dc.date.accessioned |
2014-03-01T01:27:57Z |
|
dc.date.available |
2014-03-01T01:27:57Z |
|
dc.date.issued |
2008 |
en |
dc.identifier.issn |
1051-8215 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/18655 |
|
dc.subject |
Audio activity detection |
en |
dc.subject |
Cross-correlation |
en |
dc.subject |
Cross-power spectral density |
en |
dc.subject |
Dialogue detection |
en |
dc.subject |
Indicator functions |
en |
dc.subject |
Speaker clustering |
en |
dc.subject.classification |
Engineering, Electrical & Electronic |
en |
dc.subject.other |
Classifiers |
en |
dc.subject.other |
Feedforward neural networks |
en |
dc.subject.other |
Flow of solids |
en |
dc.subject.other |
Ketones |
en |
dc.subject.other |
Learning systems |
en |
dc.subject.other |
Power spectral density |
en |
dc.subject.other |
Radial basis function networks |
en |
dc.subject.other |
Reactor cores |
en |
dc.subject.other |
Support vector machines |
en |
dc.subject.other |
Audio activity detection |
en |
dc.subject.other |
Cross-correlation |
en |
dc.subject.other |
Cross-power spectral density |
en |
dc.subject.other |
Dialogue detection |
en |
dc.subject.other |
Indicator functions |
en |
dc.subject.other |
Speaker clustering |
en |
dc.subject.other |
Probability density function |
en |
dc.title |
Audio-assisted movie dialogue detection |
en |
heal.type |
journalArticle |
en |
heal.identifier.primary |
10.1109/TCSVT.2008.2005613 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/TCSVT.2008.2005613 |
en |
heal.identifier.secondary |
4630764 |
en |
heal.language |
English |
en |
heal.publicationDate |
2008 |
en |
heal.abstract |
An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions which define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE. |
en |
heal.publisher |
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
en |
heal.journalName |
IEEE Transactions on Circuits and Systems for Video Technology |
en |
dc.identifier.doi |
10.1109/TCSVT.2008.2005613 |
en |
dc.identifier.isi |
ISI:000260867100015 |
en |
dc.identifier.volume |
18 |
en |
dc.identifier.issue |
11 |
en |
dc.identifier.spage |
1618 |
en |
dc.identifier.epage |
1627 |
en |
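
The abstract above names two features computed from a pair of actor indicator functions: their cross-correlation and the magnitude of the corresponding cross-power spectral density. The following is a minimal sketch of how such features could be computed, not the authors' implementation. It assumes binary indicator sequences sampled at equal intervals, an unnormalized linear cross-correlation, and a plain discrete Fourier transform of that cross-correlation as the spectral estimate. The function names and the toy sequences are hypothetical.

```python
import cmath


def cross_correlation(a, b):
    """Unnormalized linear cross-correlation c[lag] = sum_n a[n + lag] * b[n].

    Returns 2N - 1 values for lags -(N - 1) .. (N - 1); lag 0 is at index N - 1.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("indicator functions must have equal length")
    return [
        sum(a[i + lag] * b[i] for i in range(n) if 0 <= i + lag < n)
        for lag in range(-(n - 1), n)
    ]


def dft(x):
    """Naive O(M^2) discrete Fourier transform, adequate for short sequences."""
    m = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
        for k in range(m)
    ]


def cross_psd_magnitude(a, b):
    """Magnitude of the DFT of the cross-correlation sequence.

    The index offset of the lag axis only changes the phase, so the
    magnitude is unaffected by where lag 0 sits in the list.
    """
    return [abs(v) for v in dft(cross_correlation(a, b))]


# Hypothetical toy data: two actors alternating turns (1 = speaking).
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]

xcorr = cross_correlation(actor_a, actor_b)
cpsd_mag = cross_psd_magnitude(actor_a, actor_b)
```

In this toy case the two actors never speak at the same time, so the cross-correlation is zero at lag 0 and peaks at nonzero lags. Vectors such as `xcorr` and `cpsd_mag` are the kind of input the abstract describes passing to the classifiers.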