dc.contributor.author |
Evangelopoulos, G |
en |
dc.contributor.author |
Zlatintsi, A |
en |
dc.contributor.author |
Skoumas, G |
en |
dc.contributor.author |
Rapantzikos, K |
en |
dc.contributor.author |
Potamianos, A |
en |
dc.contributor.author |
Maragos, P |
en |
dc.contributor.author |
Avrithis, Y |
en |
dc.date.accessioned |
2014-03-01T02:46:34Z |
|
dc.date.available |
2014-03-01T02:46:34Z |
|
dc.date.issued |
2009 |
en |
dc.identifier.issn |
1520-6149 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/32724 |
|
dc.subject |
Audio |
en |
dc.subject |
Movie summarization |
en |
dc.subject |
Multimodal saliency |
en |
dc.subject |
Text processing |
en |
dc.subject |
Video |
en |
dc.subject |
Video abstraction |
en |
dc.subject.other |
Audio |
en |
dc.subject.other |
Movie summarization |
en |
dc.subject.other |
Multimodal saliency |
en |
dc.subject.other |
Video |
en |
dc.subject.other |
Video abstraction |
en |
dc.subject.other |
Abstracting |
en |
dc.subject.other |
Acoustics |
en |
dc.subject.other |
Embedded systems |
en |
dc.subject.other |
Mathematical operators |
en |
dc.subject.other |
Motion pictures |
en |
dc.subject.other |
Signal processing |
en |
dc.subject.other |
Text processing |
en |
dc.subject.other |
Video recording |
en |
dc.subject.other |
Word processing |
en |
dc.subject.other |
Signal detection |
en |
dc.title |
Video event detection and summarization using audio, visual and text saliency |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1109/ICASSP.2009.4960393 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/ICASSP.2009.4960393 |
en |
heal.identifier.secondary |
4960393 |
en |
heal.publicationDate |
2009 |
en |
heal.abstract |
Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The various modality curves are integrated into a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve forms the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability. ©2009 IEEE. |
en |
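The abstract's fusion step (normalizing per-modality saliency curves and combining them into a single multimodal attention curve, from which the most salient frames seed a summary) can be sketched as follows. This is a minimal illustrative sketch under assumed linear fusion with equal weights, not the authors' exact method; the function names and the `weights` parameter are hypothetical.

```python
# Sketch (assumption, not the paper's exact algorithm): fuse audio, visual
# and text saliency curves into one multimodal attention curve, then select
# the most salient frames as summary candidates.
import numpy as np

def normalize(curve):
    """Scale a saliency curve to [0, 1]; a flat curve maps to all zeros."""
    c = np.asarray(curve, dtype=float)
    span = c.max() - c.min()
    return (c - c.min()) / span if span > 0 else np.zeros_like(c)

def multimodal_saliency(audio, visual, text, weights=(1/3, 1/3, 1/3)):
    """Weighted linear fusion of the three per-modality saliency curves."""
    curves = [normalize(c) for c in (audio, visual, text)]
    return sum(w * c for w, c in zip(weights, curves))

def top_segments(saliency, k):
    """Indices of the k most salient frames (a crude skim selector)."""
    return sorted(int(i) for i in np.argsort(saliency)[-k:])
```

An event salient in only one modality still raises the fused curve, which matches the abstract's point that an event "may be signified in one or multiple domains."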
heal.journalName |
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
en |
dc.identifier.doi |
10.1109/ICASSP.2009.4960393 |
en |
dc.identifier.spage |
3553 |
en |
dc.identifier.epage |
3556 |
en |