Optimal content-based video decomposition for interactive video navigation

Doulamis, AD; Doulamis, ND

dc.contributor.author	Doulamis, AD	en
dc.contributor.author	Doulamis, ND	en
dc.date.accessioned	2014-03-01T01:21:11Z
dc.date.available	2014-03-01T01:21:11Z
dc.date.issued	2004	en
dc.identifier.issn	1051-8215	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/16124
dc.subject	21	en
dc.subject	Hierarchical summarization	en
dc.subject	MPEG-7	en
dc.subject	Video decomposition	en
dc.subject.classification	Engineering, Electrical & Electronic	en
dc.subject.other	Bandwidth	en
dc.subject.other	Correlation methods	en
dc.subject.other	Genetic algorithms	en
dc.subject.other	Internet	en
dc.subject.other	Multimedia systems	en
dc.subject.other	Navigation	en
dc.subject.other	Probability	en
dc.subject.other	Random processes	en
dc.subject.other	Trees (mathematics)	en
dc.subject.other	Vectors	en
dc.subject.other	Genetic algorithms (GA)	en
dc.subject.other	Hierarchical summarization	en
dc.subject.other	MPEG-7	en
dc.subject.other	Video decomposition	en
dc.subject.other	Video signal processing	en
dc.title	Optimal content-based video decomposition for interactive video navigation	en
heal.type	journalArticle	en
heal.identifier.primary	10.1109/TCSVT.2004.828348	en
heal.identifier.secondary	http://dx.doi.org/10.1109/TCSVT.2004.828348	en
heal.language	English	en
heal.publicationDate	2004	en
heal.abstract	In this paper, an interactive framework for navigating video sequences is presented, based on an optimal content-based video decomposition scheme. In particular, each video sequence is analyzed at different content resolution levels, creating a hierarchy from the lowest (coarse) to the highest (fine) resolution. This content hierarchy is represented as a tree structure: each level corresponds to a particular content resolution, and the tree nodes indicate the temporal video segments into which the sequence content is partitioned at that resolution. A criterion is introduced to measure the efficiency of the proposed scheme in organizing the visual content of the video and to compare it with other hierarchical video content representations and navigation schemes. The efficiency is measured as the difficulty a user has in locating a video segment of interest while moving through the levels of the hierarchy. In our case, the video is decomposed so that the best efficiency is achieved. The efficiency of a nonlinear video decomposition scheme depends on: 1) the number of paths required for a user to locate a relevant video segment and 2) the number of shot/frame classes (i.e., content representatives) extracted to represent the visual content. Both issues are addressed in this paper. For the first, the probability of selecting a relevant video segment on the first path is maximized by extracting optimal content representatives through minimization of a cross-correlation criterion. For the minimization, a genetic algorithm (GA) is adopted, since an exhaustive search for the minimum value is computationally too expensive to implement. The cross-correlation criterion is evaluated in the feature domain by extracting appropriate global and object-based descriptors for each video frame, so that a better representation of the visual content is achieved.
The second aspect (i.e., the number of content representatives) is addressed by minimizing the average transmitted information while taking into account the complexity of each temporal video segment. More content representatives are extracted for video segments of high complexity, whereas fewer are required for low-complexity segments. In addition, a degree of interest is assigned to each video shot (or frame) to account for the fact that, as perceived by the user, the visual content of a particular set of shots (frames) satisfies his/her information needs. Finally, a computationally efficient algorithm is proposed to regulate the degree of detail (i.e., the number of shot/frame representatives) when the visual content is not adequately represented from the user's perspective. Experimental results on real-life video sequences demonstrate the performance of the proposed GA-based video decomposition scheme compared with other hierarchical video organization methods.	en
heal.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	en
heal.journalName	IEEE Transactions on Circuits and Systems for Video Technology	en
dc.identifier.doi	10.1109/TCSVT.2004.828348	en
dc.identifier.isi	ISI:000221775600001	en
dc.identifier.volume	14	en
dc.identifier.issue	6	en
dc.identifier.spage	757	en
dc.identifier.epage	775	en
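
The abstract describes selecting content representatives by minimizing a cross-correlation criterion with a genetic algorithm. The record does not give the exact criterion or the GA operators, so the following is only a minimal sketch under stated assumptions: each frame is a plain feature vector, the criterion is taken to be the mean squared Pearson correlation among the selected frames, and the function names, selection scheme, crossover, mutation, and all parameter defaults are hypothetical choices made for illustration, not the authors' method.

```python
import itertools
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cost(features, subset):
    """Assumed criterion: mean squared pairwise correlation of the chosen frames."""
    pairs = list(itertools.combinations(subset, 2))
    return sum(correlation(features[i], features[j]) ** 2 for i, j in pairs) / len(pairs)


def select_representatives(features, k, pop_size=30, generations=60,
                           mutation_rate=0.2, seed=0):
    """Pick k frame indices (k >= 2) with low mutual correlation using a simple GA."""
    rng = random.Random(seed)
    n = len(features)
    # Each chromosome is a sorted tuple of k distinct frame indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist truncation selection: the better half survives unchanged.
        population.sort(key=lambda s: cost(features, s))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents.
            child = set(rng.sample(list(set(a) | set(b)), k))
            if rng.random() < mutation_rate:
                # Mutation: swap one selected index for one that is not selected.
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    return min(population, key=lambda s: cost(features, s))


if __name__ == "__main__":
    # Toy feature vectors; frames 0 and 1 are near-duplicates.
    frames = [[1, 2, 3, 4], [1, 2, 3, 4.1], [4, 3, 2, 1],
              [1, 3, 2, 4], [2, 1, 4, 3], [3, 1, 4, 2]]
    chosen = select_representatives(frames, k=3)
    print(chosen, cost(frames, chosen))
```

Because the better half of each generation is carried over unchanged, the best cost found never gets worse from one generation to the next. The GA only samples the search space, so the returned subset is a good candidate rather than a guaranteed global minimum.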