Visual event recognition in news video using kernel methods with multi-level temporal alignment

Xu, Dong; Chang, Shih Fu

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/CVPR.2007.383226
Scopus: eid_2-s2.0-34948823856
Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Visual event recognition in news video using kernel methods with multi-level temporal alignment

Title	Visual event recognition in news video using kernel methods with multi-level temporal alignment
Authors	Xu, Dong; Chang, Shih Fu
Issue Date	2007
Citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, article no. 4270251
DOI	http://dx.doi.org/10.1109/CVPR.2007.383226
Abstract	In this work, we systematically study the problem of visual event recognition in unconstrained news video sequences. We adopt the discriminative kernel-based method for which video clip similarity plays an important role. First, we represent a video clip as a bag of orderless descriptors extracted from all of the constituent frames and apply Earth Mover's Distance (EMD) to integrate similarities among frames from two clips. Observing that a video clip is usually comprised of multiple sub-clips corresponding to event evolution over time, we further build a multilevel temporal pyramid. At each pyramid level, we integrate the information from different sub-clips with integer-value-constrained EMD to explicitly align the sub-clips. By fusing the information from the different pyramid levels, we develop Temporally Aligned Pyramid Matching (TAPM) for measuring video similarity. We conduct comprehensive experiments on the TRECVID 2005 corpus, which contains more than 6,800 clips. Our experiments demonstrate that 1) the TAPM multi-level method clearly outperforms single-level EMD, and 2) single-level EMD outperforms by a large margin (43.0% in Mean Average Precision) basic detection methods that use only a single key-frame. Extensive analysis of the results also reveals an intuitive interpretation of sub-clip alignment at different levels. © 2007 IEEE.
Persistent Identifier	http://hdl.handle.net/10722/321323
ISSN	1063-6919
2023 SCImago Journal Rankings	10.331
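
The sub-clip alignment step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: both clips are split into the same number of sub-clips, every sub-clip carries equal weight, and a matrix of pairwise sub-clip distances is already available. Under those assumptions an integer-value-constrained EMD reduces to a minimum-cost one-to-one matching, which the hypothetical helper `align_subclips` finds by brute force. This is practical only for the handful of sub-clips found at a pyramid level and is not the authors' implementation.

```python
from itertools import permutations


def align_subclips(dist):
    """Align sub-clips of two clips by minimum-cost one-to-one matching.

    dist[i][j] is an assumed, precomputed distance between sub-clip i of
    the first clip and sub-clip j of the second clip. Returns the mean
    matched distance and a list m where m[i] is the matched sub-clip.
    """
    n = len(dist)
    if n == 0 or any(len(row) != n for row in dist):
        raise ValueError("expected a non-empty square distance matrix")

    best_cost, best_perm = float("inf"), None
    # Brute force over all one-to-one matchings (assumption: n is small).
    for perm in permutations(range(n)):
        cost = sum(dist[i][perm[i]] for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


if __name__ == "__main__":
    # Toy distances: the two clips show the same stages in swapped order.
    toy = [
        [0.9, 0.1],
        [0.2, 0.8],
    ]
    cost, matching = align_subclips(toy)
    print(matching, round(cost, 3))
```

In the toy matrix the cheapest matching pairs each sub-clip with its temporally shifted counterpart, which is the kind of alignment a fixed, position-by-position comparison would miss.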

DC Field	Value	Language
dc.contributor.author	Xu, Dong	-
dc.contributor.author	Chang, Shih Fu	-
dc.date.accessioned	2022-11-03T02:18:09Z	-
dc.date.available	2022-11-03T02:18:09Z	-
dc.date.issued	2007	-
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007, article no. 4270251	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	http://hdl.handle.net/10722/321323	-
dc.description.abstract	In this work, we systematically study the problem of visual event recognition in unconstrained news video sequences. We adopt the discriminative kernel-based method for which video clip similarity plays an important role. First, we represent a video clip as a bag of orderless descriptors extracted from all of the constituent frames and apply Earth Mover's Distance (EMD) to integrate similarities among frames from two clips. Observing that a video clip is usually comprised of multiple sub-clips corresponding to event evolution over time, we further build a multilevel temporal pyramid. At each pyramid level, we integrate the information from different sub-clips with integer-value-constrained EMD to explicitly align the sub-clips. By fusing the information from the different pyramid levels, we develop Temporally Aligned Pyramid Matching (TAPM) for measuring video similarity. We conduct comprehensive experiments on the TRECVID 2005 corpus, which contains more than 6,800 clips. Our experiments demonstrate that 1) the TAPM multi-level method clearly outperforms single-level EMD, and 2) single-level EMD outperforms by a large margin (43.0% in Mean Average Precision) basic detection methods that use only a single key-frame. Extensive analysis of the results also reveals an intuitive interpretation of sub-clip alignment at different levels. © 2007 IEEE.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.title	Visual event recognition in news video using kernel methods with multi-level temporal alignment	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/CVPR.2007.383226	-
dc.identifier.scopus	eid_2-s2.0-34948823856	-
dc.identifier.spage	article no. 4270251	-
dc.identifier.epage	article no. 4270251	-
