Conference Paper: Visual event recognition in videos by learning from web data

Title: Visual event recognition in videos by learning from web data
Authors: Duan, Lixin; Xu, Dong; Tsang, Ivor W.; Luo, Jiebo
Issue Date: 2010
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, p. 1959-1966
Abstract: We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distances between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pair-wise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover's Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web domain and consumer domain). For each pyramid level and each type of local features, we train a set of SVM classifiers based on the combined training set from two domains using multiple base kernels of different kernel types and parameters, which are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), to learn an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data. ©2010 IEEE.
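The aligned space-time pyramid matching described in the abstract integrates pair-wise volume distances via an Earth Mover's Distance. As a rough illustration only (not the authors' implementation), the alignment at one pyramid level can be posed as a small transportation problem; the distance matrix `D`, the uniform volume weights, and the helper name `emd_align` are assumptions for this sketch:

```python
import numpy as np
from scipy.optimize import linprog


def emd_align(D):
    """Solve the transportation problem between two sets of space-time
    volumes with uniform weights.

    D is the pair-wise distance matrix between the volumes of clip A
    (rows) and clip B (columns). Returns the EMD value and the flow
    matrix, whose large entries indicate which volumes are aligned.
    """
    r, c = D.shape
    # Uniform weights: each volume carries an equal share of its clip.
    a = np.full(r, 1.0 / r)
    b = np.full(c, 1.0 / c)

    # Decision variables are the flattened flows f_ij; the objective is
    # sum_ij f_ij * D_ij subject to row/column mass constraints.
    A_eq, b_eq = [], []
    for i in range(r):  # total flow leaving volume i of clip A
        row = np.zeros(r * c)
        row[i * c:(i + 1) * c] = 1.0
        A_eq.append(row)
        b_eq.append(a[i])
    for j in range(c):  # total flow arriving at volume j of clip B
        col = np.zeros(r * c)
        col[j::c] = 1.0
        A_eq.append(col)
        b_eq.append(b[j])

    res = linprog(D.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(r, c)
```

With equal numbers of volumes and uniform weights, an optimal flow can be chosen to concentrate on a one-to-one matching, which gives the explicit volume alignment the abstract refers to; the paper's integer-flow formulation enforces this directly.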
Persistent Identifier: http://hdl.handle.net/10722/321229
ISSN: 1063-6919
2020 SCImago Journal Rankings: 4.658
ISI Accession Number ID: WOS:000287417502002

 

DC Field: Value

dc.contributor.author: Duan, Lixin
dc.contributor.author: Xu, Dong
dc.contributor.author: Tsang, Ivor W.
dc.contributor.author: Luo, Jiebo
dc.date.accessioned: 2022-11-03T02:17:31Z
dc.date.available: 2022-11-03T02:17:31Z
dc.date.issued: 2010
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, p. 1959-1966
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/321229
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.title: Visual event recognition in videos by learning from web data
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR.2010.5539870
dc.identifier.scopus: eid_2-s2.0-77956003629
dc.identifier.spage: 1959
dc.identifier.epage: 1966
dc.identifier.isi: WOS:000287417502002
