Conference Paper: Few-Shot Action Recognition with Permutation-Invariant Attention
Title | Few-Shot Action Recognition with Permutation-Invariant Attention |
---|---|
Authors | Zhang, H; Zhang, L; Qi, X; Li, H; Torr, PHS; Koniusz, P |
Issue Date | 2020 |
Publisher | Springer. |
Citation | Proceedings of the 16th European Conference on Computer Vision (ECCV), Online, Glasgow, UK, 23-28 August 2020, pt V, p. 525-542 |
Abstract | Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies, whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, relation descriptors are fed to the comparator with the goal of similarity learning between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute blocks of a clip and align the resulting attention regions with similarly permuted attention regions of the non-permuted clip to train the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets. |
Persistent Identifier | http://hdl.handle.net/10722/294711 |
ISBN | 9783030585570 |
Series/Report no. | Lecture Notes in Computer Science (LNCS); v. 12350 |
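The core property the abstract relies on, that attention-weighted pooling over encoded video blocks gives the same result regardless of block order, can be illustrated in a minimal NumPy sketch. This is not the paper's released code; the function names (`attention_pool`, `relation_descriptor`) and the single linear attention projection are simplified placeholders for the paper's spatial/temporal attention modules and C3D features.

```python
import numpy as np

def attention_pool(blocks, w):
    """Permutation-invariant attention pooling over encoded video blocks.

    blocks: (n_blocks, d) array of per-block features (stand-ins for C3D outputs).
    w: (d,) attention projection. Each block's score depends only on its own
       content, so permuting the blocks permutes the weights identically and
       the weighted sum is unchanged.
    """
    scores = blocks @ w
    alpha = np.exp(scores - scores.max())   # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ blocks                   # order-independent weighted sum

def relation_descriptor(query, support):
    # Simplified relation descriptor: concatenate pooled query and support
    # representations before feeding them to a comparator network.
    return np.concatenate([query, support])

rng = np.random.default_rng(0)
blocks = rng.standard_normal((6, 8))        # 6 blocks, 8-dim features
w = rng.standard_normal(8)

pooled = attention_pool(blocks, w)
pooled_perm = attention_pool(blocks[rng.permutation(6)], w)
assert np.allclose(pooled, pooled_perm)     # invariant to block order
```

The self-supervised objective described in the abstract goes a step further: rather than only pooling invariantly, it permutes a clip's blocks and trains the attention maps themselves to transform consistently under that permutation.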
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, H | - |
dc.contributor.author | Zhang, L | - |
dc.contributor.author | Qi, X | - |
dc.contributor.author | Li, H | - |
dc.contributor.author | Torr, PHS | - |
dc.contributor.author | Koniusz, P | - |
dc.date.accessioned | 2020-12-08T07:40:46Z | - |
dc.date.available | 2020-12-08T07:40:46Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 16th European Conference on Computer Vision (ECCV), Online, Glasgow, UK, 23-28 August 2020, pt V, p. 525-542 | - |
dc.identifier.isbn | 9783030585570 | - |
dc.identifier.uri | http://hdl.handle.net/10722/294711 | - |
dc.description.abstract | Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies, whose patterns are unlikely to repeat even in clips of the same class. Subsequently, the pooled representations are combined into simple relation descriptors which encode so-called query and support clips. Finally, relation descriptors are fed to the comparator with the goal of similarity learning between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips (of the same class) there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute blocks of a clip and align the resulting attention regions with similarly permuted attention regions of the non-permuted clip to train the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets. | - |
dc.language | eng | - |
dc.publisher | Springer. | - |
dc.relation.ispartof | European Conference on Computer Vision (ECCV) 2020 | - |
dc.relation.ispartofseries | Lecture Notes in Computer Science (LNCS); v. 12350 | - |
dc.title | Few-Shot Action Recognition with Permutation-Invariant Attention | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Qi, X: xjqi@eee.hku.hk | - |
dc.identifier.authority | Qi, X=rp02666 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-030-58558-7_31 | - |
dc.identifier.scopus | eid_2-s2.0-85097379613 | - |
dc.identifier.hkuros | 320336 | - |
dc.identifier.volume | pt V | - |
dc.identifier.spage | 525 | - |
dc.identifier.epage | 542 | - |
dc.publisher.place | Cham | - |