File Download
There are no files associated with this item.
Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/TPAMI.2024.3401409
- Scopus: eid_2-s2.0-85193287200
Citations:
- Scopus: 0
Article: Object-centric Representation Learning for Video Scene Understanding
Title | Object-centric Representation Learning for Video Scene Understanding |
---|---|
Authors | Zhou, Yi; Zhang, Hui; Park, Seung In; Yoo, Byung In; Qi, Xiaojuan |
Keywords | Depth estimation; Estimation; Feature extraction; Generators; IP networks; object-centric representation; Pipelines; scene understanding; Semantics; Task analysis; tracking; video panoptic segmentation |
Issue Date | 15-May-2024 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, p. 1-13 |
Abstract | Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires predicting the semantic class and 3D depth of each pixel in a video, while also segmenting and consistently tracking objects across frames. Predominant methodologies treat this as a multi-task learning problem, tackling each constituent task independently, thus restricting their capacity to leverage interrelationships amongst tasks and requiring parameter tuning for each task. To surmount these constraints, we present Slot-IVPS, a new approach employing an object-centric model to acquire unified object representations, thereby facilitating the model's ability to simultaneously capture semantic and depth information. Specifically, we introduce a novel representation, Integrated Panoptic Slots (IPS), to capture both semantic and depth information for all panoptic objects within a video, encompassing background semantics and foreground instances. Subsequently, we propose an integrated feature generator and enhancer to extract depth-aware features, alongside the Integrated Video Panoptic Retriever (IVPR), which iteratively retrieves spatial-temporal coherent object features and encodes them into IPS. The resulting IPS can be effortlessly decoded into an array of video outputs, including depth maps, classifications, masks, and object instance IDs. We undertake comprehensive analyses across four datasets, attaining state-of-the-art performance in both Depth-aware Video Panoptic Segmentation and Video Panoptic Segmentation tasks. Codes will be available at https://github.com/SAITPublic/. |
Persistent Identifier | http://hdl.handle.net/10722/350740 |
ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158) |
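The abstract above describes an object-centric pipeline in which a set of slot vectors carries per-object semantics, masks, and depth for each frame. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea only: decoding a set of learned slot vectors into class logits, soft masks, and a composed depth map. All names (`ToySlotDecoder`, `to_mask_embed`, the tensor shapes) are illustrative assumptions; this is not the authors' Slot-IVPS, IPS, or IVPR implementation, whose code is announced for https://github.com/SAITPublic/.

```python
# Illustrative sketch only: decode N slot vectors into per-slot class logits,
# soft masks, and a dense depth map. Not the authors' Slot-IVPS code.
import torch
import torch.nn as nn

class ToySlotDecoder(nn.Module):
    """Toy decoder from learned slot vectors to classes, masks, and depth."""
    def __init__(self, num_slots=8, slot_dim=64, feat_dim=64, num_classes=19):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, slot_dim))  # learned queries
        self.to_class = nn.Linear(slot_dim, num_classes + 1)         # +1 for "no object"
        self.to_depth = nn.Linear(slot_dim, 1)                       # one depth scalar per slot
        self.to_mask_embed = nn.Linear(slot_dim, feat_dim)           # slot -> mask embedding

    def forward(self, frame_feats):
        # frame_feats: (B, C, H, W) dense features from any backbone
        B, C, H, W = frame_feats.shape
        slots = self.slots.unsqueeze(0).expand(B, -1, -1)             # (B, N, D)
        logits = self.to_class(slots)                                  # (B, N, K+1)
        depth = self.to_depth(slots).squeeze(-1)                       # (B, N)
        mask_embed = self.to_mask_embed(slots)                         # (B, N, C)
        # dot product between slot embeddings and pixel features -> soft masks over slots
        masks = torch.einsum("bnc,bchw->bnhw", mask_embed, frame_feats).softmax(dim=1)
        # compose a dense depth map as a mask-weighted sum of per-slot depth scalars
        depth_map = (masks * depth[:, :, None, None]).sum(dim=1)       # (B, H, W)
        return logits, masks, depth_map

# Usage on random features standing in for one video frame.
decoder = ToySlotDecoder()
frame_feats = torch.randn(1, 64, 32, 64)
logits, masks, depth_map = decoder(frame_feats)
print(logits.shape, masks.shape, depth_map.shape)  # (1, 8, 20) (1, 8, 32, 64) (1, 32, 64)
```

In the same spirit, consistent instance IDs across frames could be obtained by matching slot embeddings between consecutive frames (for example, Hungarian matching on cosine similarity); the paper's actual iterative IVPR mechanism is not reproduced here.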
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhou, Yi | - |
dc.contributor.author | Zhang, Hui | - |
dc.contributor.author | Park, Seung In | - |
dc.contributor.author | Yoo, Byung In | - |
dc.contributor.author | Qi, Xiaojuan | - |
dc.date.accessioned | 2024-11-02T00:36:48Z | - |
dc.date.available | 2024-11-02T00:36:48Z | - |
dc.date.issued | 2024-05-15 | - |
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, p. 1-13 | - |
dc.identifier.issn | 0162-8828 | - |
dc.identifier.uri | http://hdl.handle.net/10722/350740 | - |
dc.description.abstract | Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires predicting the semantic class and 3D depth of each pixel in a video, while also segmenting and consistently tracking objects across frames. Predominant methodologies treat this as a multi-task learning problem, tackling each constituent task independently, thus restricting their capacity to leverage interrelationships amongst tasks and requiring parameter tuning for each task. To surmount these constraints, we present Slot-IVPS, a new approach employing an object-centric model to acquire unified object representations, thereby facilitating the model's ability to simultaneously capture semantic and depth information. Specifically, we introduce a novel representation, Integrated Panoptic Slots (IPS), to capture both semantic and depth information for all panoptic objects within a video, encompassing background semantics and foreground instances. Subsequently, we propose an integrated feature generator and enhancer to extract depth-aware features, alongside the Integrated Video Panoptic Retriever (IVPR), which iteratively retrieves spatial-temporal coherent object features and encodes them into IPS. The resulting IPS can be effortlessly decoded into an array of video outputs, including depth maps, classifications, masks, and object instance IDs. We undertake comprehensive analyses across four datasets, attaining state-of-the-art performance in both Depth-aware Video Panoptic Segmentation and Video Panoptic Segmentation tasks. Codes will be available at https://github.com/SAITPublic/. | -
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | Depth estimation | - |
dc.subject | Estimation | - |
dc.subject | Feature extraction | - |
dc.subject | Generators | - |
dc.subject | IP networks | - |
dc.subject | object-centric representation | - |
dc.subject | Pipelines | - |
dc.subject | scene understanding | - |
dc.subject | Semantics | - |
dc.subject | Task analysis | - |
dc.subject | tracking | - |
dc.subject | video panoptic segmentation | - |
dc.title | Object-centric Representation Learning for Video Scene Understanding | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TPAMI.2024.3401409 | - |
dc.identifier.scopus | eid_2-s2.0-85193287200 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 13 | - |
dc.identifier.eissn | 1939-3539 | - |
dc.identifier.issnl | 0162-8828 | - |