Article: GPT4Point++: Advancing Unified Point-Language Understanding and Generation

Title: GPT4Point++: Advancing Unified Point-Language Understanding and Generation
Authors: QI, Zhangyang; FANG, Ye; SUN, Zeyi; WU, Xiaoyang; WU, Tong; WANG, Jiaqi; LIN, Dahua; ZHAO, Hengshuang
Keywords: 3D Multimodal Large Model; 3D Object Generation; 3D Object Recognition
Issue Date: 11-Aug-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, p. 1-16
Abstract

Multimodal Large Language Models (MLLMs) have made significant progress in 2D image-text tasks, but the 3D domain remains challenging. To bridge this gap, we introduce GPT4Point and its enhanced version, GPT4Point++, pioneering point-language multimodal models designed for 3D object understanding and generation. They excel in tasks such as 3D object recognition, 3D point cloud captioning, and question answering. Additionally, GPT4Point is equipped with advanced capabilities for controllable 3D generation: from low-quality point-text features, it can produce high-quality results while preserving geometric shapes and colors. GPT4Point's training consists of two stages: first aligning point-text features, then integrating the LLM. Our advanced version, GPT4Point++, simplifies this with a single, unified end-to-end training approach for improved performance. To meet the substantial demand for 3D object-text pairs, we developed Capverse, a point-language dataset annotation engine. Capverse constructs a large-scale database with diverse levels of text granularity by leveraging the Objaverse dataset. We also established a comprehensive benchmark to assess 3D point-language understanding. Extensive evaluations show that GPT4Point and GPT4Point++ excel in both understanding and generation tasks. Moreover, GPT4Point effectively evaluates 3D object generation methods and demonstrates strong understanding of both individual objects and indoor scenes, highlighting its robustness.
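To make the difference between the two training schedules concrete, the following is a minimal sketch only, not the authors' implementation: the module names (PointTextModel, point_encoder, aligner), the align_loss_fn, and the assumption that the LLM exposes a HuggingFace-style inputs_embeds/labels interface returning a .loss are all illustrative placeholders.

import torch
import torch.nn as nn

class PointTextModel(nn.Module):
    def __init__(self, point_encoder: nn.Module, aligner: nn.Module, llm: nn.Module):
        super().__init__()
        self.point_encoder = point_encoder   # maps a point cloud to feature tokens (placeholder)
        self.aligner = aligner               # projects point features into the LLM embedding space (placeholder)
        self.llm = llm                       # assumed HuggingFace-style language model (placeholder)

    def forward(self, points, text_tokens):
        point_feats = self.point_encoder(points)
        aligned = self.aligner(point_feats)
        # Assumes the LLM accepts prefix embeddings and returns an output with a .loss attribute.
        return self.llm(inputs_embeds=aligned, labels=text_tokens)

def train_two_stage(model, loader, align_loss_fn, stage_epochs=(1, 1)):
    # GPT4Point-style schedule: stage 1 aligns point-text features, stage 2 integrates the LLM.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(stage_epochs[0]):
        for points, text in loader:
            loss = align_loss_fn(model.aligner(model.point_encoder(points)), text)
            loss.backward()
            opt.step()
            opt.zero_grad()
    for _ in range(stage_epochs[1]):
        for points, text in loader:
            loss = model(points, text).loss
            loss.backward()
            opt.step()
            opt.zero_grad()

def train_unified(model, loader, epochs=1):
    # GPT4Point++-style schedule: a single end-to-end pass with one objective.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for points, text in loader:
            loss = model(points, text).loss
            loss.backward()
            opt.step()
            opt.zero_grad()

Under these assumptions, the only difference between the two schedules is that the unified variant optimizes a single objective end to end rather than staging the alignment and LLM-integration steps separately.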


Persistent Identifier: http://hdl.handle.net/10722/362388
ISSN: 0162-8828 (print); 1939-3539 (electronic)
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158

 

DC Field | Value
dc.contributor.author | QI, Zhangyang
dc.contributor.author | FANG, Ye
dc.contributor.author | SUN, Zeyi
dc.contributor.author | WU, Xiaoyang
dc.contributor.author | WU, Tong
dc.contributor.author | WANG, Jiaqi
dc.contributor.author | LIN, Dahua
dc.contributor.author | ZHAO, Hengshuang
dc.date.accessioned | 2025-09-23T00:31:10Z
dc.date.available | 2025-09-23T00:31:10Z
dc.date.issued | 2025-08-11
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, p. 1-16
dc.identifier.issn | 0162-8828
dc.identifier.uri | http://hdl.handle.net/10722/362388
dc.description.abstract | Multimodal Large Language Models (MLLMs) have made significant progress in 2D image-text tasks, but the 3D domain remains challenging. To bridge this gap, we introduce GPT4Point and its enhanced version, GPT4Point++, pioneering point-language multimodal models designed for 3D object understanding and generation. They excel in tasks such as 3D object recognition, 3D point cloud captioning, and question answering. Additionally, GPT4Point is equipped with advanced capabilities for controllable 3D generation: from low-quality point-text features, it can produce high-quality results while preserving geometric shapes and colors. GPT4Point's training consists of two stages: first aligning point-text features, then integrating the LLM. Our advanced version, GPT4Point++, simplifies this with a single, unified end-to-end training approach for improved performance. To meet the substantial demand for 3D object-text pairs, we developed Capverse, a point-language dataset annotation engine. Capverse constructs a large-scale database with diverse levels of text granularity by leveraging the Objaverse dataset. We also established a comprehensive benchmark to assess 3D point-language understanding. Extensive evaluations show that GPT4Point and GPT4Point++ excel in both understanding and generation tasks. Moreover, GPT4Point effectively evaluates 3D object generation methods and demonstrates strong understanding of both individual objects and indoor scenes, highlighting its robustness.
dc.language | eng
dc.publisher | Institute of Electrical and Electronics Engineers
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject | 3D Multimodal Large Model
dc.subject | 3D Object Generation
dc.subject | 3D Object Recognition
dc.title | GPT4Point++: Advancing Unified Point-Language Understanding and Generation
dc.type | Article
dc.identifier.doi | 10.1109/TPAMI.2025.3597938
dc.identifier.scopus | eid_2-s2.0-105013279336
dc.identifier.spage | 1
dc.identifier.epage | 16
dc.identifier.eissn | 1939-3539
dc.identifier.issnl | 0162-8828
