Article: T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation
| Title | T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation |
|---|---|
| Authors | Huang, Kaiyi; Duan, Chengqi; Sun, Kaiyue; Xie, Enze; Li, Zhenguo; Liu, Xihui |
| Keywords | Benchmark and evaluation; compositional text-to-image generation; image generation |
| Issue Date | 1-Jan-2025 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, v. 47, n. 5, p. 3563-3579 |
| Abstract | Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects displaying various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges, including a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis leveraging Multimodal Large Language Models (MLLMs), i.e., GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models, including state-of-the-art models such as FLUX.1, SD3, DALLE-3, Pixart-α, and SD-XL, on T2I-CompBench++. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and explore the potential and limitations of MLLMs. |
| Persistent Identifier | http://hdl.handle.net/10722/359175 |
| ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158) |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Huang, Kaiyi | - |
| dc.contributor.author | Duan, Chengqi | - |
| dc.contributor.author | Sun, Kaiyue | - |
| dc.contributor.author | Xie, Enze | - |
| dc.contributor.author | Li, Zhenguo | - |
| dc.contributor.author | Liu, Xihui | - |
| dc.date.accessioned | 2025-08-23T00:30:26Z | - |
| dc.date.available | 2025-08-23T00:30:26Z | - |
| dc.date.issued | 2025-01-01 | - |
| dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, v. 47, n. 5, p. 3563-3579 | - |
| dc.identifier.issn | 0162-8828 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/359175 | - |
| dc.description.abstract | Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects displaying various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges, including a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis leveraging Multimodal Large Language Models (MLLMs), i.e., GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models, including state-of-the-art models such as FLUX.1, SD3, DALLE-3, Pixart-α, and SD-XL, on T2I-CompBench++. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and explore the potential and limitations of MLLMs. | - |
| dc.language | eng | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | Benchmark and evaluation | - |
| dc.subject | compositional text-to-image generation | - |
| dc.subject | image generation | - |
| dc.title | T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/TPAMI.2025.3531907 | - |
| dc.identifier.pmid | 40031217 | - |
| dc.identifier.scopus | eid_2-s2.0-105003036178 | - |
| dc.identifier.volume | 47 | - |
| dc.identifier.issue | 5 | - |
| dc.identifier.spage | 3563 | - |
| dc.identifier.epage | 3579 | - |
| dc.identifier.eissn | 1939-3539 | - |
| dc.identifier.issnl | 0162-8828 | - |
