Article: T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation

Title: T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation
Authors: Huang, Kaiyi; Duan, Chengqi; Sun, Kaiyue; Xie, Enze; Li, Zhenguo; Liu, Xihui
Keywords: Benchmark and evaluation; compositional text-to-image generation; image generation
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, v. 47, n. 5, p. 3563-3579
Abstract: Despite the impressive advances in text-to-image models, they often struggle to compose complex scenes containing multiple objects with various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis of Multimodal Large Language Models (MLLMs), namely GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models on T2I-CompBench++, including state-of-the-art models such as FLUX.1, SD3, DALL-E 3, PixArt-α, and SD-XL. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and to explore the potential and limitations of MLLMs.
Persistent Identifier: http://hdl.handle.net/10722/359175
ISSN: 0162-8828; eISSN: 1939-3539
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158

 

DC Field | Value | Language
dc.contributor.author | Huang, Kaiyi | -
dc.contributor.author | Duan, Chengqi | -
dc.contributor.author | Sun, Kaiyue | -
dc.contributor.author | Xie, Enze | -
dc.contributor.author | Li, Zhenguo | -
dc.contributor.author | Liu, Xihui | -
dc.date.accessioned | 2025-08-23T00:30:26Z | -
dc.date.available | 2025-08-23T00:30:26Z | -
dc.date.issued | 2025-01-01 | -
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, v. 47, n. 5, p. 3563-3579 | -
dc.identifier.issn | 0162-8828 | -
dc.identifier.uri | http://hdl.handle.net/10722/359175 | -
dc.language | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | Benchmark and evaluation | -
dc.subject | compositional text-to-image generation | -
dc.subject | image generation | -
dc.title | T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation | -
dc.type | Article | -
dc.identifier.doi | 10.1109/TPAMI.2025.3531907 | -
dc.identifier.pmid | 40031217 | -
dc.identifier.scopus | eid_2-s2.0-105003036178 | -
dc.identifier.volume | 47 | -
dc.identifier.issue | 5 | -
dc.identifier.spage | 3563 | -
dc.identifier.epage | 3579 | -
dc.identifier.eissn | 1939-3539 | -
dc.identifier.issnl | 0162-8828 | -
