Article: AnyDoor: Zero-shot Image Customization with Region-to-region Reference

Title: AnyDoor: Zero-shot Image Customization with Region-to-region Reference
Authors: Chen, Xi; Huang, Lianghua; Liu, Yu; Shen, Yujun; Zhao, Deli; Zhao, Hengshuang
Keywords: Diffusion Model; Image Composition; Image Customization; Image Editing; Image Generation
Issue Date: 25-Apr-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
Abstract

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations with desired shapes. Instead of tuning parameters for each object, our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage. Such a challenging zero-shot setting requires an adequate characterization of a given object. To this end, we leverage a powerful self-supervised image encoder (i.e., DINOv2) to extract a discriminative identity feature of the target object. We further complement the identity feature with detail features, which are carefully designed to maintain appearance details yet allow versatile local variations (e.g., lighting, orientation, posture), supporting the object in blending favorably with different surroundings. We also propose to borrow knowledge from video datasets, where various forms of a single object can be observed along the time axis, leading to stronger model generalizability and robustness. Starting from the task of object insertion, we extend the framework of AnyDoor to a general solution with region-to-region image reference. With different definitions of the source region and target region, the tasks of object insertion, object removal, and image variation can be integrated into one model without introducing extra parameters. In addition, we investigate incorporating other conditions, such as the mask, pose skeleton, and depth map, as additional guidance to achieve more controllable generation.
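
As a rough illustration of the identity-feature step the abstract describes (embedding the target object with a self-supervised DINOv2 encoder), the following Python sketch extracts a global feature from an object crop using the publicly released DINOv2 backbone. The model variant (dinov2_vitg14), the preprocessing, and the file path are illustrative assumptions, not details taken from the paper.

```python
# A minimal, illustrative sketch of the identity-feature step described in the
# abstract: embed a cropped target object with a self-supervised DINOv2
# encoder. This is NOT the authors' implementation; the model variant,
# preprocessing, and file path below are assumptions based on the public
# DINOv2 release.
import torch
from PIL import Image
from torchvision import transforms

# Load a pretrained DINOv2 backbone from the official PyTorch Hub release.
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vitg14")
encoder.eval()

# ImageNet-style preprocessing; the input side must be a multiple of the
# ViT patch size (14), so 224x224 works.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "target_object.png" is a hypothetical crop of the object to be teleported.
object_crop = Image.open("target_object.png").convert("RGB")
x = preprocess(object_crop).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    identity_feature = encoder(x)  # (1, 1536) global embedding for ViT-g/14

# In a pipeline like the one described, this embedding would condition the
# diffusion generator, while separately extracted "detail features" preserve
# fine appearance.
print(identity_feature.shape)
```

In such a design, the global embedding captures what the object is, while complementary detail features (and optional conditions like a mask, pose skeleton, or depth map) govern how it blends into the scene.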


Persistent Identifier: http://hdl.handle.net/10722/362092
ISSN: 0162-8828
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158

 

dc.contributor.author: Chen, Xi
dc.contributor.author: Huang, Lianghua
dc.contributor.author: Liu, Yu
dc.contributor.author: Shen, Yujun
dc.contributor.author: Zhao, Deli
dc.contributor.author: Zhao, Hengshuang
dc.date.accessioned: 2025-09-19T00:31:50Z
dc.date.available: 2025-09-19T00:31:50Z
dc.date.issued: 2025-04-25
dc.identifier.citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
dc.identifier.issn: 0162-8828
dc.identifier.uri: http://hdl.handle.net/10722/362092
dc.description.abstract: This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations with desired shapes. Instead of tuning parameters for each object, our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage. Such a challenging zero-shot setting requires an adequate characterization of a given object. To this end, we leverage a powerful self-supervised image encoder (i.e., DINOv2) to extract a discriminative identity feature of the target object. We further complement the identity feature with detail features, which are carefully designed to maintain appearance details yet allow versatile local variations (e.g., lighting, orientation, posture), supporting the object in blending favorably with different surroundings. We also propose to borrow knowledge from video datasets, where various forms of a single object can be observed along the time axis, leading to stronger model generalizability and robustness. Starting from the task of object insertion, we extend the framework of AnyDoor to a general solution with region-to-region image reference. With different definitions of the source region and target region, the tasks of object insertion, object removal, and image variation can be integrated into one model without introducing extra parameters. In addition, we investigate incorporating other conditions, such as the mask, pose skeleton, and depth map, as additional guidance to achieve more controllable generation.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Diffusion Model
dc.subject: Image Composition
dc.subject: Image Customization
dc.subject: Image Editing
dc.subject: Image Generation
dc.title: AnyDoor: Zero-shot Image Customization with Region-to-region Reference
dc.type: Article
dc.identifier.doi: 10.1109/TPAMI.2025.3562237
dc.identifier.scopus: eid_2-s2.0-105003681546
dc.identifier.eissn: 1939-3539
dc.identifier.issnl: 0162-8828
