Article: AnyDoor: Zero-shot Image Customization with Region-to-region Reference
| Title | AnyDoor: Zero-shot Image Customization with Region-to-region Reference |
|---|---|
| Authors | Chen, Xi; Huang, Lianghua; Liu, Yu; Shen, Yujun; Zhao, Deli; Zhao, Hengshuang |
| Keywords | Diffusion Model; Image Composition; Image Customization; Image Editing; Image Generation |
| Issue Date | 25-Apr-2025 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025 |
| Abstract | This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations with desired shapes. Instead of tuning parameters for each object, our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage. Such a challenging zero-shot setting requires an adequate characterization of a certain object. To this end, we leverage the powerful self-supervised image encoder (i.e., DINOv2) to extract the discriminative identity feature of the target object. Besides, we complement the identity feature with detail features, which are carefully designed to maintain appearance details yet allow versatile local variations (e.g., lighting, orientation, posture, etc.), supporting the object in favorably blending with different surroundings. We further propose to borrow knowledge from video datasets, where we can observe various forms (i.e., along the time axis) of a single object, leading to stronger model generalizability and robustness. Starting from the task of object insertion, we further extend the framework of AnyDoor to a general solution with region-to-region image reference. With the different definitions of the source region and target region, the tasks of object insertion, object removal, and image variation could be integrated into one model without introducing extra parameters. In addition, we investigate incorporating other conditions like the mask, pose skeleton, and depth map as additional guidance to achieve more controllable generation. |
| Persistent Identifier | http://hdl.handle.net/10722/362092 |
| ISSN | 0162-8828 |
| Journal Metrics | 2023 Impact Factor: 20.8; 2023 SCImago Journal Rank: 6.158 |
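
The abstract describes extracting a discriminative identity feature for the target object with the self-supervised DINOv2 encoder. As a rough illustration only, the sketch below loads the publicly released DINOv2 backbone and computes such a global embedding for a cropped object image; the model variant, preprocessing, and file name are assumptions for illustration, not the authors' pipeline.

```python
# A minimal sketch (not the authors' released code): computing a DINOv2
# identity embedding for a cropped target object. The model variant,
# preprocessing, and file name below are assumptions for illustration.
import torch
from PIL import Image
from torchvision import transforms

# Load a pretrained DINOv2 ViT-B/14 backbone from the official hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
model.eval()

# Standard ImageNet normalization; input side must be a multiple of 14.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "object_crop.png" is a hypothetical segmented crop of the target object.
image = Image.open("object_crop.png").convert("RGB")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    # The global class-token embedding acts as the identity feature.
    identity_feature = model(batch)  # shape: (1, 768) for ViT-B/14

print(identity_feature.shape)
```

In the paper's framing, such an identity embedding conditions the diffusion generator, while separate detail features preserve appearance under local variations; the sketch covers only the encoder step.
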
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Chen, Xi | - |
| dc.contributor.author | Huang, Lianghua | - |
| dc.contributor.author | Liu, Yu | - |
| dc.contributor.author | Shen, Yujun | - |
| dc.contributor.author | Zhao, Deli | - |
| dc.contributor.author | Zhao, Hengshuang | - |
| dc.date.accessioned | 2025-09-19T00:31:50Z | - |
| dc.date.available | 2025-09-19T00:31:50Z | - |
| dc.date.issued | 2025-04-25 | - |
| dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025 | - |
| dc.identifier.issn | 0162-8828 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/362092 | - |
| dc.description.abstract | This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations with desired shapes. Instead of tuning parameters for each object, our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage. Such a challenging zero-shot setting requires an adequate characterization of a certain object. To this end, we leverage the powerful self-supervised image encoder (i.e., DINOv2) to extract the discriminative identity feature of the target object. Besides, we complement the identity feature with detail features, which are carefully designed to maintain appearance details yet allow versatile local variations (e.g., lighting, orientation, posture, etc.), supporting the object in favorably blending with different surroundings. We further propose to borrow knowledge from video datasets, where we can observe various forms (i.e., along the time axis) of a single object, leading to stronger model generalizability and robustness. Starting from the task of object insertion, we further extend the framework of AnyDoor to a general solution with region-to-region image reference. With the different definitions of the source region and target region, the tasks of object insertion, object removal, and image variation could be integrated into one model without introducing extra parameters. In addition, we investigate incorporating other conditions like the mask, pose skeleton, and depth map as additional guidance to achieve more controllable generation. | - |
| dc.language | eng | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | Diffusion Model | - |
| dc.subject | Image Composition | - |
| dc.subject | Image Customization | - |
| dc.subject | Image Editing | - |
| dc.subject | Image Generation | - |
| dc.title | AnyDoor: Zero-shot Image Customization with Region-to-region Reference | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/TPAMI.2025.3562237 | - |
| dc.identifier.scopus | eid_2-s2.0-105003681546 | - |
| dc.identifier.eissn | 1939-3539 | - |
| dc.identifier.issnl | 0162-8828 | - |
