A Universal Transferring Framework
for Diffusion Models to Generate ...
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap. We introduce Diffusion Models as Prior (DMP), a pipeline that utilizes pre-trained T2I models as a prior for dense prediction tasks.
To address the misalignment between deterministic prediction tasks and stochastic T2I models, we reformulate the diffusion process through a sequence of interpolations, establishing a deterministic mapping between input RGB images (x) and output prediction distributions (y). To preserve generalizability, we use low-rank adaptation to fine-tune pre-trained models.
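As a rough sketch of this interpolation-based reformulation (the names below, e.g. interpolate and alpha_bar_t, are illustrative and assume a standard DDPM-style schedule rather than the paper's exact parameterization), each step deterministically blends the prediction latent y toward the input RGB latent x instead of adding Gaussian noise:

import torch

def interpolate(y: torch.Tensor, x: torch.Tensor, alpha_bar_t: torch.Tensor) -> torch.Tensor:
    # Deterministic "noising": at t = 0 the latent equals the prediction y,
    # at t = T it equals the RGB image x, so the chain defines a mapping
    # between x and y with no sampled Gaussian noise.
    return alpha_bar_t.sqrt() * y + (1.0 - alpha_bar_t).sqrt() * x

Because no random noise enters the chain, reversing it yields a single prediction y for each input x, matching the deterministic nature of dense prediction.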
We train the model on 10K bedroom images and evaluate out-of-domain performance on 3D properties and intrinsic images with diverse scenes and arbitrary images. Out-of-domain segmentation is evaluated on bedroom images in various styles. DMP gives faithful estimates, even on images that off-the-shelf schemes fail to handle.
Surface normals and depth maps facilitate many vision tasks. We show examples of 3D photo inpainting. Compared to the default depth estimator (left), the videos produced with depth maps generated by DMP (right) exhibit more accurate depth relationships between objects.
@inproceedings{lee2024dmp,
  author    = {Lee, Hsin-Ying and Tseng, Hung-Yu and Lee, Hsin-Ying and Yang, Ming-Hsuan},
  title     = {Exploiting Diffusion Prior for Generalizable Dense Prediction},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}