Exploiting Diffusion Prior for Generalizable Dense Prediction

¹University of California, Merced, ²Meta, ³Snap Research, ⁴Yonsei University

CVPR 2024

A Universal Transferring Framework
for Diffusion Models to Generate ...

Surface Normals

Depth

Semantic Segmentation

Albedo

Shading

Content generated by recent Text-to-Image (T2I) diffusion models is sometimes too imaginative for existing off-the-shelf property and semantic predictors to handle, because the domain gap between generated and real images is hard to mitigate. We introduce Diffusion Models as Prior (DMP), a pipeline that uses pre-trained T2I models as a prior for dense prediction tasks.

To address the misalignment between deterministic prediction tasks and stochastic T2I models, we reformulate the diffusion process through a sequence of interpolations, establishing a deterministic mapping between input RGB images (x) and output prediction distributions (y). To preserve generalizability, we use low-rank adaptation to fine-tune pre-trained models.
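As a rough illustration of the low-rank adaptation mentioned above, the sketch below adds a trainable low-rank update to a frozen weight matrix. It follows the general LoRA formulation (W + (alpha / r) · B · A), not the DMP code itself; the function names `matmul` and `lora_weight`, the scaling factor `alpha`, and the toy matrices are all assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen pre-trained weight, shape (out, in)
    a: trainable down-projection, shape (r, in)
    b: trainable up-projection, shape (out, r)
    alpha: scaling factor (hypothetical value chosen by the user)
    """
    r = len(a)                      # rank of the update
    delta = matmul(b, a)            # low-rank update, shape (out, in)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy example: a 2x2 frozen weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, -1.0]]
b_zero = [[0.0], [0.0]]             # B starts at zero, so the model is unchanged
b_trained = [[1.0], [2.0]]

unchanged = lora_weight(w, a, b_zero, alpha=1.0)
adapted = lora_weight(w, a, b_trained, alpha=1.0)
```

Because only A and B are trained while W stays frozen, the number of updated parameters is small, which is the usual argument for why this kind of fine-tuning keeps the pre-trained prior largely intact.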

A sequence of interpolations between an image and its output.
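The interpolation idea can be sketched in a few lines. The version below is a minimal illustration, assuming the blend takes the form of the standard diffusion forward process with the noise term replaced by the input image; the cosine schedule, the square-root weights, and the names `alpha_bar`, `interpolate`, and `trajectory` are assumptions for this example and are not taken from the text above.

```python
import math


def alpha_bar(t):
    """Hypothetical cosine schedule: 1 at t=0 (pure output), 0 at t=1 (pure input)."""
    return math.cos(0.5 * math.pi * t) ** 2


def interpolate(x, y, t):
    """Blend output y with input image x at step t in [0, 1].

    Assumed form: sqrt(alpha_bar) * y + sqrt(1 - alpha_bar) * x,
    i.e. the usual forward process with x in place of random noise.
    """
    a = alpha_bar(t)
    w_y = math.sqrt(a)
    w_x = math.sqrt(1.0 - a)
    return [w_y * yi + w_x * xi for xi, yi in zip(x, y)]


def trajectory(x, y, steps):
    """Deterministic path from the input image (t=1) to the prediction (t=0)."""
    return [interpolate(x, y, k / steps) for k in range(steps, -1, -1)]


# Toy "image" and "prediction" with two pixels each.
x = [0.2, 0.8]
y = [1.0, 0.0]
path = trajectory(x, y, steps=4)
```

Since no random noise enters the blend, the same input always yields the same path, which is what makes the mapping from x to y deterministic in this sketch.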

We train the model on 10K bedroom images and evaluate out-of-domain performance on 3D property estimation and intrinsic image decomposition using diverse scenes and arbitrary images. Out-of-domain segmentation is evaluated on bedroom images in various styles. DMP gives faithful estimates, even on images that off-the-shelf methods fail to handle.

Results of 3D property estimation
Results of semantic segmentation

Applications

Surface normals and depth facilitate many vision tasks. We show examples of 3D photo inpainting. Compared to the results from the default depth estimator (left), the videos produced with depth maps generated by DMP (right) show more accurate depth relationships between objects.

BibTeX

@inproceedings{lee2024dmp,
  author    = {Lee, Hsin-Ying and Tseng, Hung-Yu and Lee, Hsin-Ying and Yang, Ming-Hsuan},
  title     = {Exploiting Diffusion Prior for Generalizable Dense Prediction},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}