We recommend viewing all images in full screen. Click on an image to see it at full scale.
To visualize the features of the diffusion model, we perform PCA on features extracted from three different sets of images. Specifically, for each set, we extract features from the decoder layers of the U-Net and apply PCA to the features of that set (Section 3, Figure 4 in the paper). Layer 4 features (highlighted in orange) reveal semantic regions (e.g., legs, torso) that are shared across all images in each set; that is, similar semantic regions are depicted in similar colors in PCA space. We also note that higher-resolution features capture low-level, high-frequency information, while coarser features depict a rough location of the dominant object.
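
The per-set PCA visualization described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function name `pca_rgb_maps`, the SVD-based PCA, the per-component min-max normalization, and the random arrays standing in for U-Net decoder features are all assumptions made for the example.

```python
import numpy as np


def pca_rgb_maps(feature_maps, n_components=3):
    """Project a set of (C, H, W) feature maps onto shared PCA axes.

    Hypothetical helper: PCA is fitted jointly on all spatial locations of
    all images in the set, so the same color means the same direction in
    feature space for every image.
    """
    shapes = [f.shape for f in feature_maps]
    # Flatten each map to (H*W, C) and stack the rows of the whole set.
    tokens = np.concatenate(
        [f.reshape(f.shape[0], -1).T for f in feature_maps], axis=0
    )
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Min-max normalize each component over the set so it can be shown as RGB.
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    projected = (projected - lo) / np.maximum(hi - lo, 1e-12)
    # Split the stacked rows back into one (H, W, n_components) map per image.
    maps, start = [], 0
    for _, h, w in shapes:
        maps.append(projected[start:start + h * w].reshape(h, w, n_components))
        start += h * w
    return maps


# Placeholder input: random arrays in place of real decoder features.
rng = np.random.default_rng(0)
features = [rng.standard_normal((64, 16, 16)) for _ in range(3)]
rgb_maps = pca_rgb_maps(features)
```

Each returned map can be displayed directly as an RGB image. Fitting the PCA on the whole set, rather than per image, is what makes colors comparable across images in that set.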