Some questions on the paper #6
Comments
Q1: We feed the features generated by the UNet into a 1x1 convolution to adjust them to the same dimension as the meta prompts.
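For readers following along, here is a minimal sketch of what a 1x1 convolution does to a feature map: it applies the same linear map over channels at every spatial position, so only the channel dimension changes. This is not the repository's code. It uses NumPy rather than a deep learning framework, and the function name `conv1x1`, the 320 input channels, and the 64-dimensional meta prompt size are all assumptions chosen for illustration.

```python
import numpy as np

def conv1x1(features, weight, bias=None):
    """Apply a 1x1 convolution to a (C_in, H, W) feature map.

    weight has shape (C_out, C_in). Every spatial position is mapped
    independently by the same linear layer over the channel axis.
    """
    out = np.einsum("oc,chw->ohw", weight, features)
    if bias is not None:
        out = out + bias[:, None, None]
    return out

rng = np.random.default_rng(0)

# Hypothetical shapes: a UNet feature map with 320 channels on an 8x8 grid,
# projected to a hypothetical meta-prompt dimension of 64.
unet_features = rng.standard_normal((320, 8, 8))
prompt_dim = 64
weight = rng.standard_normal((prompt_dim, 320)) * 0.01

adjusted = conv1x1(unet_features, weight)
print(adjusted.shape)  # (64, 8, 8)
```

The spatial layout (8x8 here) is untouched; only the channel count changes to match the prompt dimension.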
Thanks for the reply! I understand the first question now, but I am still not sure about the second one. The UNet is trained to process compressed images. Should it also be able to process features? Does that mean the features contain meaningful visual structure, making them somewhat similar to a compressed image?
Yes. Features are extracted to capture the information needed to reconstruct or understand the original image content. They contain meaningful visual structures such as edges, textures, colors, or more abstract patterns in the data, so they can be seen as akin to a compressed form of the image.