
ID modifications but with text input #18

Open
gebaltso opened this issue May 27, 2024 · 3 comments

Comments

@gebaltso

Hello and congrats on your great work! Just to clarify something: is it possible to generate new images for the same ID while providing a text input to guide the modification, e.g., changing hair color, pose, etc.?
Thanks in advance.

@foivospar
Owner

Hi! This model focuses exclusively on ID information and adapts the CLIP encoder accordingly. It is not designed to follow text prompts. This could perhaps be achieved by combining our ID encoder with the original text encoder in some way, but in its current state the model only supports ID input (plus compatible conditions/adapters such as ControlNet).
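
For reference, the supported ID-only flow looks roughly like this (condensed from the README; the image path is a placeholder and `./models` is assumed to hold the downloaded weights):

```python
import torch
import numpy as np
from PIL import Image
from insightface.app import FaceAnalysis
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
from arc2face import CLIPTextModelWrapper, project_face_embs

# Fine-tuned CLIP encoder and UNet (weights under ./models, as in the README)
encoder = CLIPTextModelWrapper.from_pretrained(
    "models", subfolder="encoder", torch_dtype=torch.float16
)
unet = UNet2DConditionModel.from_pretrained(
    "models", subfolder="arc2face", torch_dtype=torch.float16
)
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    text_encoder=encoder,
    unet=unet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# Extract and normalize the ArcFace embedding of the reference face
app = FaceAnalysis(name="antelopev2", root="./",
                   providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))
img = np.array(Image.open("reference.jpg"))[:, :, ::-1]  # RGB -> BGR for insightface
face = max(app.get(img),  # keep the largest detected face
           key=lambda f: (f["bbox"][2] - f["bbox"][0]) * (f["bbox"][3] - f["bbox"][1]))
id_emb = torch.tensor(face["embedding"], dtype=torch.float16)[None].cuda()
id_emb = id_emb / torch.norm(id_emb, dim=1, keepdim=True)

# project_face_embs uses its fixed internal prompt; no free-form text is accepted
prompt_embeds = project_face_embs(pipeline, id_emb)
images = pipeline(prompt_embeds=prompt_embeds,
                  num_inference_steps=25, guidance_scale=3.0).images
```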

@vitrun

vitrun commented Nov 18, 2024

Hi,
Thanks for sharing your work. I've been experimenting with it and encountered some issues related to identity preservation when altering inputs in the project_face_embs method:

  • Replacing the text: I substituted the default prompt, "photo of an id person", with different texts, but none of the generated images retained the identity of the reference image.
  • Adding extra text: I also tried appending additional descriptions, such as "floating on top of water" (see the snippet after this list). Again, the identity of the original was not preserved.
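
Both experiments amounted to editing the fixed prompt string inside project_face_embs, which (as I understand it) injects the padded ArcFace embedding at the position of the "id" token, e.g.:

```python
# Paraphrased from project_face_embs; only the prompt string was changed.
input_ids = pipeline.tokenizer(
    "photo of an id person floating on top of water",  # modified prompt
    truncation=True,
    padding="max_length",
    max_length=pipeline.tokenizer.model_max_length,
    return_tensors="pt",
).input_ids.to(pipeline.device)
```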

It seems that these changes significantly disrupt the text conditioning capability. Do you have any suggestions for a workaround or fix?
Thanks for your help!

@foivospar
Owner

Hi,
As mentioned above, the models have been intentionally overfitted to the default prompt ("photo of an id person") to focus on ArcFace embeddings and maximize ID similarity. The fine-tuned encoder will thus not work with other prompts. A workaround would be to combine its output with the text embeddings from the original CLIP encoder, or to integrate text via an external adapter (IP-Adapter, ControlNet). In both cases, you would likely need to fine-tune the model (either the UNet or the external module) for text-driven generation.
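
A very rough, untested sketch of the first idea (the fusion strategy and names like orig_encoder are only illustrative; it reuses pipeline and id_emb from the setup above and would almost certainly need fine-tuning to work well):

```python
import torch
from transformers import CLIPTextModel

# Original (non-finetuned) SD 1.5 text encoder for the text branch
orig_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
).to(pipeline.device, pipeline.unet.dtype)

id_embeds = project_face_embs(pipeline, id_emb)  # (1, 77, 768) ID-conditioned tokens

text_ids = pipeline.tokenizer(
    "floating on top of water",
    truncation=True,
    padding="max_length",
    max_length=pipeline.tokenizer.model_max_length,
    return_tensors="pt",
).input_ids.to(pipeline.device)
text_embeds = orig_encoder(text_ids)[0]          # (1, 77, 768) text tokens

# Naive fusion: concatenate along the sequence axis. Cross-attention accepts
# variable-length context, but without fine-tuning there is no reason to expect
# both good ID preservation and prompt following.
prompt_embeds = torch.cat([id_embeds, text_embeds], dim=1)

# guidance_scale=1.0 disables CFG, avoiding the need for negative embeddings
# with the same (doubled) sequence length.
images = pipeline(prompt_embeds=prompt_embeds,
                  num_inference_steps=25, guidance_scale=1.0).images
```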
