Batch Support for Obtaining Residue Embeddings #81

Open
Junseok0207 opened this issue Aug 12, 2024 · 3 comments
@Junseok0207

I am currently trying to obtain residue embeddings for protein sequences. The typical workflow involves the following steps:

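from esm.sdk.api import ESMProtein, SamplingConfig

# `client` is the loaded model (or API client); `sequence` is an amino-acid string.
protein = ESMProtein(sequence=sequence)
protein_tensor = client.encode(protein)
config = SamplingConfig(return_per_residue_embeddings=True)
output = client.forward_and_sample(protein_tensor, config)
embeddings = output.per_residue_embedding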

However, I don't know how to get embeddings in batch mode. I checked the example in esm/examples/local_generate.py (lines 129-135), but it only shows the batch_generate function, which does not provide a way to obtain embeddings. How can I obtain embeddings for a batch of sequences?

@winatony

Bumping this issue. I am also interested in learning whether a batching function for generating embeddings is ready yet and, if possible, seeing a small example script showing a potential use case. In the meantime, could you simply loop through a list of FASTA files and generate embeddings one at a time, or is there a reason you would want to generate embeddings in batches?
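For the "loop through a list of FASTA files" idea, the sequences first have to be read out of the files. The following is a minimal sketch using only the standard library; `parse_fasta` is a hypothetical helper name, not part of the esm package, and it assumes plain FASTA where each record starts with a `>` header line.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict.

    Hypothetical helper for illustration; not part of the esm package.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[current_id] += line
    return records


example = [">p1 example protein", "MKT", "AYI", "", ">p2", "GG"]
print(parse_fasta(example))
```

An open file handle can be passed directly, for example `parse_fasta(open(path))`, since iterating over a file yields its lines.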

@ebetica
Contributor

ebetica commented Aug 27, 2024

We currently don't have support for this, though it shouldn't be too bad to implement. You can definitely just loop through and generate one at a time unless you're running into speed concerns.
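The one-at-a-time loop suggested above could look roughly like the sketch below. It is not an official API: `embed_one_at_a_time` and `fake_embed` are hypothetical names, and the embedding function is passed in as a callable so the loop itself can run without a model. In practice that callable would wrap the `encode` and `forward_and_sample` calls from the original post.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Embedding = List[List[float]]  # one vector per residue


def embed_one_at_a_time(
    named_sequences: Iterable[Tuple[str, str]],
    embed_one: Callable[[str], Embedding],
) -> Dict[str, Embedding]:
    """Embed (name, sequence) pairs sequentially and collect results by name.

    `embed_one` is an assumed callable taking a sequence string and returning
    per-residue embeddings; with the real model it would wrap the calls shown
    in the original post.
    """
    embeddings: Dict[str, Embedding] = {}
    for name, sequence in named_sequences:
        embeddings[name] = embed_one(sequence)
    return embeddings


def fake_embed(sequence: str) -> Embedding:
    """Stand-in embedder for demonstration only: a 2-dim vector per residue."""
    return [[float(i), float(ord(aa))] for i, aa in enumerate(sequence)]


result = embed_one_at_a_time([("p1", "MKT"), ("p2", "GG")], fake_embed)
print({name: len(vectors) for name, vectors in result.items()})
```

Because each sequence is processed independently, no padding or masking is needed; the trade-off is throughput, which only matters if speed becomes a concern.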

@ebetica ebetica self-assigned this Aug 27, 2024
@lhallee

lhallee commented Dec 6, 2024

Hi @Junseok0207 @winatony @ebetica, my group made a wrapper for this that has full Hugging Face integration and batching.
https://huggingface.co/Synthyra/ESMplusplus_small
