How the 3D structure was captured in ESM-3 model #91
Comments
Have you seen our bioRxiv paper? https://www.biorxiv.org/content/10.1101/2024.07.01.600583v1
Hi, thanks for your reply. I will check it. Could you say specifically whether ESM-3 was trained on AlphaFold 3D structures (full-sequence structures) of human protein sequences?
Yes, data from the AlphaFoldDB was used to train ESM3, and that includes human proteins.
Thanks for your reply. It is really interesting. I generated embeddings from ESM-3 using sequence input and structure input separately, and applied Agglomerative Clustering to each set. The clusters from the ESM-3 sequence embeddings differ from the clusters from the ESM-3 structure embeddings. If ESM-3 captures both sequence and structure, why do the clusters differ? To investigate further, I looked at two proteins, Q6P3R8 and Q9BYP7, that appear together in the sequence-based clustering but not in the structure-based clustering. Using the ESM-3 structure embeddings, the cosine similarity between Q6P3R8 and Q9BYP7 is 0.96962. Using the ESM-3 sequence embeddings, it is 0.9922. The two cosine similarities are very close, yet Q6P3R8 and Q9BYP7 fall in the same cluster with the sequence embeddings and in separate clusters with the structure embeddings. Shouldn't the clusters be similar for the ESM-3 sequence embeddings and the ESM-3 structure embeddings? Why am I getting different clusters or trees?
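For reference, the comparison described above can be sketched with toy data. This is only an illustration under stated assumptions: the 2-D vectors below are made up and are not ESM-3 embeddings, and `cosine_similarity` and `nearest_neighbor` are hypothetical helper names, not part of any ESM API. It shows one possible reason for the observation: a pair can have a cosine similarity around 0.97 and still be split apart when every point has an even closer neighbor, because clustering depends on similarities relative to the other points rather than on the absolute value for one pair.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor(name, embeddings):
    """Return the key of the other vector most similar to `name`."""
    others = (key for key in embeddings if key != name)
    return max(
        others,
        key=lambda key: cosine_similarity(embeddings[name], embeddings[key]),
    )


# Toy vectors (assumption: purely illustrative, NOT real ESM-3 output).
toy = {
    "A": [1.0, 0.00],
    "B": [1.0, 0.25],
    "C": [1.0, 0.05],
    "D": [1.0, 0.30],
}

# A and B look very similar in absolute terms...
print("cos(A, B) =", round(cosine_similarity(toy["A"], toy["B"]), 4))
# ...but each of them has a closer partner, so a two-cluster solution
# would most naturally pair A with C and B with D.
print("nearest to A:", nearest_neighbor("A", toy))
print("nearest to B:", nearest_neighbor("B", toy))
```

With real embeddings, it may help to inspect the full pairwise similarity matrix for each embedding type before comparing the resulting trees.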
Hi, is it possible to re-train the ESM-3 model with both structure and sequence?
Hi,
I was checking the ESM-3 structure embedding and the ESM-3 sequence embedding, and found that the distance between the embeddings is very small (0.0001). I am curious how the ESM-3 model is pre-trained with the 3D structures of protein sequences. Is there a paper or documentation on ESM-3 that explains how it captures 3D structure?