Code embeddings #84

Open
rragundez opened this issue Jun 27, 2024 · 7 comments

Comments

@rragundez

Is there any information on whether this is also recommended for extracting embeddings from code snippets, in particular JavaScript and Solidity?

@SeanLee97
Owner

SeanLee97 commented Jun 27, 2024

Hi @rragundez, you could try WhereIsAI/UAE-Code-Large-V1. It was trained on the github-issue-similarity dataset, which contains some JavaScript code.

from angle_emb import AnglE

angle = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

angle.encode("YOUR CODE")
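
To judge how well the embeddings separate code snippets, one common approach is to compare vectors by cosine similarity. The sketch below is only an illustration: the helper name `cosine_similarity` and the three short vectors are made-up placeholders standing in for whatever the encoder returns, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for embeddings (assumption: not model output).
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [-3.0, 0.0, 1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Values near 1 mean the two snippets are embedded close together; values near 0 mean the model treats them as unrelated.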

@rragundez
Author

Let me try it and I'll report the results back here.

@rragundez
Author

It did work, but the results on Solidity code are not very good. Thanks.

I am going to try an LLM trained on Solidity code, but it ships as GGUF files. How would I use those in this library? For example:

https://huggingface.co/mradermacher/Solidity-Llama3-8b-GGUF

@SeanLee97
Owner

Maybe you can use its base model: https://huggingface.co/andrijdavid/Solidity-Llama3-8b

@rragundez
Author

Would this work out of the box by just passing the model name as the argument?

@SeanLee97
Owner

Yes. For LLM inference, you can check the documentation: https://angle.readthedocs.io/en/latest/notes/quickstart.html#infer-llm-based-models

Since this model hasn't been trained for sentence embeddings, it is recommended to use a prompt to improve performance. You can specify a prompt with angle.encode(..., prompt="Here is a prompt: {text}.").
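
To make the role of the {text} placeholder concrete, here is a minimal sketch of how a prompt template wraps a snippet. It assumes the placeholder is filled with ordinary Python string formatting, which is an assumption about the library's internals. The helper name `apply_prompt` and the prompt wording are hypothetical and not part of the library.

```python
# Hypothetical helper: shows how a "{text}" prompt template wraps a snippet.
# Assumption: the placeholder is filled via str.format; this is not library code.

def apply_prompt(prompt: str, text: str) -> str:
    """Fill the {text} placeholder of a prompt template with a code snippet."""
    return prompt.format(text=text)


# Made-up prompt wording for code embeddings; tune it for your own data.
CODE_PROMPT = "Summarize what this Solidity code does: {text}"

snippet = "function add(uint a, uint b) public pure returns (uint) { return a + b; }"
wrapped = apply_prompt(CODE_PROMPT, snippet)
print(wrapped)
```

Braces inside the snippet are harmless here because only the template is parsed for placeholders, not the value substituted into it.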

@SeanLee97
Owner

There is no need to specify a pretrained_lora_path; just set model_name_or_path directly to andrijdavid/Solidity-Llama3-8b.
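
Putting the replies together, loading the model without a LoRA path might look like the sketch below. It assumes that AnglE.from_pretrained accepts is_llm, pooling_strategy and torch_dtype as in the LLM quickstart linked earlier in this thread, and that encode takes {'text': ...} dicts when a prompt is given; please check both against the documentation for your installed version. The helper names `load_kwargs` and `embed_solidity` are hypothetical.

```python
MODEL_NAME = "andrijdavid/Solidity-Llama3-8b"
PROMPT = "Here is a prompt: {text}."  # placeholder wording from the reply above


def load_kwargs() -> dict:
    """Keyword arguments for from_pretrained; note: no pretrained_lora_path."""
    return {
        "model_name_or_path": MODEL_NAME,
        "is_llm": True,              # assumption: flag used for LLM backbones
        "pooling_strategy": "last",  # assumption: last-token pooling for LLMs
    }


def embed_solidity(snippets):
    """Encode Solidity snippets with the LLM (needs angle_emb, torch and a GPU)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from angle_emb import AnglE

    angle = AnglE.from_pretrained(torch_dtype=torch.float16, **load_kwargs()).cuda()
    # Assumption: with a prompt set, inputs are dicts keyed by the placeholder name.
    return angle.encode([{"text": s} for s in snippets], prompt=PROMPT)
```

Calling embed_solidity(["contract C {}"]) downloads the full 8B model, so it is best tried on a machine with enough GPU memory.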
