Server Limitations? #30

sourface94 · 2024-03-28T15:10:25Z

sourface94
Mar 28, 2024

Hi, I was wondering whether there are any server limitations on using other language models to generate the sentence embeddings?

woodthom2 · 2024-03-28T15:13:38Z

woodthom2
Mar 28, 2024
Maintainer

Hi, we have 16 GB on the server, but the tool is also running spaCy, and ideally Harmony should not crash when multiple users are connected concurrently. Is there a different LLM you would propose using?

sourface94 · 2024-03-28T15:19:59Z

sourface94
Mar 28, 2024
Author

I want to test other open-source embedding options on Hugging Face, but first wanted to know whether there were any space limitations. I noticed that paraphrase-multilingual-MiniLM-L12-v2 is used rather than paraphrase-multilingual-mpnet-base-v2 and wondered whether that was because of memory limitations.
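As a rough way to reason about the memory question, the size of a model's weights can be estimated from its parameter count. The sketch below is only a back-of-the-envelope calculation: the parameter counts are approximate assumptions (not measured on the Harmony server), the helper names are hypothetical, and the estimate ignores activations, the tokenizer, spaCy, and per-request overhead.

```python
# Back-of-the-envelope estimate of float32 weight size for an embedding model.
# The parameter counts below are approximate assumptions, not measured values.
BYTES_PER_FLOAT32 = 4

APPROX_PARAMS = {
    "paraphrase-multilingual-MiniLM-L12-v2": 118_000_000,
    "paraphrase-multilingual-mpnet-base-v2": 278_000_000,
}


def weights_gib(num_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Return the size of the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def fits(num_params: int, budget_gib: float, copies: int = 1) -> bool:
    """Check whether `copies` copies of the weights fit in `budget_gib`.

    `copies` is a hypothetical knob for setups that load the model once
    per worker process.
    """
    return weights_gib(num_params) * copies <= budget_gib


if __name__ == "__main__":
    for name, n in APPROX_PARAMS.items():
        print(f"{name}: ~{weights_gib(n):.2f} GiB of weights")
```

Weights alone are a lower bound; actual usage under concurrent load is better measured on the server itself.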

I was also interested in different ways of parsing the questions, for example removing information from the questions that may not be relevant to the overall meaning, but I would have to take a closer look at the types of questions first.

Is there a test set available that we can use for testing performance?

woodthom2 · 2024-03-28T16:07:59Z

woodthom2
Mar 28, 2024
Maintainer

Yes, there is. Please try the scripts in https://github.com/harmonydata/matching, which test different LLMs against a number of datasets. This notebook shows the results on those datasets: https://github.com/harmonydata/matching/blob/main/analyse_results.ipynb

sourface94 · 2024-03-31T04:16:29Z

sourface94
Mar 31, 2024
Author

Hi, I tested the model in the final column, which seems to perform better than the model in production. I can make a pull request today if there are no other tests that need to be checked.

output.xlsx

woodthom2 · 2024-03-31T09:37:27Z

woodthom2
Mar 31, 2024
Maintainer

Hi, thanks so much! That's fantastic! Yes, please feel free to make the PR, but first can you check that the API server runs locally for you with this change and that the unit tests pass? Thanks!


sourface94 · 2024-03-31T23:45:22Z

sourface94
Mar 31, 2024
Author

Hi, all tests passed apart from TestMatchMhc, which fails because the embeddings have a hardcoded length here:

harmony/tests/test_match_mhc.py

Line 53 in 9c4cdfc

    
           mhc_embeddings = np.array([[0.31698248, 0.12777875, 0.04758111, 0.42555183, 0.39878827,

This length does not match the length of the embeddings for the model I tested. Is this OK?
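To illustrate why a hardcoded embedding length breaks when the model changes, here is a minimal sketch in plain Python. It is not the Harmony test code: the function names and the toy dimensions (4 and 8) are hypothetical, chosen only to show that vectors stored from one model cannot be compared with vectors from a model of a different output size.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; both must have the same length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mismatched_rows(stored_vectors, model_dim):
    """Return indices of stored vectors whose length differs from model_dim."""
    return [i for i, v in enumerate(stored_vectors) if len(v) != model_dim]


# Hypothetical toy sizes: stored vectors came from an "old" model,
# the query vector comes from a "new" model with a larger output.
OLD_DIM, NEW_DIM = 4, 8
stored = [[0.1] * OLD_DIM, [0.2] * OLD_DIM]
query = [0.3] * NEW_DIM

if __name__ == "__main__":
    print("rows needing regeneration:", mismatched_rows(stored, NEW_DIM))
    try:
        cosine_similarity(stored[0], query)
    except ValueError as err:
        print("cannot compare:", err)
```

A check like `mismatched_rows` makes the failure explicit: any stored embeddings flagged this way would need to be regenerated with the new model before they can be matched.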

woodthom2 · 2024-04-01T15:29:17Z

woodthom2
Apr 1, 2024
Maintainer

OK, thanks. I guess we need to regenerate the Mental Health Catalogue embeddings for the new LLM too. That code is here: https://github.com/harmonydata/mentalhealthcatalogue_etl but I appreciate it's not properly documented. Do you want to make your PR? If you can see an easy way to fix the Mental Health Catalogue integration, you could add it; if not, I can add it in (it will be next week, as I'm not working this week).


sourface94 · 2024-04-02T20:43:20Z

sourface94
Apr 2, 2024
Author

Great, I've made the pull request and will have a look at the repo you shared.
