Fine tune a large language model for psychology domain (DOXA AI competition) #55
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)
Description
We need to fine-tune an LLM on mental health data. We have training and evaluation data available.
We are running a competition on the online platform DOXA AI where you can win up to 500 GBP in vouchers (1st place prize). Check it out here: https://harmonydata.ac.uk/doxa/
How to get started?
Create an account on DOXA AI and run the example notebook. This will download the training data.
If you would like some tips on how to train an LLM, I recommend this Hugging Face tutorial.
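As a starting point, here is a minimal fine-tuning sketch using the sentence-transformers library. The base model name, the example pairs, and the label values are placeholders only; the real training data comes from the DOXA AI example notebook.

```python
# Minimal sketch: fine-tune a sentence similarity model on item pairs.
# Model name, pairs and labels below are illustrative placeholders.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

# Each example is a pair of questionnaire items with a target similarity
# between 0 (unrelated) and 1 (equivalent), judged as a psychologist would.
train_examples = [
    InputExample(texts=["I sleep deeply", "I have trouble sleeping"], label=0.1),
    InputExample(texts=["I feel anxious", "I feel nervous"], label=0.9),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# One epoch is for illustration only; tune epochs/warmup on the evaluation split.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)

model.save("psychology-harmony-model")  # hypothetical output path
```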
Rationale
We have noticed for a while that the similarity scores that Harmony returns could be improved. For example, items to do with "sleep" are often grouped together (because of the data that off-the-shelf models such as SentenceTransformers are trained on), while a psychologist would consider them to be different. Read a blog post about how we evaluated the models.
Even the OpenAI models make the same mistakes as the Hugging Face models due to a lack of domain understanding. For example, a large number of psychological conditions are sleep-related, and the LLMs tend to group items together whenever anything mentions sleep, while a human would consider them entirely different from the point of view of harmonising items in a study ("I sleep deeply" vs "I have trouble sleeping" vs "I have little time to sleep"). Similarly, "child is bullied" vs "child bullies others" is another false positive. That's why we have started this initiative to fine-tune an LLM for this domain.
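To see the kind of false positive described above, you can compare the sleep items with an off-the-shelf model. This is only a sketch: the model name is an example and not necessarily the one Harmony uses in production.

```python
# Sketch: cosine similarity between the sleep items with an off-the-shelf model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example off-the-shelf model

items = [
    "I sleep deeply",
    "I have trouble sleeping",
    "I have little time to sleep",
]

embeddings = model.encode(items, convert_to_tensor=True)

# Pairwise cosine similarity; off-the-shelf models tend to score these
# pairs highly even though a psychologist would treat them as distinct.
scores = util.cos_sim(embeddings, embeddings)

for i in range(len(items)):
    for j in range(i + 1, len(items)):
        print(f"{items[i]!r} vs {items[j]!r}: {float(scores[i][j]):.2f}")
```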