Improve mapping #196
Comments
wow this is so interesting! On filtering wiki entries, I'm not sure exactly how ReFinED works, but when we used Wikidata to extract entities from patents/abstracts, there was a way to filter for relevant entities (although the number of pages that had appropriate tags to filter on was very low).
It does feel like we could be adding more complexity, but I think treating it as an entity disambiguation problem is still an interesting idea. Could we treat ESCO as a knowledge base and train our own entity linker to match each extracted skill to an ESCO skill? I semi-looked into spaCy's entity linker as part of a personal project familiarising myself with Prodigy.
re: the big refactor/making the code clearer and mapping to multiple skills (and ignoring entity disambiguation): we might want to explore the world of vector DBs/vector search, because I think they have a lot of this baked in (i.e. surfacing similar data + speed).
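To make the "map to multiple skills via vector search" idea concrete, here is a minimal sketch using only the standard library. It is not our current pipeline: the character n-gram vectors stand in for real sentence embeddings, the brute-force loop stands in for a vector DB or FAISS index, and `char_ngrams`, `top_k_matches` and the `ESCO_SKILLS` labels are all made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_matches(extracted_skill, esco_skills, k=3):
    """Return the k most similar ESCO labels with their scores, best first."""
    query = char_ngrams(extracted_skill)
    scored = [(label, cosine(query, char_ngrams(label))) for label in esco_skills]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical ESCO-style labels, for illustration only.
ESCO_SKILLS = [
    "use spreadsheets software",
    "manage budgets",
    "communicate with customers",
    "use microsoft office",
]

matches = top_k_matches("excel spreadsheets", ESCO_SKILLS, k=2)
for label, score in matches:
    print(f"{label}: {score:.2f}")
```

Returning the top `k` candidates rather than a single best match is what would let one extracted skill map to several ESCO skills; a vector DB would mainly replace the brute-force loop with an indexed search.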
ok truuuuly live spitballing - what would doing both look like? Is it overkill?
oo I like it! It does make far more sense to train our own if it works ok. Once we train the EL model, I guess we don't need to apply all the vectorisation + FAISS/Elastic steps anymore, so is there a benefit to implementing these just to create the training data? (i.e. maybe our current mappings will do?)
I think if we go down the EL route, that's a good point: I don't really think there's additional benefit to implementing vector DBs beyond creating training data. I wonder how we can quickly assess that approach? What about training an EL model on an engineered training set of skills that our current approach consistently matches poorly?
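One rough way to pick candidates for that engineered training set is sketched below. It assumes we can export `(extracted_skill, match_score)` pairs from the current mapper; the function name, the threshold, the minimum count and the toy scores are all hypothetical, not values from our codebase.

```python
from collections import defaultdict
from statistics import mean


def consistently_poor_skills(mappings, threshold=0.5, min_count=2):
    """Return skills seen at least `min_count` times whose mean score is below `threshold`."""
    scores = defaultdict(list)
    for extracted_skill, score in mappings:
        scores[extracted_skill].append(score)
    return sorted(
        skill
        for skill, values in scores.items()
        if len(values) >= min_count and mean(values) < threshold
    )


# Toy scores, invented for illustration only.
toy_mappings = [
    ("stakeholder mgmt", 0.31),
    ("stakeholder mgmt", 0.28),
    ("python", 0.92),
    ("python", 0.88),
    ("agile", 0.35),  # low score, but seen only once, so not "consistent"
]

hard_skills = consistently_poor_skills(toy_mappings)
print(hard_skills)
```

Requiring a minimum count is one way to read "consistently": a single low score could just be noise, whereas repeated low scores suggest the mapper genuinely struggles with that skill.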
I've already written a custom entity linker recipe in Prodigy as part of a personal project, so I could probably get the labelling side of things up and running relatively quickly: https://github.com/india-kerle/viclit_food_linker/blob/main/src/cake_recipe.py
oo amazing! I was thinking we'd be training on our existing labelled data (rather than creating more)? We'd have to reconfigure the data into the correct format though which might be tricky (but surely easier than labelling more?). I've been following this and the training data is in the form
Some avenues to improve our mapping algorithm