Group items into new harmonised variables #22

woodthom2 · 2024-05-31T16:03:58Z

Description

Can we add groups of similar items to the export? E.g. everything to do with height across 5 studies? Perhaps this could also be another view in the tool's visualisation.

See mockup:

https://github.com/harmonydata/hackathon/blob/main/find_variable.png

Rationale

Users have requested this feature, because otherwise there is a manual step in going from the similarity matrix (which is currently in the export) to harmonised variables.

ronnyTodgers · 2024-05-31T16:30:15Z

Sure, is that grouping in the API data somewhere? It would be great to add it in and offer filtering by it. Or is the topics_auto / topics_strengths field the one to leverage here?

J

John Rogers, Delosis Ltd


woodthom2 · 2024-05-31T16:38:52Z

I was thinking the front end could apply some simple deterministic logic to make the groupings, using the similarity matrix as input. I would avoid general-purpose clustering algorithms here because they can be slow and, where they depend on random initialisation, not reproducible.

For example, we set a threshold for what level of similarity constitutes a group, maybe 60%. A group (which would then become a single variable such as "height" or "anxiety" in the researcher's meta-analysis) could be either:

(a) a set of items from the original questionnaires where every item has similarity above 60% to every other member of the set, or
(b) a set of items where each one is connected to at least one other member of the set by similarity above 60%.

Maybe that logic is better placed in the API? But if it is simple enough we can do it in the FE, which might allow faster iteration on how it's done.
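The two candidate definitions can be sketched as follows. This is only an illustration, not Harmony's implementation: the function names `linked_groups` and `is_fully_similar` are hypothetical, the similarity matrix is assumed to be a symmetric list of lists on a 0 to 1 scale, and "above the threshold" is taken to mean strictly greater. The sketch is written in Python for brevity; the same logic would port directly to the front end.

```python
from itertools import combinations


def linked_groups(similarity, threshold=0.6):
    """Definition (b): items joined by any chain of pairs above the threshold.

    Uses a small union-find, so the result is deterministic for a given matrix.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if similarity[i][j] > threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Always keep the smaller index as root for a stable ordering.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Items with no partner above the threshold are left ungrouped.
    return [members for members in groups.values() if len(members) > 1]


def is_fully_similar(similarity, members, threshold=0.6):
    """Definition (a) check: every pair of members is above the threshold."""
    return all(similarity[i][j] > threshold for i, j in combinations(members, 2))


# Hypothetical 4-item matrix: items 0-1 and 1-2 are similar, 0-2 is not.
example = [
    [1.0, 0.8, 0.4, 0.1],
    [0.8, 1.0, 0.7, 0.2],
    [0.4, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(linked_groups(example))                 # [[0, 1, 2]]
print(is_fully_similar(example, [0, 1, 2]))   # False
```

In this example definition (b) puts items 0, 1 and 2 in one group, while definition (a) rejects that group because items 0 and 2 are only 40% similar.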


woodthom2 · 2024-05-31T17:08:09Z

To be clearer: whatever the threshold is, it would ideally be controlled by a slider. That would be a reason to compute the groups in the FE.
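One way a slider could stay responsive is to sort the item pairs once per similarity matrix and then regroup on each slider change by scanning only the pairs above the current threshold. The sketch below assumes the "connected to at least one other member" definition, a symmetric 0 to 1 matrix, and hypothetical function names (`sorted_pairs`, `groups_at`); it is written in Python for illustration even though the real slider would live in the front end.

```python
from itertools import combinations


def sorted_pairs(similarity):
    """All item pairs with their similarity, strongest first (computed once)."""
    n = len(similarity)
    pairs = [(similarity[i][j], i, j) for i, j in combinations(range(n), 2)]
    return sorted(pairs, reverse=True)


def groups_at(pairs, n_items, threshold):
    """Regroup for one slider position, reading only pairs above the threshold."""
    group_of = list(range(n_items))  # each item starts in its own group
    for score, i, j in pairs:
        if score <= threshold:
            break  # every remaining pair is weaker, so stop early
        old = max(group_of[i], group_of[j])
        new = min(group_of[i], group_of[j])
        if old != new:
            # Merge the two groups under the smaller label.
            group_of = [new if g == old else g for g in group_of]
    groups = {}
    for item, label in enumerate(group_of):
        groups.setdefault(label, []).append(item)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical matrix and a few slider positions.
matrix = [
    [1.0, 0.8, 0.4, 0.1],
    [0.8, 1.0, 0.7, 0.2],
    [0.4, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
pairs = sorted_pairs(matrix)
for slider in (0.9, 0.75, 0.6, 0.15):
    print(slider, groups_at(pairs, len(matrix), slider))
```

Lowering the threshold can only merge groups, never split them, so the groupings nest as the slider moves.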


ronnyTodgers · 2024-05-31T21:45:26Z

Well, we could certainly split up the groups of variables and get users to add a label if one is not obvious from the items (we could look for common words, or perhaps common related topics). What do we do with items that fit into multiple groups? Is a group defined only when all items meet the threshold with all other items?

Fixed the delay problem, and that's all live on the main site now.

J
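Both questions can be made concrete with a short sketch. Under the strict "all items meet the threshold with all other items" definition, the groups are the maximal cliques of the thresholded similarity graph, and an item can legitimately appear in more than one of them. The code below is an assumption-laden illustration rather than a proposed implementation: `maximal_groups` and `suggest_label` are hypothetical names, the matrix is assumed symmetric on a 0 to 1 scale, the stopword list is a made-up placeholder, and enumerating maximal cliques can be slow on large, dense matrices.

```python
import re

# Placeholder stopword list, purely for illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "your", "you",
             "how", "do", "in", "of", "to", "i", "my"}


def maximal_groups(similarity, threshold=0.6):
    """Strict groups: every pair of members is above the threshold.

    Basic Bron-Kerbosch enumeration of maximal cliques, without pivoting.
    """
    n = len(similarity)
    neighbours = {
        i: {j for j in range(n) if j != i and similarity[i][j] > threshold}
        for i in range(n)
    }
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) > 1:  # ignore ungrouped single items
                found.append(sorted(current))
            return
        for item in sorted(candidates):
            expand(current | {item},
                   candidates & neighbours[item],
                   excluded & neighbours[item])
            candidates = candidates - {item}
            excluded = excluded | {item}

    expand(set(), set(range(n)), set())
    return sorted(found)


def suggest_label(item_texts):
    """Words shared by every item in a group, as a rough label suggestion."""
    word_sets = [set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
                 for text in item_texts]
    return sorted(set.intersection(*word_sets)) if word_sets else []


# Hypothetical matrix: item 1 is similar to both 0 and 2, but 0 and 2 are not.
matrix = [
    [1.0, 0.8, 0.4, 0.1],
    [0.8, 1.0, 0.7, 0.2],
    [0.4, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(maximal_groups(matrix))  # [[0, 1], [1, 2]]
print(suggest_label(["What is your height in cm?",
                     "Height of participant",
                     "How tall are you (height)?"]))  # ['height']
```

Here item 1 falls into two strict groups, which is exactly the overlap case that would need a rule (for example, assigning the item to the group with the higher average similarity, or letting the user choose).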


woodthom2 added the "enhancement" label (New feature or request) · May 31, 2024
