Quick calculation of the most similar submissions (and other metrics)

While calculating similarity scores, Dolos spends much of its time computing metrics for each pair of submissions, which scales quadratically with the number of submissions. However, we are often only interested in the top N pairs according to our metrics (similarity, longest fragment, total overlap). A way to find these "most suspicious" pairs dynamically could drastically increase the number of submissions we can handle.

Unfortunately, this would require switching to a lazier or request-based algorithm. We currently calculate almost everything during the analysis. On-demand calculation would require either a way to interact with a back-end or the ability to calculate the metrics on the fly in the front-end. An advanced index structure such as a suffix tree could help in this regard, but brings its own challenges.

Slower, more advanced calculation of pair edit distance

Once the general metrics are calculated, we often want to inspect a pair of submissions more closely. Since we only inspect one pair at a time, we can afford slower algorithms that deliver more exact results, especially if these algorithms can benefit from a pre-calculated index structure like a suffix tree.

For example, we could compute the edit distance between the ASTs of the two submissions and even include the exact changes when desired. This would allow Dolos to list exactly which transformations are needed to convert one submission into the other. We can find inspiration for this in the tool difftastic.
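To make the "top N pairs" idea from the first shortcoming a bit more concrete, here is a minimal Python sketch. It is not how Dolos works today. It assumes each submission has already been reduced to a set of hashed fingerprints, and it swaps the suffix tree mentioned above for a much simpler inverted index. The function name `top_n_pairs`, the input shape, and the Dice-style similarity ratio are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def top_n_pairs(fingerprints, n):
    """Return the n pairs with the highest similarity, best first.

    `fingerprints` maps a submission name to a set of hashed fingerprints
    (hypothetical input shape, not the Dolos data model).
    """
    # Inverted index: fingerprint -> names of the submissions containing it.
    index = defaultdict(set)
    for name, prints in fingerprints.items():
        for fp in prints:
            index[fp].add(name)

    # Count shared fingerprints only for pairs that co-occur in some bucket,
    # so pairs with nothing in common are never visited.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    def similarity(pair):
        # Dice-style ratio, chosen for illustration only.
        a, b = pair
        return 2 * shared[pair] / (len(fingerprints[a]) + len(fingerprints[b]))

    # Keep only the n best candidates instead of sorting every pair.
    return heapq.nlargest(n, ((similarity(pair), pair) for pair in shared))
```

This only helps when most pairs share nothing. A fingerprint that occurs in every submission (template code, for instance) still produces a quadratic number of candidate pairs, so such fingerprints would probably have to be filtered out first.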
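As a rough illustration of listing the transformations between one pair, the sketch below uses Python's standard `difflib.SequenceMatcher` on two flat token lists. This is a deliberately simpler stand-in for the AST edit distance proposed above: it knows nothing about tree structure, and `difflib` does not guarantee a minimal edit script. The helper name `edit_script` and the token lists are assumptions for the example.

```python
from difflib import SequenceMatcher


def edit_script(tokens_a, tokens_b):
    """List the operations that turn tokens_a into tokens_b.

    Each entry is (operation, removed_tokens, added_tokens), where the
    operation is 'replace', 'delete' or 'insert'.
    """
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent items in long sequences.
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    script = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # skip the unchanged runs
            script.append((tag, tokens_a[i1:i2], tokens_b[j1:j2]))
    return script


# Hypothetical token streams for two submissions that differ only in names.
original = ["def", "f", "(", "x", ")", ":", "return", "x"]
renamed = ["def", "g", "(", "y", ")", ":", "return", "y"]
for operation, removed, added in edit_script(original, renamed):
    print(operation, removed, "->", added)
```

Because this runs for a single pair on request, the cost of a slower and more exact algorithm (such as a real tree edit distance) would be paid once per inspection rather than once per pair.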
This discussion tracks some of the shortcomings of our current engine with the aim of finding improvements for them.
Current shortcomings