Replies: 4 comments 1 reply
-
We had this kind of output in the past, but removed it because it produced too much output and slowed down the analysis significantly. We now calculate the matching fragments on the fly, because that should be fast enough for most of our use cases.

Since you are doing some advanced things with Dolos, I would suggest using the library (@dodona/dolos-lib). Here is an example of how you can use the library to print the relevant fragments:

    import { Dolos } from "@dodona/dolos-lib";

    // `files` is an array of paths to the files you want to analyze
    const dolos = new Dolos();
    const report = await dolos.analyzePaths(files);
    for (const pair of report.allPairs()) {
      // fragments are built on demand for each pair
      for (const fragment of pair.buildFragments()) {
        const left = fragment.leftSelection;
        const right = fragment.rightSelection;
        console.log(`${pair.leftFile.path}:{${left.startRow},${left.startCol} -> ${left.endRow},${left.endCol}} matches with ${pair.rightFile.path}:{${right.startRow},${right.startCol} -> ${right.endRow},${right.endCol}}`);
      }
    }

For the full example, visit https://github.com/rien/dolos-lib-example

Let us know if you need more information.
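If you want JSON rather than console lines, the same loop could collect plain records instead of printing. The following is only a sketch: the `PairLike`, `FragmentLike` and `Selection` interfaces are hypothetical stand-ins that mirror just the fields used in the example above, not the library's own type definitions, and `samplePairs` is made-up data standing in for `report.allPairs()`.

```typescript
// Hypothetical shapes: they mirror only the fields used in the example above.
interface Selection {
  startRow: number;
  startCol: number;
  endRow: number;
  endCol: number;
}

interface FragmentLike {
  leftSelection: Selection;
  rightSelection: Selection;
}

interface PairLike {
  leftFile: { path: string };
  rightFile: { path: string };
  buildFragments(): FragmentLike[];
}

interface FragmentRecord {
  leftFile: string;
  rightFile: string;
  left: Selection;
  right: Selection;
}

// Copy only the four coordinates so the record serializes cleanly.
function toPlainSelection(s: Selection): Selection {
  return { startRow: s.startRow, startCol: s.startCol, endRow: s.endRow, endCol: s.endCol };
}

// Flatten every fragment of every pair into one plain record.
function collectFragments(pairs: PairLike[]): FragmentRecord[] {
  const records: FragmentRecord[] = [];
  for (const pair of pairs) {
    for (const fragment of pair.buildFragments()) {
      records.push({
        leftFile: pair.leftFile.path,
        rightFile: pair.rightFile.path,
        left: toPlainSelection(fragment.leftSelection),
        right: toPlainSelection(fragment.rightSelection),
      });
    }
  }
  return records;
}

// Made-up data standing in for `report.allPairs()`.
const samplePairs: PairLike[] = [
  {
    leftFile: { path: "a.js" },
    rightFile: { path: "b.js" },
    buildFragments: () => [
      {
        leftSelection: { startRow: 3, startCol: 0, endRow: 7, endCol: 12 },
        rightSelection: { startRow: 10, startCol: 4, endRow: 14, endCol: 9 },
      },
    ],
  },
];

console.log(JSON.stringify(collectFragments(samplePairs), null, 2));
```

With the real library you would pass the pairs from the report instead of `samplePairs`, assuming the objects expose the same fields as in the example above.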
-
Out of curiosity: which system are you integrating Dolos with? We're currently doing our own integration as well, so we might be able to share some ideas.
-
Thanks a lot. That was easy ;-). Can I find documentation for the …?
I'm integrating Dolos into our internal proprietary faculty information system, replacing an older plagiarism detection system that is clearly outperformed by Dolos. I have written some shell scripts that preprocess the source code, split it by language, run Dolos on a per-language basis, make plots using R, pick the most similar pairs and run Dolos on those pairs, and finally capture and parse the color console output, producing a JSON report. I would be happy to share ideas and code, if you like.
-
OK, let me share some ideas / experience:
I would be grateful for any comments on my approach.
-
As an alternative to console output, would it be possible to report matching fragments in some machine-readable format, please?
My use case is this: I need to carry the matching fragments over to another system. Currently, I store the color console output, with its color-coded matching fragments, in a text file using the `script` tool, and then parse this file to find the starts and ends of the matching fragments. However, this is quite complex and slow.
Would it be possible to report matching fragments e.g. in JSON or CSV form, with each fragment represented e.g. by <starting line number>:<starting char index at line>, <ending line number>:<ending char index at line>, please?
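To make the requested representation concrete, here is a minimal TypeScript sketch of formatting and parsing such a range and of writing one CSV row per fragment. All names (`TextRange`, `formatRange`, `parseRange`, `toCsvRow`) are hypothetical and exist only for illustration; nothing here is existing Dolos output.

```typescript
// Hypothetical range type: line numbers and character indexes within a line.
interface TextRange {
  startLine: number;
  startChar: number;
  endLine: number;
  endChar: number;
}

// Render a range as "<start line>:<start char>, <end line>:<end char>".
function formatRange(r: TextRange): string {
  return `${r.startLine}:${r.startChar}, ${r.endLine}:${r.endChar}`;
}

// Parse the same representation back; throws on malformed input.
function parseRange(text: string): TextRange {
  const m = /^(\d+):(\d+),\s*(\d+):(\d+)$/.exec(text.trim());
  if (m === null) {
    throw new Error(`not a valid range: ${text}`);
  }
  return {
    startLine: Number(m[1]),
    startChar: Number(m[2]),
    endLine: Number(m[3]),
    endChar: Number(m[4]),
  };
}

// One CSV row per fragment: left file, left range, right file, right range.
// Every field is quoted because the range itself contains a comma.
function toCsvRow(leftFile: string, left: TextRange, rightFile: string, right: TextRange): string {
  const quote = (value: string): string => `"${value.replace(/"/g, '""')}"`;
  return [leftFile, formatRange(left), rightFile, formatRange(right)].map(quote).join(",");
}

const exampleLeft: TextRange = { startLine: 3, startChar: 0, endLine: 7, endChar: 12 };
const exampleRight: TextRange = { startLine: 10, startChar: 4, endLine: 14, endChar: 9 };

console.log(formatRange(exampleLeft)); // prints: 3:0, 7:12
console.log(toCsvRow("a.js", exampleLeft, "b.js", exampleRight));
```

Because the range contains a comma, a CSV variant has to quote its fields, which is one reason JSON might be the simpler choice of the two.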