Plagiarism checks: Plagiarism check shows 69% similarity for exactly same submissions #7174

Open
jakubriegel opened this issue Sep 11, 2023 · 10 comments
Labels: bug, component:Plagiarism Detection, exercise, plagiarism, programming

Comments

@jakubriegel
Contributor

Describe the bug

Two identical submissions got a similarity of 69%.

To Reproduce

  1. Create two identical programming exercise submissions
  2. Run plagiarism checks on them
  3. Check the result

Expected behavior

Two identical submissions have 100% similarity.

Screenshots

image

Which version of Artemis are you seeing the problem on?

6.4.3

What browsers are you seeing the problem on?

Chrome, Safari

Additional context

No response

Relevant log output

No response

@krusche
Member

krusche commented Sep 11, 2023

JPlag calculates the similarities. While it might be possible that Artemis inputs the data wrongly to JPlag, I don't think this is the case. It is more realistic that you found an edge case for a very simple comparison that does not work.

@dfuchss any ideas?

@dfuchss
Contributor

dfuchss commented Sep 11, 2023

mmh .. I cannot reproduce the behavior with JPlag.
I've used the following files:

BubbleSort.java

package edu;

import java.util.*;

public class BubbleSort {

    /**
     * Sorts dates with BubbleSort.
     *
     * @param input the List of Dates to be sorted
     */
    public void performSort(final List<Date> input) {
        var x = 123;
        //TODO: implement
    }

    private static int abc(int a) {
        return a * 5;
    }

    private int def(int d) {
        return d - 15;
    }
}

JPlag reports 100%:

image

@dfuchss
Contributor

dfuchss commented Sep 11, 2023

Nevertheless, depending on the input size, this might be the relevant issue in general. But I don't understand why JPlag produces 100% and Artemis ~70% on the same file (as far as I can see in the images).

@jakubriegel do you have more files? Very small files could lead to lower similarity values.

@krusche
Member

krusche commented Sep 11, 2023

I would assume the image only shows one file, so it could theoretically be the case that other files differ. Maybe this is also related to the fact that we exclude the initial template.

@jakubriegel please link an example on one of the test servers or provide a minimal example to reproduce the issue

@dfuchss
Contributor

dfuchss commented Sep 20, 2023

/cc jfyi @tsaglam

@tsaglam

tsaglam commented Sep 21, 2023

Another factor could be the basecode functionality. If you provide a class with an empty main method as a basecode template (shared code that is, for example, given to all students as part of the exercise), these parts of the source code will not be matched. I can probably give more input if I get more details on how JPlag was configured for that run.

@jakubriegel
Contributor Author

This is most likely an effect of using the base code feature.

To verify it, I carefully ran this comparison again on Artemis using a debugger. I used the same exercise template and 3 submissions: 2 with the same BubbleSort.java as in the screenshot and 1 identical to the template. The findings are:

  • JPlag from Artemis correctly matches the similarities: the only found match is the whole BubbleSort.java for the two modified submissions,
  • the 69.57% similarity is produced by JPlag,
  • for the template submission JPlag correctly indicates 0% similarity.

The 69.57% figure comes from the de.jplag.JPlagComparison.similarity() method. In general, it calculates the similarity as the number of matched tokens divided by the total number of tokens. The two modified submissions have 99 tokens in total, of which 16 are matched. If no base code were used, all 99 tokens would be matched and the similarity would be 100%. But since base code is used, JPlag matches only 16 tokens, and the division is adjusted by the number of tokens matched between the base code and the submission. This yields a similarity of 69.57%. I guess the motivation was to acknowledge the unchanged base code lines in the results (as not all the lines of the modified BubbleSort.java differ from the base code).
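As a rough illustration of the arithmetic described above, here is a minimal sketch. It is not JPlag's actual code. It assumes the similarity is twice the matched tokens divided by the summed token counts of both submissions, with each submission's base-code tokens subtracted from its count. The class name, the method signature, and the figure of 76 base-code tokens per submission are hypothetical: 76 is not stated in this thread and was back-calculated so that 99 tokens per submission and 16 matched tokens give the reported 69.57% (16 / 23).

```java
import java.util.Locale;

// Hypothetical sketch of a base-code-adjusted similarity; not taken from JPlag.
final class BaseCodeSimilaritySketch {

    /**
     * Average-style similarity: matched tokens relative to the tokens of both
     * submissions that are NOT covered by base code.
     */
    static double similarity(int matchedTokens, int tokensA, int tokensB,
                             int baseCodeTokensA, int baseCodeTokensB) {
        int divisor = (tokensA - baseCodeTokensA) + (tokensB - baseCodeTokensB);
        if (divisor <= 0) {
            return 0.0; // nothing left to compare once base code is excluded
        }
        return 2.0 * matchedTokens / divisor;
    }

    public static void main(String[] args) {
        // With base code: 76 base-code tokens per submission is an ASSUMED value.
        double withBaseCode = similarity(16, 99, 99, 76, 76);
        // Without base code: all 99 tokens of each submission are matched.
        double withoutBaseCode = similarity(99, 99, 99, 0, 0);

        System.out.println(String.format(Locale.ROOT, "%.2f%%", 100 * withBaseCode));    // 69.57%
        System.out.println(String.format(Locale.ROOT, "%.2f%%", 100 * withoutBaseCode)); // 100.00%
    }
}
```

Under these assumptions, the divisor shrinks by the base-code tokens while the 16 matched tokens stay the same, so any token that is neither base code nor part of a match pulls the result below 100%.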

In short, this looks like a feature, not a bug 🙃

@tsaglam @dfuchss Can you confirm if JPlag works as intended in the described scenario?

@krusche @MarkusPaulsen Should we keep it like that in Artemis? One idea to change the behaviour would be not to use the base code feature. Since the minimum size parameter is implemented as the minimum size of the diff between the submission and the template, instructors should have enough control over the process. What do you think?

@tsaglam

tsaglam commented Oct 24, 2023

Can you confirm if JPlag works as intended in the described scenario?

Yes, if you use basecode, then you basically tell JPlag: "Do not count that code, this is template code that we gave every student". These code segments are not counted for the similarity calculation to reduce false positives based on the template code. Using basecode makes sense, iff you gave students some template code that they did not alter. Thus, using this feature depends on the specific use case and assignment.

I think what causes the confusion here is the Artemis UI not showing which parts of the code are matched and which parts are not. In the JPlag report viewer, matches between two submissions are not highlighted when they are part of the basecode.

@dfuchss
Contributor

dfuchss commented Feb 22, 2024

Just as an idea: Maybe it would be an option to integrate the JPlag UI into Artemis.

@MarkusPaulsen changed the title from "Plagiarism detection: Plagiarism check shows 69% similarity for exactly same submissions" to "Plagiarism checks: Plagiarism check shows 69% similarity for exactly same submissions" on Nov 5, 2024
@github-actions bot added the labels assessment, exercise, plagiarism, programming, and text on Nov 5, 2024
@maximiliansoelch removed the labels assessment and text on Dec 6, 2024