This application calculates document similarities using BERT embeddings. It reads text files from a folder, computes pairwise similarities, and generates a heatmap for visualizing the results. The similarity matrix can be saved as a CSV file, and the heatmap as an image.
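The pairwise comparison step can be sketched roughly as follows. This is a minimal, standard-library illustration and not the code in App.py or get_map.py: it assumes each document has already been turned into an embedding vector (for example by a BERT model) and that similarity is measured with cosine similarity, which the description above does not confirm. The function names `cosine_similarity`, `similarity_matrix`, and `save_matrix_csv` are hypothetical.

```python
import csv
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Build a pairwise similarity matrix from {filename: vector}.

    Assumption: `embeddings` holds one precomputed vector per text file.
    Returns the sorted file names and a square list-of-lists matrix.
    """
    names = sorted(embeddings)
    matrix = [
        [cosine_similarity(embeddings[row], embeddings[col]) for col in names]
        for row in names
    ]
    return names, matrix


def save_matrix_csv(names, matrix, path):
    """Write the matrix to a CSV file with file names as row and column labels."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([""] + names)
        for name, row in zip(names, matrix):
            writer.writerow([name] + [f"{value:.4f}" for value in row])
```

The resulting matrix is symmetric with ones on the diagonal for non-zero vectors, which is the shape a heatmap plot would expect.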
- App.py: A Tkinter application that requires a folder with text files. It creates a similarity heatmap for all the text files.
- get_map.py: A non-GUI application that performs the same task as App.py.
- Can be used to automatically grade documents by identifying the best match and comparing similarity.
- Limitation: The similarity score only shows how far a document deviates from its best match, not whether the content is correct.
- Can also be used to detect copied content and plagiarism.
- Limitation: Only identifies plagiarism among the text files provided; copying from sources outside that folder is not detected.
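
The two use cases above could be built on top of a similarity matrix along these lines. This is a hedged sketch, not part of the application: the helpers `best_match` and `flag_similar_pairs` are hypothetical, the matrix is assumed to be a square list of lists whose rows and columns follow the order of `names`, and the 0.9 threshold is an arbitrary illustrative value that would need tuning on real documents.

```python
def best_match(names, matrix, reference):
    """Return (name, score) of the document most similar to `reference`.

    Raises ValueError if `reference` is not in `names` or is the only document.
    """
    ref = names.index(reference)
    candidates = [
        (matrix[ref][i], name) for i, name in enumerate(names) if i != ref
    ]
    score, name = max(candidates)
    return name, score


def flag_similar_pairs(names, matrix, threshold=0.9):
    """List every pair of distinct documents whose similarity meets the threshold.

    The threshold default is an assumption for illustration only.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle, skip self-pairs
            if matrix[i][j] >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs
```

For grading, `reference` would be something like an answer key placed in the same folder as the submissions; for plagiarism checks, the flagged pairs are candidates for manual review rather than a verdict.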