WIP: Integrate Document Categorization to Frequency Analysis #68
base: master
Just updating text mining tool and adding stop words
Hi @hadenwIV, please provide a description of the PR based on the template. Thank you!
I worked as the scrum lead. We will find the words for each category by extending GatorMiner to store the roughly 50 most frequent words from each category's set of practicals that we run and write that set of words to a text file. We will then build a program to find the words most unique to each category. Kiley was assigned to work on categories of words. Wil was assigned to work on text mining. Adriana was assigned documentation while also working on the interface. Kyrie will do the majority of the work on the interface.
@favourojo Thanks. You can actually just edit the description of the PR opened by @hadenwIV.
@favourojo Thanks for working on this feature. Please let us know when you think the PR is ready to be reviewed. In the meantime, could you also change the title of the PR from
Codecov Report
@@ Coverage Diff @@
## master #68 +/- ##
==========================================
- Coverage 92.09% 92.05% -0.04%
==========================================
Files 6 6
Lines 253 277 +24
==========================================
+ Hits 233 255 +22
- Misses 20 22 +2
We think we have finished our enhancement, along with all of its testing and documentation, and we're ready for official review.
I think that before we can offer a thorough review of the code, it's crucial to make sure that all test cases are passing and that the overall build is passing too. We can help you get that going, but this should be the priority for now.
It looks as though all test cases have been passing and still are. The overall build started failing the day of because someone added an empty standup folder. It should be resolved now. Can you review it?
What is the current behavior?
Only information about the frequency of individual words is given in the Frequency Analysis section, with no information about the frequency of different categories of words such as words referencing different technical skills and programming languages. There is also no storage of data on the frequency of different words or other data between runs.
What is the new behavior if this PR is merged?
Information on the frequency of the most common words in assignments is stored and remains after a frequency analysis of that assignment is conducted. The user will be able to tag an assignment with categories after its frequency analysis, and the most frequent words in different categories of assignments will later be compared to find words significant to each category. This is a step on the path to providing the user with frequency analysis information about both the most frequent words and the most frequent categories.
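As a rough illustration of the idea described above (not the code in this PR), the flow could be sketched as follows: count the most frequent words for a tagged assignment, store them in a text file so they persist between runs, and later compare the stored lists to find words that appear in only one category. The function names `top_words`, `save_words`, `load_words`, and `distinctive_words` are hypothetical, and treating "significant" as "appears in exactly one category's list" is an assumption made for the sketch.

```python
import re
from collections import Counter


def top_words(text, n=50, stop_words=frozenset()):
    """Return the n most frequent words in text, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return [word for word, _ in counts.most_common(n)]


def save_words(words, path):
    """Write one word per line so the list survives between runs."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words))


def load_words(path):
    """Read back a word list written by save_words."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def distinctive_words(category_words):
    """Keep, per category, only the words found in no other category.

    category_words maps a category name to its list of frequent words.
    """
    # Count how many categories each word appears in.
    membership = Counter(
        word for words in category_words.values() for word in set(words)
    )
    return {
        category: [w for w in words if membership[w] == 1]
        for category, words in category_words.items()
    }
```

For example, given `{"lab": ["python", "loop", "test"], "reflection": ["team", "test", "learn"]}`, `distinctive_words` drops the shared word `test` and keeps the remaining words for each category.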
Close #51
Type of change
Please describe the pull request as one of the following:
Other information
This PR has:
Developers
@hadenwIV, @hewittk, @donizk, @favourojo, @solisa986