WIP: Integrate Document Categorization to Frequency Analysis #68

hadenwIV · 2021-03-31T03:26:12Z

What is the current behavior?

The Frequency Analysis section reports only the frequency of individual words. It gives no information about the frequency of categories of words, such as words that reference particular technical skills or programming languages. Word frequencies and other data are also not stored between runs.

What is the new behavior if this PR is merged?

Information on the frequency of the most common words in an assignment is stored and remains available after a frequency analysis of that assignment is conducted. The user will be able to tag an assignment with categories after its frequency analysis, and the most frequent words in different categories of assignments will later be compared to find words significant to each category. This is a step toward giving the user frequency analysis information about both the most frequent words and the most frequent categories.
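As a rough illustration of the storage step described above, here is a minimal sketch that keeps the most frequent words of an assignment in a CSV file together with a category tag. The function names, the column layout, and the default of 50 words are assumptions made for this example; they are not taken from the code in this PR.

```python
import csv
from collections import Counter
from pathlib import Path


def top_words(tokens, n=50):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


def save_frequencies(path, assignment, category, pairs):
    """Append one row per word so that results persist between runs.

    Hypothetical helper: the header row is written only when the
    file does not exist yet.
    """
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["assignment", "category", "word", "count"])
        for word, count in pairs:
            writer.writerow([assignment, category, word, count])
```

Appending rather than overwriting is one way to let later runs compare categories across every assignment analyzed so far.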

Close #51

Type of change

Please describe the pull request as one of the following:

Other information

This PR has:

Commit messages that are correctly formatted
Tests for newly introduced code
Docstrings for newly introduced code

Developers

@hadenwIV, @hewittk, @donizk, @favourojo, @solisa986

… into issue#51

Just updating text mining tool and adding stop words

enpuyou · 2021-03-31T03:36:58Z

Hi @hadenwIV, please provide a description of the PR based on the template. Thank you!

favourojo · 2021-03-31T03:51:57Z

I worked as the scrum lead.

We will find the words for each category by adding to the GatorMiner program so that it stores the 50 or so most frequent words from each category's set of practicals that we run and writes that set of words to a text file. We will then build a program to find the most unique words.
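One possible reading of "most unique" is a word that appears in the top-word list of exactly one category. The sketch below implements only that reading; the function name and the input shape (a dictionary from category name to a list of words) are assumptions for illustration, and a weighted measure such as TF-IDF would be a reasonable alternative.

```python
from collections import Counter


def unique_words_by_category(top_words):
    """Map each category to the words found in no other category."""
    # Count how many categories each word appears in.
    seen_in = Counter()
    for words in top_words.values():
        seen_in.update(set(words))
    # Keep only the words that belong to a single category.
    return {
        category: sorted(w for w in set(words) if seen_in[w] == 1)
        for category, words in top_words.items()
    }


example = {
    "python": ["function", "loop", "pytest"],
    "ethics": ["bias", "function", "privacy"],
}
result = unique_words_by_category(example)
# "function" is shared by both categories, so it is dropped from both.
```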

Kiley was assigned to work on categories of words. Wil was assigned work on text mining. Adriana was assigned documentation while also working on the interface. Kyrie will do the majority of the work on the interface.

enpuyou · 2021-03-31T04:00:48Z

@favourojo Thanks. You can actually just edit the description of the PR opened by @hadenwIV.

… track of ideas for implementation

enpuyou · 2021-04-02T20:26:33Z

@favourojo Thanks for working on this feature. Please let us know when you think the PR is ready to be reviewed. In the meantime, could you also change the title of the PR from Issue51 to something more descriptive? Thanks again!

… into issue#51

codecov · 2021-04-27T21:46:51Z

Codecov Report

Merging #68 (f74def8) into master (244b0ba) will decrease coverage by 0.03%.
The diff coverage is 92.00%.

@@            Coverage Diff             @@
##           master      #68      +/-   ##
==========================================
- Coverage   92.09%   92.05%   -0.04%     
==========================================
  Files           6        6              
  Lines         253      277      +24     
==========================================
+ Hits          233      255      +22     
- Misses         20       22       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/analyzer.py | `93.75% <92.00%> (-0.57%)` | ⬇️ |
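The report quotes a decrease of 0.03% in its summary line and -0.04% in the diff table. The sketch below recomputes both from the hits and lines shown above. It assumes that percentages are truncated to two decimals rather than rounded; that assumption is consistent with these numbers but is not confirmed by the report itself.

```python
from fractions import Fraction


def truncated_hundredths(numerator, denominator):
    """Percentage in hundredths of a percent, truncated, not rounded."""
    return numerator * 10_000 // denominator


before = truncated_hundredths(233, 253)  # master: 9209, shown as 92.09%
after = truncated_hundredths(255, 277)   # this PR: 9205, shown as 92.05%

# Difference of the two displayed values: 4 hundredths, shown as -0.04%.
displayed_drop = before - after

# Exact difference, truncated afterwards: 3 hundredths, shown as 0.03%.
exact_drop = Fraction(233, 253) - Fraction(255, 277)
summary_drop = int(exact_drop * 10_000)
```

Under this reading the two figures differ only because one is computed from already truncated percentages and the other from the exact ratios.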

hewittk · 2021-04-27T22:10:27Z

We think that we have now finished our enhancement and all of its testing and documentation, and we're ready for official review.

noorbuchi

I think that before we can offer a thorough review of the code, it's crucial to make sure that all test cases are passing and that the overall build is passing too. We can help you get that going, but this should be the priority for now.

hewittk · 2021-05-03T00:15:47Z

> I think that before we can offer a thorough review of the code, it's crucial to make sure that all test cases are passing and that the overall build is passing too. We can help you get that going, but this should be the priority for now

It looks as though all test cases have been and are still passing. The overall build started failing that same day because someone added an empty standup folder. It should be resolved now. Can you review it now?

solisa986 and others added 6 commits March 30, 2021 19:03

created the spring log for the documentation part of our tasks

1e5c149

finished the spring log for issue#51

3f5ac7b

Writing word frequencies to csv

2a5a4e4

Merge branch 'issue#51' of github.com:Allegheny-Ethical-CS/GatorMiner…

3233a42

… into issue#51

Putting different run's results into separate files

225ce2f

Update textmining.py

446215a

Just updating text mining tool and adding stop words

Kiley added 4 commits March 31, 2021 02:30

Categorization of words

b638b98

Additional elaboration on functions of tasks completed

902e704

Fixed name spelling

a065da8

Added docstrings

c208849

jjumadinova requested a review from enpuyou March 31, 2021 19:24

donizk added 3 commits April 1, 2021 02:38

moving all of our code files to a folder called categorize_words

9f9733b

created interface file, began implementation for interface

25abe2e

added notes (as comments) to myself onto the __main__.py file to keep…

b816c40

… track of ideas for implementation

corlettim requested review from corlettim, jjumadinova, munzekm and noorbuchi April 1, 2021 13:22

solisa986 changed the title ~~Issue#51~~ Frequency of Word (Issue#51) Apr 4, 2021

hewittk changed the title ~~Frequency of Word (Issue#51)~~ Frequency of Word Categories (Issue#51) Apr 4, 2021

solisa986 and others added 6 commits April 4, 2021 20:54

added some test cases

60c36e2

classifying categories of files inputted

9455433

Merge branch 'issue#51' of github.com:Allegheny-Ethical-CS/GatorMiner…

1b63ffd

… into issue#51

Sorting assignment categories

f994779

finished documenting sprint 2 log and moved the categories_words.py file

48f9fae

formatting

013b9db

Kiley and others added 18 commits April 21, 2021 13:02

Merge branch 'issue#51' of github.com:Allegheny-Ethical-CS/GatorMiner…

eccccde

… into issue#51

Removed unneccessary line in category_frequency

7d5eb8a

Fixed test_category_frequency test cases

573082f

Colored barplot broken down by category

1bcc4a1

New pipfile.lock copied from master branch with dependencies installed

e71adef

Installed importlib.metadata to pipfile.lock

ddba12b

Update pipfile to the master branch

dce826b

Added to match master branch

fca0da9

Merge branch 'issue#51' of github.com:Allegheny-Ethical-CS/GatorMiner…

16992bd

… into issue#51

Deleted sprint log

f6373a6

Fixed visualization flake8 errors

c337883

Reset textmining

c8886d2

Fixed flake8 issues

d2afa09

Deleted no longer used word cloud generator

45efa7f

Black reformatting

6d4059d

Specification about bar plot type in docstring

4c8e35e

Fix flake8 line length errors

3b39e54

Fixed flake8 errors in test_analyzer

f36f0e1

Kiley added 2 commits April 27, 2021 17:51

Removed plots_per_row argument

32cc38b

Remove non used dictionaries

2fa124c

solisa986 and others added 2 commits April 28, 2021 11:26

changes made to git-standup

90949eb

Merge branch 'master' into issue#51

6332292

noorbuchi requested changes Apr 28, 2021

corlettim and others added 3 commits April 29, 2021 09:10

Merge branch 'master' into issue#51

4748eb8

Remove git standup folder

6184a77

Fix flake8 spacing errors

f74def8

hewittk requested a review from noorbuchi May 3, 2021 00:15
