
get new non-spacy token labels #53

Closed
transcendingvictor opened this issue Mar 5, 2024 · 5 comments

@transcendingvictor
Collaborator

My plan is to use the token_labelling.py module and redo the basic categories.
I'd also like to be able to build categories by editing them manually in Sheets and exporting the .csv file.

@transcendingvictor transcendingvictor self-assigned this Mar 5, 2024
@jettjaniak
Contributor

jettjaniak commented Mar 5, 2024 via email

@jettjaniak
Contributor

ACTUALLY - instead of an "is in the dataset" column, let's have a "how many times it appeared in the dataset" column. Then we can sort by this and focus only on the first 500-1000 tokens for manual labeling.
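
A minimal sketch of the counting idea, assuming the dataset is already tokenized into lists of token IDs. The function names and the toy data are hypothetical and not taken from token_labelling.py:

```python
from collections import Counter


def count_token_occurrences(tokenized_dataset):
    """Count how many times each token ID appears across all sequences."""
    counts = Counter()
    for sequence in tokenized_dataset:
        counts.update(sequence)
    return counts


def most_frequent_tokens(counts, top_n=1000):
    """Return (token_id, count) pairs sorted by descending count."""
    return counts.most_common(top_n)


# Toy data standing in for a tokenized dataset (hypothetical token IDs).
toy_dataset = [[5, 7, 7, 2], [7, 2, 9], [5, 7]]
counts = count_token_occurrences(toy_dataset)
top = most_frequent_tokens(counts, top_n=2)
```

With the real dataset, `top_n` would be somewhere in the 500-1000 range, and only those tokens would go into the sheet for manual labeling.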

@transcendingvictor transcendingvictor changed the title get new non-SciPy token labels get new non-spacytoken labels Mar 6, 2024
@transcendingvictor transcendingvictor changed the title get new non-spacytoken labels get new non-spacy token labels Mar 6, 2024
@transcendingvictor
Collaborator Author

TO-DOs.

  1. Implement the new token categories:
  • is a word
  • number of instances in the dataset
  2. Implement a way to retrieve the labels from a .csv file so that we can manually edit categories.
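
For item 2, something along these lines might work. This is only a sketch: the column names (`token_id`, `token`, `is_word`, `count`) and the TRUE/FALSE text convention for booleans are assumptions about what the exported sheet would look like, not an existing format in the repo.

```python
import csv
import io


def load_labels_from_csv(csv_file, bool_columns=("is_word",)):
    """Read per-token labels from an open CSV file into {token_id: labels}."""
    labels = {}
    for row in csv.DictReader(csv_file):
        token_id = int(row.pop("token_id"))
        for col in bool_columns:
            # Assumed convention: booleans are stored as the text TRUE/FALSE.
            row[col] = row[col].strip().upper() == "TRUE"
        row["count"] = int(row["count"])
        labels[token_id] = row
    return labels


# In-memory stand-in for an exported .csv file (hypothetical contents).
sample_csv = io.StringIO(
    "token_id,token,is_word,count\n"
    "7, the,TRUE,4\n"
    "9,ing,FALSE,1\n"
)
labels = load_labels_from_csv(sample_csv)
```

Reading with the default `csv` settings keeps the leading space in `" the"`, which matters if we later label tokens by whether they start with a space.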

@jettjaniak
Contributor

also

  • starts with space
  • capitalized
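
A rough sketch of how these two could be computed from the decoded token string. The function name is hypothetical, and "capitalized" is assumed here to mean that the first character after any leading spaces is uppercase:

```python
def basic_token_labels(token_str):
    """Compute simple string-based labels for one decoded token."""
    stripped = token_str.lstrip(" ")
    return {
        "starts_with_space": token_str.startswith(" "),
        # Assumption: capitalized = first non-space character is uppercase.
        # Slicing with [:1] keeps this safe for empty strings.
        "capitalized": stripped[:1].isupper(),
    }


hello_labels = basic_token_labels(" Hello")
suffix_labels = basic_token_labels("ing")
```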

@jettjaniak
Contributor

but first we need to merge #40 - can you work on that?

@transcendingvictor transcendingvictor linked a pull request Mar 20, 2024 that will close this issue
@jettjaniak jettjaniak closed this as not planned Apr 3, 2024