
Normalization of microtext #140

Open
salonipriyani opened this issue Apr 21, 2020 · 6 comments
Labels: enhancement, gssoc20 (Issues to be picked up by participants during GSSoC 2020), hard (Hard level issue GSSoC 2020)

Comments

@salonipriyani

Is your feature request related to a problem? Please describe.
Text from the user may contain errors (typos or spelling mistakes) or abbreviated words (e.g., BLR for Bangalore Airport).

Describe the solution you'd like
First, abbreviations and acronyms will be handled using a lexical approach; then spelling mistakes will be corrected using a phonetic algorithm, Soundex.
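A minimal sketch of the two-step pipeline described above, assuming a word-by-word lexical lookup followed by American Soundex matching against an in-domain vocabulary. The `ABBREVIATIONS` table, `VOCABULARY` list, and function names are hypothetical placeholders, and breaking ties among same-sounding candidates with `difflib` similarity is my assumption, not something stated in this issue.

```python
import difflib

# Hypothetical lexicon: abbreviation/acronym -> expansion (lowercase keys).
ABBREVIATIONS = {
    "blr": "Bangalore Airport",
}

# Hypothetical in-domain vocabulary used as the spelling-correction target.
VOCABULARY = ["book", "flight", "from", "to", "tomorrow", "delhi", "mumbai"]

# American Soundex digit codes; vowels, y, h and w get no code.
_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # h and w do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def correct_spelling(word: str) -> str:
    """Replace `word` with the most similar vocabulary entry sharing its Soundex code."""
    code = soundex(word)
    candidates = [v for v in VOCABULARY if soundex(v) == code]
    if not candidates:
        return word  # nothing sounds alike: leave the token unchanged
    # Assumption: break ties among same-sounding words by string similarity.
    return max(
        candidates,
        key=lambda v: difflib.SequenceMatcher(None, word.lower(), v).ratio(),
    )


def normalize(text: str) -> str:
    """Step 1: expand abbreviations (lexical lookup). Step 2: Soundex spelling fix."""
    vocab = set(VOCABULARY)
    out = []
    for token in text.split():
        key = token.lower()
        if key in ABBREVIATIONS:
            out.append(ABBREVIATIONS[key])
        elif key in vocab:
            out.append(token)
        else:
            out.append(correct_spelling(token))
    return " ".join(out)


print(normalize("book fligt from blr tomorow"))
# book flight from Bangalore Airport tomorrow
```

Soundex is coarse: for example, `flite` (F430) and `flight` (F423) get different codes, so some typos will slip through and an edit-distance fallback might be worth considering.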

Describe alternatives you've considered

Additional context

@vishakha-lall added the gssoc20 and hard labels on Apr 21, 2020
@vishakha-lall
Owner

@salonipriyani What is the update on this?

@Rukmini-Meda
Contributor

Rukmini-Meda commented May 19, 2020

@vishakha-lall I would like to work on this issue. Can you please assign this to me as part of GSSoC'20?

@vishakha-lall
Owner

@Rukmini-Meda Please share your approach for this issue: what kind of normalisation will you be doing, and where will you get the data to train the normalisation on?

@shreyanshi2228

Does this involve gathering data for some common abbreviations?

@shreyanshi2228

@vishakha-lall
Owner

@shreyanshi2228 Please elaborate on how you plan to use the shared resource with respect to the requirements of this project. Abbreviations of common cities and locations would be more relevant to this project.
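For a city/location abbreviation resource like the one suggested here, one possible shape is a two-column CSV loaded into a lookup table. This is a sketch under assumptions: the CSV format, the column names, and the function names are hypothetical, and the sample rows (real IATA airport codes) are only illustrative. Matching whole alphabetic tokens with a regex keeps surrounding punctuation intact.

```python
from __future__ import annotations

import csv
import io
import re

# Hypothetical resource format: one "abbreviation,expansion" pair per row.
SAMPLE_CSV = """abbreviation,expansion
BLR,Bangalore Airport
BOM,Mumbai Airport
DEL,Delhi Airport
"""

# Whole alphabetic tokens only, so "BLR?" still matches "BLR".
_WORD = re.compile(r"[A-Za-z]+")


def load_lexicon(csv_text: str) -> dict[str, str]:
    """Build a case-insensitive abbreviation -> expansion map from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["abbreviation"].strip().lower(): row["expansion"].strip()
        for row in reader
    }


def expand_abbreviations(text: str, lexicon: dict[str, str]) -> str:
    """Replace every token found in `lexicon`, leaving other text untouched."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return lexicon.get(token.lower(), token)

    return _WORD.sub(replace, text)


lexicon = load_lexicon(SAMPLE_CSV)
print(expand_abbreviations("Flights from BLR to DEL?", lexicon))
# Flights from Bangalore Airport to Delhi Airport?
```

Short codes can collide with ordinary words (for example, "del" in other contexts), which is one argument for a small, curated list of cities and locations rather than a generic abbreviation dump.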
