Stop Words Lists for Use

Some of these stop word lists were created for natural language processing (NLP). Others were created to improve indexing and search query results. Both kinds work as stop words, but their words were selected for different purposes, so keep that in mind when you use them. Additionally, each stop word list was created with a particular project in mind: a word its creators tagged as uninteresting may be interesting to you. We therefore recommend reviewing any stop word list you choose to work with.
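As a rough illustration of such a review, the sketch below removes words you care about from an existing list. The function name `review_stop_words`, the sample list, and the sentiment-analysis scenario are all hypothetical and do not come from any of the resources listed here.

```python
def review_stop_words(stop_words, keep_terms):
    """Split a stop word list into entries to retain and entries to remove.

    keep_terms holds words that matter for your project and should
    therefore NOT be treated as stop words.
    """
    keep = {term.lower() for term in keep_terms}
    retained = [w for w in stop_words if w.lower() not in keep]
    removed = [w for w in stop_words if w.lower() in keep]
    return retained, removed


# Hypothetical example: negations carry meaning in sentiment analysis,
# so they are taken out of the stop word list.
stop_words = ["the", "not", "and", "no", "of"]
retained, removed = review_stop_words(stop_words, keep_terms={"not", "no"})
print(retained)
print(removed)
```

Printing the removed entries makes it easy to confirm that only the intended words were taken out of the list.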

Creating your own Stop Words List

If you cannot find a list here that works for you, a common way to create your own is to take the most frequently used words from one of the following corpora:

  • The Wikipedia pages of the language
  • Archived literature in that language

You can also use your own corpus to create a stop word list.
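The frequency-based approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the function name `build_stop_word_list`, the `top_n` cutoff, and the tiny sample corpus are assumptions made for the example.

```python
import re
from collections import Counter


def build_stop_word_list(documents, top_n=50):
    """Return the top_n most frequent lowercase tokens across documents."""
    counts = Counter()
    for text in documents:
        # Simple tokenizer (an assumption): runs of word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical three-sentence corpus; a real one would be far larger.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A bird sang on the roof.",
]
candidates = build_stop_word_list(corpus, top_n=3)
print(candidates)
```

Treat the output as a list of candidates to review by hand, since frequent words are not always uninteresting ones. Note also that Python's `\w` does not match the combining vowel signs used in scripts such as Devanagari, so for languages like Hindi a whitespace-based or language-aware tokenizer is likely a better fit.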

Multi-Language Resources

Single-Language Resources

German

French

Jacques Savoy, associated with the Université de Neuchâtel, developed these stop words for Swiss French.

Serbian

The State University of Novi Pazar developed these stop words.

Hindi

Vandana Jha, Manjunath N, P Deepa Shenoy, and Venugopal K R developed these stop words.

Student work outlining how to capture acronyms in Hindi using Python is available here.

To be Added

  • Croatian
  • Italian
  • Spanish
  • Dutch
  • Greek
  • Hungarian
  • Swedish
  • Portuguese
  • Danish
  • Finnish
  • Russian
  • Polish
  • Ukrainian
  • Romanian
  • Turkish
  • Bavarian
  • Czech
  • Bulgarian
  • Bengali
  • Marathi
  • Telugu
  • Tamil
  • Gujarati
  • Urdu
  • Kannada
  • Odia
  • Malayalam
  • Punjabi
  • Assamese
  • English