Some are created for Natural Language Processing (NLP). Others are created to improve indexing and results for search queries. These still function as stop words but were selected for a different purpose, so please keep that in mind when you use them. Additionally, each stop word list is created with a particular project in mind. A word that the creators tagged as uninteresting may be interesting for you. Thus, we recommend reviewing any stop word list you choose to work with.
If you cannot find a list here that works for you, a common way to create your own is to take the most frequently used words from one of the following corpora:
- The Wikipedia pages of the language
- Archived literature in that language
You can also use your own corpus to create a stop word list.
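The frequency-based approach described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `simple_tokenize` and `candidate_stop_words`, the `top_n` cutoff, and the tiny sample corpus are all assumptions made for the example. The tokenizer only splits on whitespace and strips ASCII punctuation, so for most of the languages listed on this page you would likely want to pass in a language-aware tokenizer instead.

```python
import string
from collections import Counter


def simple_tokenize(text):
    """Lowercase, split on whitespace, and strip ASCII punctuation from token edges.

    Hypothetical helper for illustration; it does not handle
    language-specific punctuation or scripts without whitespace.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def candidate_stop_words(documents, top_n=50, tokenize=simple_tokenize):
    """Return the top_n most frequent tokens across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(top_n)]


# Tiny made-up corpus; in practice this would be Wikipedia pages,
# archived literature, or your own documents.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat around the garden.",
    "A bird sang in the garden while the dog slept.",
]

candidates = candidate_stop_words(corpus, top_n=3)
print(candidates)
```

The output is only a list of candidates. As with any published list, review it by hand and remove words that matter for your project before using it.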
- StopWords by David Muhr, Kenneth Benoit, and Kohei Watanabe relies on the following resources for stop words:
- The Linguistics Department of the University of Potsdam developed this stop word list from Twitter.
- Justus Liebig University developed these stop words via a textual analysis of business documents.
- Jacques Savoy, associated with Université de Neuchâtel, developed these stop words for Swiss French.
- State University of Novi Pazar developed these stop words.
- Vandana Jha, Manjunath N, P Deepa Shenoy, and Venugopal K R developed these stop words.
- Student work that outlines how to capture acronyms in Hindi using Python here
- Croatian
- Italian
- Spanish
- Dutch
- Greek
- Hungarian
- Swedish
- Portuguese
- Danish
- Finnish
- Russian
- Polish
- Ukrainian
- Romanian
- Turkish
- Bavarian
- Czech
- Bulgarian
- Bengali
- Marathi
- Telugu
- Tamil
- Gujarati
- Urdu
- Kannada
- Odia
- Malayalam
- Punjabi
- Assamese
- English