From e6751039ca1cc2ad5c09e51533c1ffa7ad55831f Mon Sep 17 00:00:00 2001
From: "David J. Brenes"
Date: Thu, 21 Oct 2021 23:05:56 +0200
Subject: [PATCH 1/4] Added changelog for the version

---
 CHANGELOG | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/CHANGELOG b/CHANGELOG
index 45cf8ff..509a8fd 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,21 @@
+0.6.0:
+  * Added support for: [@bettysteger] [#16]
+    * Afrikaans (af)
+    * Arabic (ar)
+    * Bengali (bn)
+    * Breton (br)
+    * Catalán (ca)
+    * Czesch (cs)
+    * Hebrew (he)
+    * Indonesian (id)
+    * Korean (ko)
+    * Thai (th)
+    * Turkish (tr)
+    * Vietnamese (vi)
+  * Added keywords for English and Deustch [@bettysteger] [#16]
+  * Fixed upcase/downcase behaviour [@bettysteger] [#16]
+  * Added gender neutral versions of some words in spanish [@fauno] [#17]
+  * Bumped rdoc version [@dependabot] [#18]
 0.5.0:
   * Added Greek stopwords based on Lucene [@vrypan] [#13]
   * Fixed CSV format for sv and ru locales [@woto] [#14]

From 8170586a48397c6e2f635a5adbda1190c443115c Mon Sep 17 00:00:00 2001
From: "David J. Brenes"
Date: Thu, 21 Oct 2021 23:20:20 +0200
Subject: [PATCH 2/4] Includedcthe list of supported languages in the Readme

---
 CHANGELOG |  2 +-
 README.md | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG b/CHANGELOG
index 509a8fd..7e3dd64 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -12,7 +12,7 @@
     * Thai (th)
     * Turkish (tr)
     * Vietnamese (vi)
-  * Added keywords for English and Deustch [@bettysteger] [#16]
+  * Added keywords for English and German [@bettysteger] [#16]
   * Fixed upcase/downcase behaviour [@bettysteger] [#16]
   * Added gender neutral versions of some words in spanish [@fauno] [#17]
   * Bumped rdoc version [@dependabot] [#18]
diff --git a/README.md b/README.md
index 9b4eb54..467384e 100644
--- a/README.md
+++ b/README.md
@@ -107,6 +107,8 @@ That's all?
 I know what you're thinking, it takes a line of ruby code to filter one array from other.
 That's why we have added an extra functionality, [Snowball][wikipedia_snowball] stopwords lists, already built for you and ready to use.
 
+At least, in the beginning we were using snowball stopwords, but several collaborators have improved this humble gem by including new languages or adding new stopwords. So now, the Snowball version is more an "Snowball and friends" version.
+
 How do I use that snowball thing?
 ---------------------------------
 
@@ -122,6 +124,41 @@ And then you filter without worrying about the exact stopwords used
 filter.filter 'guide by douglas adams'.split #-> ['guide', 'douglas', 'adams']
 ```
 
+Which languages are supported with snowball?
+-------------------------------------------
+
+Currently we have:
+
+ * Afrikaans (af)
+ * Arabic (ar)
+ * Bengali (bn)
+ * Breton (br)
+ * Catalán (ca)
+ * Czesch (cs)
+ * Danish (da)
+ * German (de)
+ * Greek (el)
+ * English (en)
+ * Spanish (es)
+ * Finnish (fi)
+ * French (fr)
+ * Hebrew (he)
+ * Hungarian (hu)
+ * Indonesian (id)
+ * Italian (it)
+ * Korean (ko)
+ * Dutch (nl)
+ * Polish (pl)
+ * Portuguese (pt)
+ * Romanian (ro)
+ * Russian (ru)
+ * Swedish (sv)
+ * Thai (th)
+ * Turkish (tr)
+ * Vietnamese (vi)
+
+In the changelog you can see the collaborators for each language.
+
 Anything else?
 --------------
 

From 1b2ffe80599a2a72db0acf2594af49294959c8c8 Mon Sep 17 00:00:00 2001
From: "David J. Brenes"
Date: Thu, 21 Oct 2021 23:27:55 +0200
Subject: [PATCH 3/4] More info and thanks in readme and changelog

---
 CHANGELOG | 1 +
 README.md | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG b/CHANGELOG
index 7e3dd64..02293b5 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -16,6 +16,7 @@
   * Fixed upcase/downcase behaviour [@bettysteger] [#16]
   * Added gender neutral versions of some words in spanish [@fauno] [#17]
   * Bumped rdoc version [@dependabot] [#18]
+  * Fixed Finnish locale code from `fn` to `fi`
 0.5.0:
   * Added Greek stopwords based on Lucene [@vrypan] [#13]
   * Fixed CSV format for sv and ru locales [@woto] [#14]
diff --git a/README.md b/README.md
index 467384e..5d3c895 100644
--- a/README.md
+++ b/README.md
@@ -127,7 +127,7 @@ filter.filter 'guide by douglas adams'.split #-> ['guide', 'douglas', 'adams']
 Which languages are supported with snowball?
 -------------------------------------------
 
-Currently we have:
+Currently we have support for:
 
  * Afrikaans (af)
  * Arabic (ar)
@@ -140,7 +140,7 @@ Currently we have:
  * Greek (el)
  * English (en)
  * Spanish (es)
- * Finnish (fi)
+ * Finnish (fi): Due to an error it can also be used referring to the `fn` locale
  * French (fr)
  * Hebrew (he)
  * Hungarian (hu)
@@ -169,6 +169,8 @@ Ackonowledgments
 Thanks to @s2gatev who added the `stopword?` method and the sieve class to this gem
+
+Thanks to @bettysteger, @fauno, @vrypan, @woto, @grzegorzblaszczyk, @nerde, @sbeckeriv and @zackxu1 for language support and other features.
+
 [wikipedia_stopwords]: http://en.wikipedia.org/wiki/Stopword
 [solr]: https://github.com/sunspot/sunspot
 [sphinx]: https://github.com/freelancing-god/thinking-sphinx

From cc8ad1e988e68e61eda4cc3e5e9e792a35ddbba5 Mon Sep 17 00:00:00 2001
From: "David J. Brenes"
Date: Thu, 21 Oct 2021 23:39:24 +0200
Subject: [PATCH 4/4] Bumped version

Also, included in the gem build all the csv locale files
---
 VERSION                  |  2 +-
 stopwords-filter.gemspec | 26 +++++++-------------------
 2 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/VERSION b/VERSION
index 8f0916f..a918a2a 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.5.0
+0.6.0
diff --git a/stopwords-filter.gemspec b/stopwords-filter.gemspec
index d330b73..a22e5dd 100644
--- a/stopwords-filter.gemspec
+++ b/stopwords-filter.gemspec
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
   s.name = %q{stopwords-filter}
-  s.version = "0.5.0"
+  s.version = "0.6.0"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["David J. Brenes"]
-  s.date = %q{2021-06-09}
+  s.date = %q{2021-10-21}
   s.description = %q{Small library that allows you to create a simple stopwords filter or use some based on Snowball stopwords lists}
   s.email = %q{davidjbrenes@gmail.com}
   s.extra_rdoc_files = [
@@ -17,6 +17,10 @@ Gem::Specification.new do |s|
     "LICENSE.txt",
     "README.md"
   ]
+  locale_files = []
+  Dir.glob("lib/stopwords/snowball/locales/*.csv") do |locale_file|
+    locale_files << locale_file
+  end
   s.files = [
     "CHANGELOG",
     "Gemfile",
@@ -28,26 +32,10 @@ Gem::Specification.new do |s|
     "lib/stopwords/snowball.rb",
     "lib/stopwords/snowball/filter.rb",
     "lib/stopwords/snowball/wordsieve.rb",
-    "lib/stopwords/snowball/locales/bg.csv",
-    "lib/stopwords/snowball/locales/da.csv",
-    "lib/stopwords/snowball/locales/de.csv",
-    "lib/stopwords/snowball/locales/el.csv",
-    "lib/stopwords/snowball/locales/en.csv",
-    "lib/stopwords/snowball/locales/es.csv",
-    "lib/stopwords/snowball/locales/fn.csv",
-    "lib/stopwords/snowball/locales/fr.csv",
-    "lib/stopwords/snowball/locales/hu.csv",
-    "lib/stopwords/snowball/locales/it.csv",
-    "lib/stopwords/snowball/locales/nl.csv",
-    "lib/stopwords/snowball/locales/pl.csv",
-    "lib/stopwords/snowball/locales/pt.csv",
-    "lib/stopwords/snowball/locales/ro.csv",
-    "lib/stopwords/snowball/locales/ru.csv",
-    "lib/stopwords/snowball/locales/sv.csv",
     "spec/lib/filter_spec.rb",
     "spec/lib/snowball_filter_spec.rb",
     "spec/spec_helper.rb"
-  ]
+  ] + locale_files
   s.homepage = %q{http://github.com/brenes/stopwords-filter}
   s.licenses = ["MIT"]
   s.require_paths = ["lib"]
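
Review note on PATCH 4/4: the gemspec now discovers the locale files with `Dir.glob` instead of listing them by hand. The sketch below is not part of the patch; it only illustrates the same idea in isolation. The helper names `locale_files` and `demo_locale_files` are hypothetical, and the temporary directory stands in for the repository root. The explicit `sort` is an added assumption, since the patch appends the files in whatever order the glob yields, which may depend on the filesystem on older Rubies.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of the gem): collect every CSV locale file
# below `root`, sorted so the resulting list is stable between builds.
def locale_files(root)
  Dir.glob(File.join(root, "lib/stopwords/snowball/locales/*.csv")).sort
end

# Builds a throwaway directory tree that mimics the gem layout and returns
# the base names the glob picks up.
def demo_locale_files
  Dir.mktmpdir do |root|
    dir = File.join(root, "lib/stopwords/snowball/locales")
    FileUtils.mkdir_p(dir)
    %w[fi en es].each { |code| File.write(File.join(dir, "#{code}.csv"), "") }
    File.write(File.join(dir, "notes.txt"), "") # ignored: not a .csv file
    locale_files(root).map { |path| File.basename(path) }
  end
end

p demo_locale_files # prints ["en.csv", "es.csv", "fi.csv"]
```

Because the pattern in the patch is relative, the glob resolves against the current working directory, so the gemspec presumably has to be evaluated from the repository root for the locale files to be found.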