A library that provides parsing and normalization of people's names.
require 'nameable'
n = Nameable::Latin.new.parse('Mr. Chris K Horn Esquire')
puts "#{n.prefix} #{n.first} #{n.middle} #{n.last} #{n.suffix}"
#=> Mr. Chris K Horn Esq.
puts n.to_fullname
#=> Mr. Chris K. Horn, Esq.
n = Nameable::Latin.new('CHRIS', 'HORN')
puts n.to_nameable
#=> Chris Horn
n = Nameable::Latin.new(prefix:'Sir', last:'Horn')
puts n
#=> Sir Horn
Convenience methods:
puts Nameable('chris horn, iii')
#=> "Chris Horn, III."
p Nameable.parse('chris horn, iii')
#=> #<Nameable::Latin:0x007f8470e01b08 @first="Chris", @last="Horn", @middle=nil, @prefix=nil, @suffix="III.">
Using a database of first names from the U.S. Social Security Administration, Nameable will pick the most likely gender for a name.
Nameable::Latin.new('Chris').gender
#=> :male
Nameable::Latin.new('Janine').female?
#=> true
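As a rough illustration of what "most likely gender" can mean, here is a minimal sketch that compares per-name counts. It is not Nameable's implementation: the `FIRST_NAME_COUNTS` table and the `likely_gender` helper are hypothetical, and the counts are invented placeholders, not SSA figures.

```ruby
# Hypothetical lookup table. The counts are invented placeholders for
# illustration only; they are NOT real SSA figures.
FIRST_NAME_COUNTS = {
  'chris'  => { male: 900, female: 100 },
  'janine' => { male: 0,   female: 500 }
}.freeze

# Returns :male or :female for a known name, or nil when the name is
# unknown or the counts are tied.
def likely_gender(first_name)
  counts = FIRST_NAME_COUNTS[first_name.to_s.strip.downcase]
  return nil unless counts
  return nil if counts[:male] == counts[:female]

  counts[:male] > counts[:female] ? :male : :female
end
```

Returning nil for unknown or tied names keeps the caller from treating a guess as a fact.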
Using a database of last names from the U.S. Census, Nameable will return the ethnicity breakdown as a Hash.
Nameable::Latin.new('Chris', 'Horn').ethnicity
#=> {:rank=>593, :count=>51380, :percent_white=>86.75, :percent_black=>8.31, :percent_asian_pacific_islander=>0.84, :percent_american_indian_alaska_native=>1.16, :percent_two_or_more_races=>1.46, :percent_hispanic=>1.48}
I've included a little web service that requires Sinatra and should be installed as "nameable_web_service". It's been handy when paired with OpenRefine, for when I'm working with a file and am not going to be parsing it with Ruby. If you're reading this, that's probably not an issue for you, but I do think it's a nice way to show someone how to use OpenRefine in a more advanced way.
By "inspiration," I should really say "other projects from which I yanked code, ideas, examples and data." At the very least, I'll make sure the other projects I looked at and borrowed from are credited here.
As of version 1.1.1, the nameable gem is cryptographically signed. To be sure the gem you install hasn’t been tampered with, add my public key as a trusted certificate, and verify that nameable and any dependencies it has are also signed:
$ gem cert --add <(curl -Ls https://raw.github.com/chorn/nameable/master/certs/chorn.pem)
$ gem install nameable -P HighSecurity
- OpenRefine (formerly Google Refine)
- Help with splitting names
- First Names from the U.S. SSA
- Last Names from the Census
- Data Science Toolkit
- Addressable
- Extract all of the US Census / Ethnicity / Asset stuff out of Latin. Yuck, that's ugly, why did I ever do that?
- Rename Latin to be US or English, because it looks like I really only support English, and probably US English.
- Use named captures for all the regexes.
- Refactor the Ethnicity stuff into a class.
- Refactor parsing into a class.
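
For the named-captures item above, a minimal sketch of what that could look like in Ruby follows. The `NAME_PATTERN` constant and `split_name` helper are hypothetical and are not Nameable's actual regexes; the pattern only handles simple shapes such as "Mr. Chris K Horn" or "chris horn, iii".

```ruby
# Hypothetical pattern for illustration only; not Nameable's real regex.
# The x flag allows whitespace and comments, the i flag ignores case.
NAME_PATTERN = /
  \A\s*
  (?:(?<prefix>Mr|Mrs|Ms|Dr|Sir)\.?\s+)?        # optional title
  (?<first>[[:alpha:]]+)                        # first name
  (?:\s+(?<middle>[[:alpha:]]+)\.?)?            # optional middle name or initial
  \s+(?<last>[[:alpha:]'-]+)                    # last name
  (?:,?\s+(?<suffix>Jr|Sr|II|III|IV|Esq)\.?)?   # optional suffix
  \s*\z
/xi

# Returns a Hash keyed by capture name (as symbols), or nil if the text
# does not match. Captures that did not participate are nil.
def split_name(text)
  match = NAME_PATTERN.match(text)
  return nil unless match

  match.named_captures.transform_keys(&:to_sym)
end
```

With named captures, each part is read by name (for example `split_name('Mr. Chris K Horn')[:last]`) instead of by group position, so adding or reordering groups does not silently shift the results.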
-chorn