namestand is a Python library for easily transforming/standardizing lists of names (and other strings). No magic here, just a collection of useful tools.
namestand was developed with unwieldy database column names in mind, but can be applied to any list of strings. Other uses might include standardizing political donor names, normalizing survey responses, et cetera.
pip install namestand
namestand comes with a set* of broadly useful converters.
*Right now, just a few of 'em. Contributions and suggestions welcome.
Suggested usage for namestand.downscore: column names, form-response options, etc.
Steps:
- Lowercases the string
- Strips any leading and trailing whitespace
- Converts any run of characters that aren't ASCII letters or digits to a single underscore
- Removes any leading and trailing underscores
- Prefixes the string with "_" if it starts with a digit (which can otherwise cause trouble with pandas and other libraries). E.g., "2013 Happiness" becomes "_2013_happiness".
Example:
namestand.downscore("Case Number") == "case_number"
namestand.downscore([
"Case Number",
"Case #",
"Is Super-Duper?"
]) == [
"case_number",
"case",
"is_super_duper"
]
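The steps listed for this converter can be approximated in a few lines of plain Python. This is only a sketch of the described behavior, not namestand's own code; the name `downscore_sketch` is made up for illustration, and it handles single strings only.

```python
import re


def downscore_sketch(name):
    """Hypothetical re-creation of the documented steps (not the library's code)."""
    s = name.lower()                    # lowercase
    s = s.strip()                       # strip leading/trailing whitespace
    s = re.sub(r"[^a-z0-9]+", "_", s)   # runs of other characters -> one underscore
    s = s.strip("_")                    # drop leading/trailing underscores
    if s[:1].isdigit():                 # prefix names that start with a digit
        s = "_" + s
    return s


print(downscore_sketch("2013 Happiness"))
```

Because lowercasing happens first, the character class only needs to list `a-z` and `0-9`.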
Suggested usage for namestand.person_basic: donor names, etc. Note, though, that this converter does not have any special knowledge of the world, e.g., that "Riccchard" is likely a misspelling of "Richard".
Steps:
- Uppercases the string
- Strips any leading and trailing whitespace
- Flips the "first" and "last" names if a comma is present
- Removes any characters that aren't (Unicode) letters, ', -, or spaces
Along the way, it tries to gracefully handle name prefixes (Mr./Mrs./etc.) and suffixes (Jr./Sr./VII/Esq./etc.).
Example:
namestand.person_basic("Antony, Mark") == "MARK ANTONY"
namestand.person_basic([
u"Diego Velázquez-O'Connor",
"Antony, Mark"
]) == [
u"DIEGO VELÁZQUEZ-O'CONNOR",
"MARK ANTONY"
]
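A rough sketch of the four listed steps, assuming Python 3 (where `\w` in a `str` pattern is Unicode-aware). The function name `person_sketch` is hypothetical, and the prefix/suffix handling mentioned above is deliberately left out, so this is not how the library itself works.

```python
import re


def person_sketch(name):
    """Hypothetical approximation of the documented steps; no prefix/suffix logic."""
    s = name.upper().strip()
    if "," in s:
        # "LAST, FIRST" -> "FIRST LAST"
        last, first = s.split(",", 1)
        s = first.strip() + " " + last.strip()
    # Keep only letters, apostrophes, hyphens, and spaces:
    # drop anything outside \w ' - space, then drop digits and underscores.
    s = re.sub(r"[^\w'\- ]|[\d_]", "", s)
    return s


print(person_sketch("Antony, Mark"))
```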
namestand.company_basic tries to remove common cruft from company names.
Steps:
- Uppercases the string
- Strips any leading and trailing whitespace
- Removes any characters that aren't (Unicode) letters, ', -, or spaces
- Removes "LLC", "LTD", and "INC"
Example:
namestand.company_basic("American Banana Stand, Inc.") == "AMERICAN BANANA STAND"
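The company-name steps can be sketched the same way, again assuming Python 3. `company_sketch` is a hypothetical name, and two details are assumptions rather than documented behavior: the suffixes are matched as whole words only, and leftover whitespace is collapsed at the end.

```python
import re


def company_sketch(name):
    """Hypothetical approximation of the documented steps (not the library's code)."""
    s = name.upper().strip()
    # Keep only letters, apostrophes, hyphens, and spaces.
    s = re.sub(r"[^\w'\- ]|[\d_]", "", s)
    # Assumption: remove the suffixes only when they stand alone as words.
    s = re.sub(r"\b(?:LLC|LTD|INC)\b", "", s)
    # Assumption: tidy up the whitespace left behind by the removals.
    return " ".join(s.split())


print(company_sketch("American Banana Stand, Inc."))
```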
You can easily build your own name-standardizing pipelines using the following tools.
namestand.combine accepts a list of transformers (i.e., functions that accept a string and return a string) and returns a pipeline (i.e., a function that can be used in the same way as the pre-built converters). Converters themselves can be used as parts of pipelines, too. For example, if you wanted to change the downscore converter to use hyphens instead:
downhyphen = namestand.combine([
namestand.downscore,
lambda x: x.replace("_", "-")
])
But namestand already comes with a few helpers for doing things like string replacements. So you could also do:
downhyphen = namestand.combine([
namestand.downscore,
namestand.translator("_", "-")
])
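To make the idea of a pipeline concrete, here is one way such a combiner could be written without the library. It is a sketch under assumptions: `combine_sketch` is a made-up name, and accepting either a single string or a list is inferred from how the pre-built converters are used in the examples.

```python
from functools import reduce


def combine_sketch(transformers):
    """Hypothetical combiner: apply each transformer in order, left to right."""
    def pipeline(value):
        # Assumption: like the converters, accept one string or a list of them.
        if isinstance(value, (list, tuple)):
            return [pipeline(v) for v in value]
        return reduce(lambda acc, fn: fn(acc), transformers, value)
    return pipeline


# Stand-in transformers, used here instead of the real converters.
downhyphen_sketch = combine_sketch([
    str.lower,
    lambda x: x.replace(" ", "_"),
    lambda x: x.replace("_", "-"),
])

print(downhyphen_sketch(["Case Number", "Is Open"]))
```

Since the returned pipeline is itself a string-to-string function, it can be passed as a transformer to another combiner.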
Some helpful transformers:
- namestand.translator(pattern, replacement): pattern can be a string or a compiled regex. Equivalent to an argument-aware combination of lambda x: x.replace(string, replacement) and lambda x: re.sub(regex, replacement, x).
- namestand.swapper(pattern, replacement): pattern can be a string or a compiled regex. If a given name matches the pattern (re.match for compiled regexes, x in pattern for string patterns), the entire name is replaced with the replacement. Otherwise, the given name is retained.
- namestand.stripper(chars_to_strip): Equivalent to lambda x: x.strip(chars_to_strip).
- namestand.defaulter(test, default_value): test can be either a list of "approved" values, or a function that returns True or False. If x doesn't pass the test (or isn't in the list), it is replaced with default_value.
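Three of these helpers are small enough to sketch as closures, which may help when writing custom transformers of your own. The `*_sketch` names are hypothetical, and the code follows the descriptions above rather than the library's source.

```python
import re


def translator_sketch(pattern, replacement):
    """Hypothetical: regex substitution for compiled patterns, str.replace otherwise."""
    if hasattr(pattern, "sub"):  # compiled regexes expose .sub
        return lambda x: pattern.sub(replacement, x)
    return lambda x: x.replace(pattern, replacement)


def stripper_sketch(chars_to_strip):
    """Hypothetical: strip the given characters from both ends."""
    return lambda x: x.strip(chars_to_strip)


def defaulter_sketch(test, default_value):
    """Hypothetical: keep x if it passes the test, else return the default."""
    check = test if callable(test) else (lambda x: x in test)
    return lambda x: x if check(x) else default_value


yes_no = defaulter_sketch(["yes", "no"], "unknown")
print(yes_no("maybe"))
```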
Additional usage examples can be found in test/. To test, run nosetests or tox from this repo's root directory. Currently tested, and passing, on the following Python versions:
2.7.14
3.5.4
3.6.4
3.7.5
3.8.0
Pull requests, suggestions, etc. welcome.