Contributing CharacterSubstituters

Glad you’d like to add a character substituter to Picky!

What is it?

A character substituter normalizes single characters. For example, you might want your indexer or query to convert umlauts into their non-umlaut equivalents:
ü -> ue (yes, a single character can be normalized into multiple characters)
This is already built in, and you can use it as follows:

  # For indexing:
  default_indexing substitutes_characters_with: CharacterSubstituters::WestEuropean.new
  
  # For querying:
  default_querying substitutes_characters_with: CharacterSubstituters::WestEuropean.new

However, the West European character substituter only covers conversions such as ö into oe and ç into c. For example, no conversion is defined for Polish or Russian characters.

So you might want to write your own. It’s easy.

How to do it

Check the available character substituters to see whether the one you need already exists.
If yes, use (and improve) it :)
If not, fork the repository and follow the instructions below.

Every character substituter should implement the substitute(text) method. This method is called by the indexer and/or query.

substitute(text) # Substitute characters in the text and return a new text.
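
To make the contract concrete, here is a minimal sketch of a custom substituter. The class name PolishSubstituter and its mapping table are hypothetical and not part of Picky; the sketch also assumes the input uses precomposed characters (such as ł or ó), not base letters followed by combining marks.

```ruby
# Hypothetical example; PolishSubstituter is not part of Picky.
class PolishSubstituter
  # Assumed mapping of precomposed Polish letters to plain ASCII letters.
  MAPPING = {
    'ą' => 'a', 'ć' => 'c', 'ę' => 'e', 'ł' => 'l', 'ń' => 'n',
    'ó' => 'o', 'ś' => 's', 'ź' => 'z', 'ż' => 'z',
    'Ą' => 'A', 'Ć' => 'C', 'Ę' => 'E', 'Ł' => 'L', 'Ń' => 'N',
    'Ó' => 'O', 'Ś' => 'S', 'Ź' => 'Z', 'Ż' => 'Z'
  }.freeze

  # One regexp that matches any character listed in the mapping.
  PATTERN = Regexp.union(MAPPING.keys)

  # Returns a new string; the given text is left unmodified.
  def substitute(text)
    text.gsub(PATTERN, MAPPING)
  end
end
```

Assuming Picky only requires an object that responds to substitute(text), an instance could then be passed in the same way as the built-in one, for example substitutes_characters_with: PolishSubstituter.new.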

Example

This is how the West European character substituter implements the substitute method (at the time of this writing). See the spec for details on what it does.

  def substitute text
    trans = @chars.new(text).normalize(:kd)
    
    # substitute special cases
    #
    trans.gsub!('ß', 'ss')
    
    # substitute umlauts (of A,O,U,a,o,u)
    #
    trans.gsub!(/([AOUaou])\314\210/u, '\1e')
    
    # get rid of acutes, graves and …
    #
    trans.unpack('U*').select { |cp|
      cp < 0x0300 || cp > 0x035F
    }.pack('U*')
  end
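
To try the same steps outside of Picky, the following standalone sketch mirrors the method above. It is not the Picky implementation: the name substitute_sketch is hypothetical, and Ruby's built-in String#unicode_normalize(:nfkd) (Ruby 2.2 or later) is assumed to be an adequate stand-in for the @chars helper.

```ruby
# Hypothetical standalone sketch; not the Picky implementation.
def substitute_sketch(text)
  # Decompose characters, e.g. "ü" becomes "u" followed by U+0308.
  # Assumption: String#unicode_normalize stands in for the @chars helper.
  trans = text.unicode_normalize(:nfkd)

  # Special case: ß has no decomposition, so replace it explicitly.
  trans = trans.gsub('ß', 'ss')

  # Umlauts of A, O, U, a, o, u: replace the combining diaeresis with "e".
  trans = trans.gsub(/([AOUaou])\u0308/, '\1e')

  # Drop every remaining combining mark in the range U+0300..U+035F.
  trans.unpack('U*').reject { |cp| cp.between?(0x0300, 0x035F) }.pack('U*')
end
```

With this sketch, substitute_sketch('Müller') gives "Mueller", while a diaeresis on any other letter (for example ë) is simply removed along with the other accents.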
