Pelias document blacklist stream

pelias/blacklist-stream

This repository is part of the Pelias project. Pelias is an open-source, open-data geocoder originally sponsored by Mapzen. Our official user documentation is here.

Pelias Blacklist Stream

This package provides a configuration-driven approach to removing specific records from an import stream.

It is particularly helpful where algorithmic deduplication fails, or where data is erroneous and needs to be replaced with an alternative.

Installation

$ npm install pelias-blacklist-stream

Usage

The blacklist stream is intended to be used by pipelines passing objects generated by pelias/model.

You can specify which records are omitted from the build by providing their globally-unique-id (GID).

GIDs can be found in the results served by pelias/api or by calling getGid() on a Pelias Model object.

Using a JavaScript object as the blacklist

const blacklistStream = require('pelias-blacklist-stream');
const blacklist = {
  "openaddresses:address:us/tx/libery:377e64dd81884dbe": "1800 Mlk, Liberty, TX, USA",
  "openaddresses:address:us/fl/statewide:bee100ffcc77c699": undefined
};

const stream = blacklistStream( blacklist );

The stream will now remove any documents which match either of the GIDs openaddresses:address:us/tx/libery:377e64dd81884dbe or openaddresses:address:us/fl/statewide:bee100ffcc77c699.

The values are optional; you can use them to hold a human-readable comment for debugging.
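To make the filtering behaviour concrete, here is a minimal sketch of an object-mode stream that drops documents by GID. It is not the module's own implementation: createGidFilter and fakeDoc are hypothetical names, and the only thing assumed about a document is that it exposes getGid(). Note that a key whose value is undefined still counts as blacklisted, because only the presence of the key is checked.

```javascript
const { Transform } = require('stream');

// Hypothetical helper (not part of pelias-blacklist-stream): builds an
// object-mode stream that drops any document whose getGid() value is a
// key of the blacklist object. Values are ignored, so `undefined` is fine.
function createGidFilter(blacklist) {
  return new Transform({
    objectMode: true,
    transform(doc, _encoding, callback) {
      const gid = doc.getGid();
      if (Object.prototype.hasOwnProperty.call(blacklist, gid)) {
        return callback(); // blacklisted: emit nothing for this document
      }
      callback(null, doc); // not blacklisted: pass through unchanged
    }
  });
}

// Stand-in for a pelias/model document; only getGid() is assumed here.
const fakeDoc = (gid) => ({ getGid: () => gid });
```

A stream built this way can be placed between any two object-mode stages of a pipeline, for example with source.pipe(createGidFilter(blacklist)).pipe(sink).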

Using blacklist files specified from Pelias Config

If no arguments are provided when calling blacklistStream(), it will load your local pelias/config.

If your config contains entries in the imports.blacklist.files array, each file will be loaded from disk and the results merged into a single blacklist.

const blacklistStream = require('pelias-blacklist-stream');
const stream = blacklistStream(); // no arguments specified

The relevant parts of the pelias config file, usually located at ~/pelias.json:

{
  "imports": {
    "blacklist": {
      "files": [
         "/tmp/blacklist_file_one",
         "/tmp/blacklist_file_two"
      ]
    }
  }
}
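The load-and-merge step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: loadBlacklistFromConfig is a hypothetical name, the config is passed in as a plain object rather than loaded through pelias/config, and storing the source file path as the value is a choice made here purely for debugging.

```javascript
const fs = require('fs');

// Hypothetical helper: reads every file listed under imports.blacklist.files
// and merges the GIDs they contain into one object.
function loadBlacklistFromConfig(config) {
  const imports = (config && config.imports) || {};
  const files = (imports.blacklist && imports.blacklist.files) || [];
  const merged = {};
  for (const file of files) {
    const lines = fs.readFileSync(file, 'utf8').split('\n');
    for (const line of lines) {
      // Keep only the text before any '#', then trim whitespace.
      const gid = line.split('#')[0].trim();
      if (gid) {
        merged[gid] = file; // assumption: remember which file listed the GID
      }
    }
  }
  return merged;
}
```

Because the result is a plain object keyed by GID, a GID listed in more than one file ends up as a single entry.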

Blacklist file format

Blacklist files stored on disk can have any file extension (or none).

Each line of the file should contain one GID and, optionally, one comment. Lines are separated by a '\n' newline character.

An example of a blacklist file without comments:

openaddresses:address:us/tx/libery:377e64dd81884dbe
openaddresses:address:us/fl/statewide:bee100ffcc77c699

An example of a blacklist file with debugging comments:

openaddresses:address:us/tx/libery:377e64dd81884dbe # 1800 Mlk, Liberty, TX, USA
openaddresses:address:us/fl/statewide:bee100ffcc77c699

If the line contains a '#' symbol then anything after the '#' will be considered a comment. Using another '#' in your comment string is not supported.

The parser trims surrounding whitespace with String.trim(), but GIDs are matched case-sensitively, so you must take care to provide the correct letter casing.
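The file format rules above can be sketched as a small parser. This is a hypothetical illustration rather than the module's actual parser: parseBlacklist is an invented name, skipping blank and comment-only lines is an assumption, and the text is split at the first '#' only, since a second '#' inside a comment is not supported.

```javascript
// Hypothetical parser for the blacklist file format (not the module's code).
// Returns an object mapping each GID to its comment, or to undefined.
function parseBlacklist(text) {
  const blacklist = {};
  for (const line of text.split('\n')) {
    const hash = line.indexOf('#');
    // Everything before the first '#' is the GID; trim surrounding whitespace.
    const gid = (hash === -1 ? line : line.slice(0, hash)).trim();
    if (gid.length === 0) {
      continue; // assumption: ignore blank and comment-only lines
    }
    // Everything after the first '#' is the comment; letter casing is untouched.
    blacklist[gid] = hash === -1 ? undefined : line.slice(hash + 1).trim();
  }
  return blacklist;
}
```

The returned object has the same shape as the JavaScript blacklist shown in the Usage section: GIDs as keys, optional comments as values.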

NPM Module

The pelias-blacklist-stream npm module can be found here:

https://npmjs.org/package/pelias-blacklist-stream

Contributing

Please fork the repository and open a pull request against upstream master from a feature branch.

Pretty please, provide unit tests and script fixtures in the test directory.

Running Unit Tests

$ npm test

Continuous Integration

CI tests every release against all supported Node.js versions.

Versioning

We rely on semantic-release and Greenkeeper to maintain our module and dependency versions.
