This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Entity Resolution

Steve Mckellar edited this page Apr 13, 2021 · 2 revisions

What is entity resolution?

Entity resolution lets you find pairs of nodes (and egos) across different sessions that represent the same person, place or object. You can then export a single network that includes these merged nodes and their resolved properties. To do this, Server sends a list of nodes to a script (typically Python), which returns a list of pairs, each with a score for the probability that the two nodes match.

Using entity resolution

You will need two things in order to use entity resolution:

A dataset with more than one session.

Entity resolution connects nodes between sessions, so you should start with a dataset of at least 2 sessions.

For this example you should start with the protocol found here:
https://github.com/complexdatacollective/entity-resolution-sample/blob/master/examples/protocols/Simple%20Entity%20Resolution%20Protocol.netcanvas

An entity resolver

An entity resolver receives a list of nodes from the Server app and returns a list of pairs with an associated probability score.
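As a rough illustration of what "pairs with an associated probability score" could look like, the sketch below scores every pair of nodes by string similarity. The node shape (`id` and `name` keys), the function name `score_pairs`, and the use of `difflib` as a stand-in for a real matching model are all assumptions for illustration; they are not taken from the sample resolver.

```python
from difflib import SequenceMatcher
from itertools import combinations


def score_pairs(nodes, key="name"):
    """Return (id_a, id_b, score) for every pair of nodes, best match first.

    Assumption: each node is a dict with an "id" and a text attribute
    (here "name"). The similarity ratio is only a stand-in for a real
    match probability.
    """
    pairs = []
    for a, b in combinations(nodes, 2):
        text_a = str(a.get(key, "")).lower()
        text_b = str(b.get(key, "")).lower()
        if text_a and text_b:
            # ratio() is in [0, 1]; 1.0 means the strings are identical.
            score = SequenceMatcher(None, text_a, text_b).ratio()
        else:
            # A missing attribute gives no evidence of a match.
            score = 0.0
        pairs.append((a["id"], b["id"], round(score, 3)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

A real resolver would normally compare several attributes and use a model suited to the protocol's data, which is one reason a resolver tends to be protocol-specific.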

Because a resolver interprets the data in a network, it is specific to a .netcanvas protocol.

The resolver receives nodes over stdin and should return results over stdout. This can happen synchronously, sending results after all nodes have been received and processed, or asynchronously, returning results as soon as the first nodes are received and continuing to process further nodes as they arrive.
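The asynchronous mode described above might be sketched as follows. The wire format is not specified on this page, so the sketch assumes one JSON object per line in both directions; the field names (`id`, `name`, `a`, `b`, `probability`), the `threshold` parameter and the function name `stream_matches` are hypothetical. Check the sample resolver for the format Server actually uses.

```python
import json
from difflib import SequenceMatcher


def stream_matches(lines, out, threshold=0.8):
    """Compare each incoming node with the nodes seen so far and write
    any likely match immediately, without waiting for the input to end.

    Assumption: `lines` yields one JSON node per line and `out` is a
    writable text stream (for example sys.stdin and sys.stdout).
    """
    seen = []  # (id, lower-cased name) of every node received so far
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        node = json.loads(line)
        name = str(node.get("name", "")).lower()
        for other_id, other_name in seen:
            if not name or not other_name:
                continue  # no evidence without a name
            score = SequenceMatcher(None, name, other_name).ratio()
            if score >= threshold:
                result = {"a": other_id, "b": node["id"], "probability": round(score, 3)}
                out.write(json.dumps(result) + "\n")
                out.flush()  # hand the result over right away
        seen.append((node["id"], name))
```

To run this as a script, call `stream_matches(sys.stdin, sys.stdout)` after importing `sys`. A synchronous resolver would instead collect every node first and write all results at the end.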

Server assumes the resolver is an interpreted script written in Python.

For this example you should start with the resolver found here:
https://github.com/complexdatacollective/entity-resolution-sample/blob/master/EntityResolution.py

Walkthrough

Prerequisites:

  • python3 installed (for the sample scripts; python2 should work in principle with other scripts)
  • Latest Server 6.1.0 installed (available on Slack)
  • Latest Interviewer 6.0.3 installed and paired with Server (available on GitHub)

Resolving sessions

  1. Download the example resolver from https://github.com/complexdatacollective/entity-resolution-sample/
  2. Install the protocol found in /examples/protocols/Simple Entity Resolution Protocol.netcanvas in Server and Interviewer.
  3. Create at least 2 example sessions.
  4. Export those sessions to Server.
  5. In Server, go to the Simple protocol workspace and click the "Resolve data" tab.
  6. Go to the "Resolve Sessions" section.
  7. Select "Person" as the ego node cast type (this will convert egos into person nodes so that they can be included in the comparison).
  8. Set Interpreter to the location of the python3 installation on your system. If python3 is included in your $PATH, you can leave this as just python3.
  9. Set the Resolver Script Path to EntityResolution.py in the sample files.
  10. Click Begin Entity Resolution.
  11. For each pair, you may select a combination of attributes, or 'Not a match'.
  12. After you confirm the last match you will be presented with a summary screen. Click "Save and Export" (an export will be generated with the default export settings, but with all sessions merged).
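Before pointing Server at a resolver, it can help to check the Interpreter and Resolver Script Path values outside Server. The sketch below resolves the interpreter the way a bare `python3` would be found on $PATH and pipes a payload to the script's stdin. The helper name `run_resolver` and its parameters are hypothetical, the payload must be in whatever format your resolver expects, and this is not a description of how Server itself launches the script.

```python
import shutil
import subprocess


def run_resolver(interpreter, script_path, payload, timeout=60):
    """Run a resolver script once and return what it wrote to stdout.

    `interpreter` may be a bare command such as "python3" (looked up on
    PATH) or a full path to the executable.
    """
    executable = shutil.which(interpreter)
    if executable is None:
        raise FileNotFoundError(f"Interpreter not found: {interpreter}")
    completed = subprocess.run(
        [executable, script_path],
        input=payload,        # sent to the script over stdin
        capture_output=True,  # collect stdout and stderr
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

For example, `run_resolver("python3", "EntityResolution.py", payload)` should return the resolver's output for `payload`. A `FileNotFoundError` suggests the Interpreter setting needs a full path.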

Exporting previous resolutions

  1. In Server, go to the Simple protocol workspace and click the "Resolve data" tab.
  2. Go to the "Existing resolutions" section.
  3. Click "Export" on the resolution you would like to export.

Follow up resolutions

If you add new sessions, you may wish to resolve them as well. Resolutions are cumulative: Server will attempt to resolve the later sessions against any previous resolutions. These steps assume you have already created a resolution by following the steps here:
https://github.com/complexdatacollective/Server/wiki/_new#resolving-sessions

  1. Create at least one extra session in Interviewer, and export it to Server.
  2. In Server, go to the Simple protocol workspace and click the "Resolve data" tab.
  3. Go to the "Resolve Sessions" section.
  4. You will not be able to set the ego cast type (changing it could cause conflicts with previous resolutions).
  5. Interpreter will be set to the same location as in the latest resolution, but can be changed.
  6. The Resolver Script Path will also be set to the same path as in the latest resolution, and can be changed.
  7. Click Begin Entity Resolution.
    The script will receive:
    • Nodes from earlier sessions that were previously resolved, in their resolved state.
    • Nodes from new sessions, unchanged.
  8. For each pair, you may select a combination of attributes, or 'Not a match'.
  9. After you confirm the last match you will be presented with a summary screen. Click "Save and Export" (an export will be generated with the default export settings, but with all sessions merged).
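One way to picture the input to a follow-up run is sketched below: nodes that were matched in an earlier resolution are collapsed into a single resolved node, and all other nodes pass through unchanged. The data shapes (`nodes` and `attributes` keys), the `resolved-N` ids and the function name `apply_resolutions` are hypothetical. Server's internal representation is not documented on this page.

```python
def apply_resolutions(nodes, resolutions):
    """Return the node list a follow-up run would start from.

    Assumptions: each node is {"id": ..., "attributes": {...}} and each
    resolution is {"nodes": [ids], "attributes": {...}}, oldest first.
    """
    by_id = {node["id"]: node for node in nodes}
    for index, resolution in enumerate(resolutions):
        # Remove the members of this resolution that are still present.
        members = [
            by_id.pop(node_id)
            for node_id in resolution["nodes"]
            if node_id in by_id
        ]
        if not members:
            continue
        # Replace them with one node carrying the resolved attributes.
        resolved_id = f"resolved-{index}"
        by_id[resolved_id] = {
            "id": resolved_id,
            "attributes": dict(resolution["attributes"]),
        }
    return list(by_id.values())
```

Because resolutions are applied oldest first, a later resolution can refer to the id of an earlier resolved node, which is one way the cumulative behaviour could be modelled.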