Lucene Architecture

Sandro edited this page Jul 26, 2013 · 1 revision

The DBpedia Spotlight architecture consists of the following modules:

  • Web application, a demonstration client (HTML/JavaScript interface) that lets users type or paste text into a Web browser and view the resulting annotated text.
  • Web Service, a RESTful/SOAP Web API that exposes the functionality of annotating and/or disambiguating entities in text.
  • Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
  • Evaluation module, where we test disambiguators, log the results, and use them to train the system to perform better.
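
As a rough illustration of how a client might talk to the Web Service module, the sketch below only builds a request URI and does not contact any server. The base URL `http://localhost:2222/rest`, the `/annotate` path, and the `text` and `confidence` parameter names are assumptions not stated on this page, and `AnnotateRequest` is a hypothetical helper, not part of the Spotlight API. It uses `URLEncoder.encode(String, Charset)`, which needs Java 10 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a URI for an annotation request.
// Endpoint path and parameter names are assumptions, not taken from this page.
final class AnnotateRequest {

    /** Joins the base URL, the assumed /annotate path, and the URL-encoded query. */
    static URI build(String baseUrl, String text, double confidence) {
        // URLEncoder applies form encoding, so spaces become '+'.
        String query = "text=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&confidence=" + confidence;
        return URI.create(baseUrl + "/annotate?" + query);
    }

    public static void main(String[] args) {
        // Assumed local server address; adjust to wherever the service runs.
        System.out.println(build("http://localhost:2222/rest", "Berlin is a city", 0.5));
    }
}
```

The resulting URI could then be passed to any HTTP client. Keeping URI construction separate from the network call makes the encoding step easy to check in isolation.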

External dependencies:

  • DBpedia Extraction Framework (used only by the indexing module), which extracts the necessary data from the Wikipedia dumps.
  • Lucene 3.6, providing the low-level indexing framework used by DBpedia Spotlight.
  • LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.

System Requirements

  • Java 1.6+
  • Scala 2.9+
  • Spotlight JAR
  • Spotlight Library JARs
  • Lucene disambiguation index
  • Spotter dictionary
  • Enough RAM to give the JVM a heap large enough for the Spotter (approx. 8 GB)
  • Maven 3 for the automatic installation of dependencies.
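
The heap requirement above can be checked at startup. The following is a minimal sketch, assuming a threshold of 8 GiB taken from the approximate figure in the list. `HeapCheck` is a hypothetical class, not part of Spotlight. Note that `Runtime.maxMemory()` may report slightly less than the value passed via `-Xmx`, so the threshold is best treated as a guideline.

```java
// Hypothetical startup check: compares the JVM's maximum heap with a required size.
final class HeapCheck {

    // Assumption: about 8 GiB, based on the "approx. 8G" figure for the Spotter.
    static final long REQUIRED_HEAP_BYTES = 8L * 1024 * 1024 * 1024;

    /** Returns true when the available maximum heap is at least the required size. */
    static boolean isSufficient(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap the JVM will attempt to use (set with -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (isSufficient(maxHeap, REQUIRED_HEAP_BYTES)) {
            System.out.println("Heap looks large enough: " + maxHeap + " bytes");
        } else {
            System.out.println("Heap may be too small (" + maxHeap
                    + " bytes); consider starting the JVM with a larger -Xmx value");
        }
    }
}
```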