Warm up tasks

If you are going to develop with the DBpedia Spotlight team, it is important that you do the "everybody should do" tasks, and at least one of the "would be cool to have" tasks.

Everybody should do

Getting Started:

Play with the Spotlight demo (http://dbpedia-spotlight.github.io/demo/)
Compile and run it locally
Check the issue pages
Read "A Generative Entity-Mention Model for Linking Entities with Knowledge Base", which explains the main ideas behind Spotlight
Understand what data is saved and how the stores work. Take a look at the model editor post

Documentation

The best warm-up task is to polish the user documentation. Read it, ask questions, and add the answers to the documentation so that the next person doesn't have to ask them again.

A good starting point is to update the documentation as you complete the warm-up tasks:

Write how to solve issues you went through
Rephrase confusing sentences
Gather recurring questions from the mailing list and GitHub issues (maybe creating a FAQ)

Build from source

Download the latest code, build and try to run the software. Add to the documentation any error messages that you may find, or any information that we may have forgotten to add. Try first to run the server with files offered for download, instead of running the indexing process from scratch, because that can take a while.

Play a bit with the endpoints (check disambiguate, annotate, and spot).
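
A minimal sketch of how you might call an endpoint from a script, using only the Python standard library. The base URL (a server on localhost, port 2222, under /rest) and the helper names build_request and call are assumptions made for illustration; use the address your own server reports at startup. The text and confidence parameters and the JSON Accept header follow the commonly documented REST usage, but check them against the version you built.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Spotlight server running locally on port 2222.
# Replace with the address your server actually listens on.
BASE_URL = "http://localhost:2222/rest"


def build_request(endpoint, text, confidence=0.5, base_url=BASE_URL):
    """Build (but do not send) a GET request for one REST endpoint."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    url = f"{base_url}/{endpoint}?{query}"
    # Ask for JSON instead of the default response format.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def call(endpoint, text, confidence=0.5, base_url=BASE_URL):
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint, text, confidence, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a server running, `call("annotate", "Berlin is the capital of Germany.")` should return the annotations as a dictionary. The other endpoints may expect differently prepared input, so consult the user documentation before pointing this helper at them.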

Learn Scala, Basic Functional and Solid OO Programming

You will quickly fall in love with the elegance of Scala's functional programming combined with object-oriented programming. We don't need to be the most idiomatic Scala programmers, but you'll quickly notice that some patterns just stick with you. You should invest at least an hour a day during the "community bonding period" to learn Scala. See, for example, the Scala School by the Twitter folks. We learned the language while building DBpedia Spotlight, so you can do it too.

Spark

If you are going to work on the tool that generates the models from a Wikipedia dump, it would be best to get familiar with Spark as well. Spark's YouTube channel offers some good material for grasping its concepts.

Check how to set up a simple project using Spark and SBT
Try some of the examples, running them locally in your IDE
Set up a local Spark master and try submitting some of the packaged examples via the submit script

Data in the Models

If you want to peek at how the data in the models is stored and what data is saved in each of the stores, have a play with [Model Editor](https://github.com/idio/spotlight-model-editor).

Would be cool to have

Look for tasks in our issue tracker, assign yourself if nobody else is already assigned, and start discussing or working on the issue.

Here are some additional ideas:

Design some "powered by dbpedia spotlight" logos. Size suggestions: 80x20 (example), 200x80, 140x56, 100x40, and 70x28 (example)
Create/enhance step-by-step instructions to configure a dev environment in IntelliJ, Eclipse, TypeSafe Scala-IDE, or whatever your favourite IDE is.
Run indexing for one or two entity types or categories (small data set).
Run indexing for another language besides English (as long as you have working knowledge of that language)

Earn major props if you build test cases for ongoing issues, or for others that you find yourself.

Other small tasks that we'd be very glad to receive as contributions:

The Google Freebase Annotations of the TREC KBA 2014 Stream Corpus is a huge corpus annotated with Freebase MIDs. It would be good to:
Play with it
Slice a small portion of it
Write a script that maps the MIDs to DBpedia IDs and replaces the annotations.
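
As a starting point for that script, here is a rough sketch. It assumes the MID-to-DBpedia mapping is available as N-Triples owl:sameAs lines linking DBpedia resources to Freebase topics, and it reduces each annotation to a hypothetical (mention, MID) pair. The real corpus records carry more fields, so the parsing will need adapting. The names load_mid_map and replace_mids are made up for illustration.

```python
import re

# One owl:sameAs triple linking a DBpedia resource to a Freebase topic.
# Assumption: the mapping file uses this N-Triples layout.
SAME_AS = re.compile(
    r"<http://dbpedia\.org/resource/([^>]+)>\s+"
    r"<http://www\.w3\.org/2002/07/owl#sameAs>\s+"
    r"<http://rdf\.freebase\.com/ns/m\.([^>]+)>\s*\."
)


def load_mid_map(lines):
    """Return a dict from Freebase MID ('/m/xyz') to DBpedia id."""
    mapping = {}
    for line in lines:
        match = SAME_AS.match(line.strip())
        if match:
            dbpedia_id, mid_suffix = match.groups()
            mapping["/m/" + mid_suffix] = dbpedia_id
    return mapping


def replace_mids(records, mid_map):
    """Yield (mention, dbpedia_id) for each (mention, mid) record.

    Records whose MID has no known mapping are skipped.
    """
    for mention, mid in records:
        if mid in mid_map:
            yield mention, mid_map[mid]
```

Skipping unmapped MIDs keeps the output clean for evaluation, at the cost of silently dropping annotations. Counting the skipped records would be a sensible next step before trusting the sliced corpus.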
