Statistical implementation
Our latest implementation is based on statistical methods and is available in a number of languages. Data collection can be performed on a Hadoop cluster using our version of PigNLProc. More details on the indexing process of this implementation can be found here and a fully automated indexing tool can be found here.
Several issues with this implementation remain open; see the list in our issue tracker.
Q: Can the memory footprint be reduced?
A: The memory footprint of this implementation is mainly due to context words. There are three ways to reduce it:
1. Use disk-based context lookup instead of memory-based context lookup (see Issue #187).
2. Do not consider context (en_small.tar.gz).
3. Prune context data (see Issue #167).
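To illustrate what pruning context data could mean in practice, here is a minimal sketch. It is not Spotlight's actual pruning code: the in-memory layout (entity mapped to context word counts), the function name `prune_context`, and the `min_count` threshold are all assumptions made for illustration. The approach discussed in Issue #167 may differ.

```python
from typing import Dict

# Hypothetical layout (an assumption, not Spotlight's model format):
# entity name -> {context word -> co-occurrence count}
ContextStore = Dict[str, Dict[str, int]]


def prune_context(store: ContextStore, min_count: int) -> ContextStore:
    """Return a new store keeping only context words seen at least min_count times.

    Entities left with no context words are dropped from the result.
    The input store is not modified.
    """
    pruned: ContextStore = {}
    for entity, words in store.items():
        kept = {word: count for word, count in words.items() if count >= min_count}
        if kept:
            pruned[entity] = kept
    return pruned


if __name__ == "__main__":
    store = {
        "Berlin": {"germany": 12, "capital": 7, "wall": 1},
        "Paris": {"france": 1},
    }
    print(prune_context(store, min_count=5))
```

Rare context words contribute little evidence per entry, so a count threshold is one simple way to trade a small amount of accuracy for memory.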
Q: Can I pass a parameter to show more or fewer entities depending on their score?
A: See Issue #188.
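Until such a parameter is available, one workaround is to filter the returned entities on the client side. The sketch below assumes each annotation carries a numeric score; the `Annotation` type, its field names, and `filter_by_score` are hypothetical and do not reflect Spotlight's actual response format.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical annotation record; field names are assumptions."""
    surface_form: str
    uri: str
    score: float


def filter_by_score(annotations: List[Annotation], min_score: float) -> List[Annotation]:
    """Keep only annotations whose score is at least min_score, preserving order."""
    return [a for a in annotations if a.score >= min_score]


if __name__ == "__main__":
    annotations = [
        Annotation("Berlin", "http://dbpedia.org/resource/Berlin", 0.92),
        Annotation("wall", "http://dbpedia.org/resource/Wall", 0.31),
    ]
    # Raising min_score shows fewer entities; lowering it shows more.
    for a in filter_by_score(annotations, min_score=0.5):
        print(a.surface_form, a.uri, a.score)
```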
You can also use Spotlight out of the box on a Linux machine by following this guide.
For the memory requirements of the models, see our paper. Because the English model is fairly large, en_small.tar.gz is provided as a low-memory alternative; it does not consider context words and therefore provides lower accuracy.