CLARIN Federated Content Search v3.0 Aggregator – Augmenting your Search Engine
The CLARIN Federated Content Search (CLARIN-FCS) introduces an interface specification that decouples the search engine functionality from its exploitation, i.e. user-interfaces, third-party applications, to allow services to access heterogeneous search engines in a uniform way.
The Aggregator v3.0 is running at The National Swedish Language Bank's text division as well as at CLARIN.
The Specification for Federated Content Search v2.0 can be found as a PDF document. For more details, visit the CLARIN FCS - Technical Details page.
For a detailed list of changes, please take a look at CHANGELOG.md.
- A new graphical query builder (GQB) to support the new Query Language FCS-QL
- Support for the AdvancedDataView, ADV, with layer capabilities
- Backwards compatibility to earlier versions of Endpoints and protocols, legacy and version1
The backwards compatibility gives you, as a Centre search engine maintainer, a smooth transition to the new features and capabilities at your own convenience.
- A new Query Language
- A matching display of query results, the AdvancedDataView, ADV, with layer capabilities
- Backwards compatibility to earlier versions
These new additions to the CLARIN-FCS not only enhance the experience and possibilities for power users when querying repositories, but also make it easier for less experienced users to explore different corpora.
- A Client submits a query to an Endpoint.
- The Endpoint translates the query from CQL or FCS-QL to the query dialect used by the Search Engine and submits the translated query to the Search Engine.
- The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criteria.
- The Endpoint then translates the results from the Search Engine-specific result set format to the CLARIN-FCS result format and sends them to the Client.
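The four steps above can be sketched as a toy Endpoint. This is only an illustration of the translate, search, translate-back flow, not the Korp reference implementation: the names `Hit`, `translate_query`, `toy_search_engine`, `translate_results` and `endpoint_search` are hypothetical, the "engine dialect" is made up, and only a single quoted CQL term is handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One hit in a simplified, hypothetical common result format."""
    left: str
    keyword: str
    right: str


def translate_query(cql: str) -> Dict[str, str]:
    """Translate a bare CQL term such as '"cat"' into a made-up engine dialect."""
    return {"match": cql.strip().strip('"')}


def toy_search_engine(native_query: Dict[str, str], corpus: List[str]) -> List[dict]:
    """Stand-in Search Engine: returns hits in its own native format."""
    word = native_query["match"].lower()
    hits = []
    for sentence in corpus:
        tokens = sentence.split()
        for pos, token in enumerate(tokens):
            if token.lower() == word:
                hits.append({"tokens": tokens, "pos": pos})
    return hits


def translate_results(native_hits: List[dict]) -> List[Hit]:
    """Translate engine-specific hits into the common result format."""
    results = []
    for hit in native_hits:
        tokens, pos = hit["tokens"], hit["pos"]
        results.append(
            Hit(" ".join(tokens[:pos]), tokens[pos], " ".join(tokens[pos + 1:]))
        )
    return results


def endpoint_search(cql: str, corpus: List[str]) -> List[Hit]:
    """What the Endpoint does between receiving a query and answering the Client."""
    native_query = translate_query(cql)                     # translate the query
    native_hits = toy_search_engine(native_query, corpus)   # engine builds result set
    return translate_results(native_hits)                   # translate the results


if __name__ == "__main__":
    corpus = ["The cat sat on the mat", "A dog barked"]
    for hit in endpoint_search('"cat"', corpus):
        print(hit)
```

In a real Endpoint the two translation functions carry most of the work, since they map between CQL or FCS-QL and whatever query dialect and result format your Search Engine exposes.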
If you have any kind of RESTful API to your Search Engine, the Korp Endpoint Reference Implementation should be a good starting point. If you are using Korp specifically, you should only need a simple adaptation to your corpora and tagsets. In any case, do not forget to look at the tests.
To test your Endpoint you can point the IDS Endpoint Tester (code) to your Endpoint.
There is also an Endpoint developer's tutorial available.
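For a quick manual check alongside the tester, you can also build a search URL and open it in a browser. The sketch below assumes your Endpoint accepts SRU 1.2 style `searchRetrieve` requests; the endpoint URL and the helper name `build_search_url` are hypothetical, and your Endpoint may expect a different SRU version or additional parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(endpoint_url: str, query: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 style searchRetrieve URL for a CQL query (assumed parameters)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(max_records),
    }
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in endpoint_url else "?"
    return endpoint_url + separator + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Endpoint address; replace it with your own.
    print(build_search_url("http://localhost:8080/sru", '"cat"'))
```

The response should be an XML document that you can compare with what the Endpoint Tester reports.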
To build the FCS Aggregator you need a few simple steps (if you have not changed anything just skip to step 3):
./build.sh --npm
./build.sh --jsx
./build.sh --jar
The frontend (React) and backend (Jersey servlet) are then built using Node and Maven.
Check the aggregator_devel.yml configuration file. If you want to sideload your endpoint, simply add it to either additionalCQLEndpoints or additionalFCSEndpoints before running:
./build.sh --run
You might also want to change the paths to your cache files in AGGREGATOR_FILE_PATH and AGGREGATOR_FILE_PATH_BACKUP, respectively.
You can then access the locally running Aggregator at http://localhost:4019/
See DEPLOYMENT.md for example deployment configurations and descriptions of the settings.