add documentation
olinux committed Oct 25, 2022
1 parent 35931a1 commit 125e271
Showing 3 changed files with 103 additions and 0 deletions.
44 changes: 44 additions & 0 deletions components_and_dependencies.md
@@ -0,0 +1,44 @@
# Components and dependencies

KG Search consists of a user interface and two different ways of obtaining the underlying metadata, depending on the use case:

## Index-based search

```mermaid
graph LR
INDEXING_API[kg-indexing] --> ES[Elasticsearch]
INDEXING_API -- reading metadata --> KG_CORE[KG Core API]
style KG_CORE fill:lightgrey,stroke:#333,stroke-width:2px
```
The default is the index-based search. Here, the kg-indexing service reads the metadata from the KG Core API, translates it, and indexes it in an Elasticsearch index. This can be done for different levels such as "released" and "in progress".

The kg-indexing service provides an API to trigger different levels of the indexing mechanism. For EBRAINS, these endpoints are triggered by a scheduled automation job.

```mermaid
graph LR
REV_PROXY[search.kg.ebrains.eu] -->|Reverse proxy| UI(KG Search UI)
UI --> KG_SEARCH_API[kg-search]
KG_SEARCH_API --> ES[Elasticsearch]
```
As soon as the index is available, the user interface can request the precomputed, denormalized metadata from the Elasticsearch instance and apply access restrictions based on user account roles (e.g. to restrict the "in progress" indices to curators only).
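
The role-based restriction can be illustrated with a minimal Python sketch. It is not taken from the kg-search code base: the role name `curator`, the group labels, the function `accessible_index_groups`, and the assumption that "released" metadata is readable by every user are all hypothetical and serve only to show the idea.

```python
# Hypothetical names: neither the role nor the group labels are taken
# from the actual kg-search implementation.
RELEASED = "released"
IN_PROGRESS = "in progress"


def accessible_index_groups(user_roles):
    """Return the index groups a user may query, given their roles."""
    # Assumption: released metadata is readable by every user.
    groups = [RELEASED]
    if "curator" in user_roles:
        # Only curators may additionally see the "in progress" indices.
        groups.append(IN_PROGRESS)
    return groups
```

A request from a user without the curator role would then be served from the "released" indices only.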

### Scalability
Please note that the "index-based search" can easily be scaled horizontally, either by replicating the indices or by clustering the Elasticsearch database.

## Live search
To support review workflows and to simplify development, KG Search also offers a "live mode". When instances are queried in live mode, the underlying data structures
are not read from the Elasticsearch instance but directly from the KG Core API, which makes the results "live" (no indexing delay). Live mode also benefits from the permission management of the KG Core,
since the requesting user can only access the resources that are available to their account.


Because some representations in KG Search are rather complex and traverse many levels of the graph, this mode can be noticeably slower than the "index-based search".

```mermaid
graph LR
REV_PROXY[search.kg.ebrains.eu] -->|Reverse proxy| UI(KG Search UI)
UI --> KG_SEARCH_API[kg-search]
UI --> KEYCLOAK[Keycloak]
KG_SEARCH_API -- live view --> KG_CORE[KG Core API]
style KEYCLOAK fill:lightgrey,stroke:#333,stroke-width:2px
style KG_CORE fill:lightgrey,stroke:#333,stroke-width:2px
```
34 changes: 34 additions & 0 deletions service/services/kg-indexing/kg-indexing.md
@@ -0,0 +1,34 @@
# Indexing
The indexing mechanism aggregates and denormalizes the metadata read from the KG
and populates the underlying Elasticsearch indices accordingly.

## Multi-identifier support
Sometimes an instance should be accessible under multiple identifiers. Although the default id is the one provided by the EBRAINS KG,
it is possible to define additional identifiers as part of the translation process of an instance, so that the underlying card can be reached through any of them.
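
As a rough illustration, multi-identifier support can be thought of as registering the same card under several keys. The Python sketch below is an assumption-laden simplification, not the actual translation code: `register_card`, the dictionary-based lookup, and both identifiers are made up for the example.

```python
def register_card(lookup, card, identifiers):
    """Make one card reachable under every identifier in `identifiers`."""
    for identifier in identifiers:
        lookup[identifier] = card


lookup = {}
card = {"title": "Example dataset"}
# The first id stands in for the default KG id, the second for an
# additional identifier defined during translation (both are made up).
register_card(lookup, card, ["kg-id-123", "legacy-id-abc"])
```

Resolving either `kg-id-123` or `legacy-id-abc` then yields the same card.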

## "Searchable" vs. "Non-Searchable" indices
KG Search knows two main types of indices: "searchable" and "non-searchable". Search requests only operate on the "searchable" indices. This means
that if only the newest version of an instance should appear as a search result, while the other versions remain available only by id or by navigation on the result cards,
the indexing service has to ensure that only the newest instances are registered in the "searchable" index.
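
The version example can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the indexing service's logic: `split_versions`, the input layout (version ids ordered oldest first), and the ids themselves are hypothetical.

```python
def split_versions(versions_by_instance):
    """Split version ids into searchable and non-searchable ones.

    `versions_by_instance` maps an instance name to its version ids,
    ordered from oldest to newest (an assumption of this sketch).
    """
    searchable, non_searchable = [], []
    for version_ids in versions_by_instance.values():
        if not version_ids:
            continue  # nothing to index for this instance
        *older, newest = version_ids
        searchable.append(newest)       # appears in search results
        non_searchable.extend(older)    # reachable by id or navigation only
    return searchable, non_searchable


searchable, non_searchable = split_versions(
    {"dataset-x": ["x-v1", "x-v2", "x-v3"], "model-y": ["y-v1"]}
)
```

Here only `x-v3` and `y-v1` would be candidates for search results, while `x-v1` and `x-v2` stay resolvable by id.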

## "Autorelease" vs. "Non-autorelease" indices
> Disclaimer: The naming of "autorelease" vs. "non-autorelease" is there for legacy reasons and can be slightly misleading. We're aiming at renaming it at one point.

Today, the only real impact of the separation is that indices which are expensive to generate are kept apart and therefore
have their own endpoints in the *kg-indexing* API. For EBRAINS, this means that we maintain e.g. the "File" representation
in an "autorelease" index, which can then be scheduled independently from the "non-autorelease" indices.

## Incremental update vs rebuild of indices
KG Search knows two modes of updating an index. "Incremental" means that instances are updated individually "on the fly" without any impact on the end user. "Rebuild" means that the index is recreated from scratch. Since the new index is built in the background and then replaces the old index in one go, downtime is minimal but may still be noticeable to the end user. Please note that a rebuild is required to update e.g. the Elasticsearch mappings of an index (see below).

As a consequence, the "incremental update" is recommended for regular updates of running production instances, unless a full rebuild is needed, which is usually triggered manually.
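
The difference between the two modes can be sketched with an in-memory stand-in for an index. This Python example is only a conceptual model and makes no claim about how kg-indexing talks to Elasticsearch: the class `IndexStore` and its methods are hypothetical.

```python
class IndexStore:
    """In-memory stand-in for a search index that readers query."""

    def __init__(self):
        self.live = {}

    def incremental_update(self, instance_id, document):
        # Touches a single document; readers keep using the same index.
        self.live[instance_id] = document

    def rebuild(self, all_documents):
        # Build the replacement completely in the background ...
        fresh = dict(all_documents)
        # ... then switch readers over in a single step.
        self.live = fresh


store = IndexStore()
store.incremental_update("a", {"version": 1})
store.rebuild({"b": {"version": 2}})
```

After the rebuild, the store contains exactly the documents passed to `rebuild`, so stale entries such as `a` are gone. An incremental update, by contrast, never removes anything it was not asked to touch.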

## Autogeneration of ES-mappings, UI-settings and sitemap by annotation
Alongside the pure data, other resources are required to make the KG Search work:
- Elasticsearch requires mappings for proper indexing. These mappings are autogenerated by the KG Search service based on the model annotations of the target models.
- The UI needs additional directives to properly visualize and lay out the metadata (e.g. where on the screen an item should be shown, which widget to use for visualization, etc.).
- The KG Search service also generates a sitemap automatically, which helps to optimize how the content appears in search engines.

## Multi-run indexing
Please note that some features require multiple runs of the indexing process. For example, the removal of "dead links" (internal links to non-existing resources) relies on information from a previous run of the indexing mechanism. For the index to settle, it is therefore necessary to run the indexing at least twice in sequence.
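
Why two runs are needed can be shown with a small Python model. It rests on an assumption about the mechanism (each run filters links against the ids indexed by the previous run); the function `indexing_run` and the sample data are hypothetical and do not mirror the real service.

```python
def indexing_run(instances, known_ids):
    """Perform one indexing pass.

    `instances` maps an id to the ids it links to. `known_ids` is the set
    of ids indexed by the previous run (empty on the very first run).
    Returns the indexed documents and the ids seen in this run.
    """
    indexed = {}
    for instance_id, links in instances.items():
        if known_ids:
            # Drop links to resources the previous run did not index.
            links = [link for link in links if link in known_ids]
        indexed[instance_id] = list(links)
    return indexed, set(instances)


instances = {"a": ["b", "missing"], "b": []}
first, seen = indexing_run(instances, set())   # no prior information yet
second, _ = indexing_run(instances, seen)      # dead link can now be removed
```

In this model the first run still contains the link from `a` to `missing`, because nothing is known yet about which resources exist. Only the second run can remove it.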

25 changes: 25 additions & 0 deletions setting_up_the_dev_environment.md
@@ -0,0 +1,25 @@
# Setting up the development environment

To set up the development environment, you need to have the following elements:

- An Elasticsearch instance (running either on your local machine or on a server of your choice, possibly reached via SSH port forwarding)
- A development environment for Java (Spring Boot) and JavaScript (React)


## Running the services
The services are located at `service/services`. There you can find two different Maven projects, which you should be able to import
into your IDE. Additionally, there is a `kg-common` library in the `service/libs` directory. The two main projects are
stand-alone Spring Boot services and can therefore run alongside each other.

Please note that, depending on what you want to do, you don't need both services to be running. If you want to work on the UI, it is often
enough to run only the kg-search service (unless you want to improve or debug the actual indexing process).


## Running the UI
To run the UI, you can simply execute
```shell
npm install
npm run start
```
to launch the UI in a development server. If you want to change the configuration (e.g. to use a different server endpoint), have
a look at `setupProxy.js` in the `src` directory.
