add documentation

HumanBrainProject · Oct 25, 2022 · 125e271
1 parent 35931a1
commit 125e271
Showing 3 changed files with 103 additions and 0 deletions.
diff --git a/components_and_dependencies.md b/components_and_dependencies.md
@@ -0,0 +1,44 @@
+# Components and dependencies
+
+The KG Search consists of a user interface and two different ways of retrieving the underlying metadata, depending on the use case:
+
+## Index-based search
+
+```mermaid
+graph LR
+    INDEXING_API[kg-indexing] --> ES[Elasticsearch]
+    INDEXING_API -- reading metadata --> KG_CORE[KG Core API]
+    style KG_CORE fill:lightgrey,stroke:#333,stroke-width:2px
+```
+The default is the index-based search. Here, the kg-indexing service reads the metadata from the KG Core API, translates it and indexes it in an Elasticsearch index. This can be done for different levels such as "released" and "in progress".
+
+The kg-indexing service provides an API to trigger different levels of the indexing mechanism. For EBRAINS, these endpoints are triggered by a scheduled automation job.
+
+```mermaid
+graph LR
+    REV_PROXY[search.kg.ebrains.eu] -->|Reverse proxy| UI(KG Search UI)
+    UI --> KG_SEARCH_API[kg-search]
+    KG_SEARCH_API --> ES[Elasticsearch]
+```
+As soon as the index is available, the user interface can request the precomputed, denormalized metadata from the Elasticsearch instance through the kg-search service. Access restrictions are applied based on user account roles (e.g. to restrict the "in progress" indices to curators only).
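
The role-based restriction described above can be sketched as follows. This is a minimal illustration, not the actual kg-search implementation: the function `accessible_levels`, the role name `curator`, and the assumption that "released" metadata is visible to every user are all hypothetical.

```python
# Hypothetical level and role names; the real service may use different ones.
RELEASED = "released"
IN_PROGRESS = "in progress"


def accessible_levels(roles):
    """Return the index levels a user with the given roles may query."""
    levels = [RELEASED]  # assumption: released metadata is open to everyone
    if "curator" in roles:
        levels.append(IN_PROGRESS)  # "in progress" is restricted to curators
    return levels
```

A user without the curator role would only ever be routed to the "released" indices, while a curator could query both levels.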
+
+### Scalability
+Note that the index-based search can easily be scaled horizontally, either by replicating the indices or by clustering the Elasticsearch database.
+
+## Live search
+To support review workflows and to simplify development, the KG Search also offers a "live mode". When instances are queried on the KG Search in live mode, the underlying data structures
+are not read from the Elasticsearch instance but directly from the KG Core API, which makes the results "live" (no delay). Live mode also benefits from the permission management of the KG Core,
+since the requesting user can only access those resources which are available to the specific account.
+
+
+Because some representations in KG Search are rather complex and traverse many levels of the graph, this mode can be slower than the "index-based search".
+
+```mermaid
+graph LR
+    REV_PROXY[search.kg.ebrains.eu] -->|Reverse proxy| UI(KG Search UI)
+    UI --> KG_SEARCH_API[kg-search]
+    UI --> KEYCLOAK[Keycloak]
+    KG_SEARCH_API -- live view --> KG_CORE[KG Core API]
+    style KEYCLOAK fill:lightgrey,stroke:#333,stroke-width:2px
+    style KG_CORE fill:lightgrey,stroke:#333,stroke-width:2px
+```
diff --git a/service/services/kg-indexing/kg-indexing.md b/service/services/kg-indexing/kg-indexing.md
@@ -0,0 +1,34 @@
+# Indexing
+The indexing mechanism aggregates and denormalizes the metadata read from the KG
+and populates the underlying Elasticsearch indices accordingly.
+
+## Multi-identifier support
+Sometimes, an instance should be accessible through more than one identifier. Although the default id is the one provided by the EBRAINS KG,
+it is possible to define multiple identifiers as part of the translation process of an instance, so that the underlying card can be reached through any of them.
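
As a rough sketch of the idea, assuming a simple in-memory lookup (the function `register_card` and the example identifiers are hypothetical and not taken from the kg-indexing code), each additional identifier just points to the same card:

```python
def register_card(lookup, card, default_id, extra_ids=()):
    """Make one card reachable through its default id and any extra identifiers."""
    for identifier in (default_id, *extra_ids):
        lookup[identifier] = card  # every identifier resolves to the same card
    return lookup
```

For example, `register_card({}, card, "kg-id", ["legacy-id"])` would return a lookup in which both keys resolve to the identical card object.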
+
+## "Searchable" vs. "Non-Searchable" indices
+The KG Search knows two main types of indices: "searchable" and "non-searchable". Search requests operate on the "searchable" indices only. This means
+that if only the newest version of an instance should appear as a search result, while the other versions remain available only by id or by navigation on the result cards,
+the indexing service has to ensure that only the newest instances are registered in the "searchable" index.
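
The version handling described above could look roughly like this. It is a sketch under assumptions: the function `split_versions` and the record shape (a dict with `id` and an integer `version`) are invented for illustration and do not reflect the real data model.

```python
def split_versions(versions):
    """Split the versions of one instance into searchable and non-searchable ids.

    `versions` is a non-empty list of dicts with an "id" and an integer "version".
    """
    newest = max(versions, key=lambda v: v["version"])
    searchable = [newest["id"]]  # only the newest version shows up in search results
    non_searchable = [v["id"] for v in versions if v is not newest]  # still reachable by id
    return searchable, non_searchable
```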
+
+## "Autorelease" vs. "Non-autorelease" indices
+> Disclaimer: The naming of "autorelease" vs. "non-autorelease" exists for legacy reasons and can be slightly misleading. We aim to rename it at some point.
+
+The only real effect of the separation (today) is to set apart those indices which are expensive to generate and therefore
+have their own endpoints in the *kg-indexing* API. For EBRAINS, this means that we maintain e.g. the "File" representation
+in an "autorelease" index, which can then be scheduled independently from the "non-autorelease" indices.
+
+## Incremental update vs rebuild of indices
+The KG Search knows two modes of updating an index. "Incremental update" means that instances are updated individually "on the fly" without any impact on the end user. "Rebuild" means that the index is recreated from scratch. Since the new index is built in the background and then replaces the old index in one go, downtime is minimal but still potentially noticeable by the end user. Note that a rebuild is required to update e.g. the Elasticsearch mappings of an index (see below).
+
+As a consequence, the "incremental update" is recommended for regular updates of running production instances, unless a full rebuild is needed, which is usually triggered manually.
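
The difference between the two modes can be illustrated with a small in-memory model. This is a sketch only: the class `IndexStore` and its methods are hypothetical, and how kg-indexing actually swaps indices in Elasticsearch is not described here.

```python
class IndexStore:
    """In-memory stand-in for a search backend (hypothetical, for illustration)."""

    def __init__(self):
        self.indices = {}   # index name -> {instance id: document}
        self.active = None  # name of the index that currently serves queries

    def incremental_update(self, doc_id, document):
        # Updates a single instance in place; needs an active index,
        # i.e. at least one rebuild must have happened before.
        self.indices[self.active][doc_id] = document

    def rebuild(self, new_name, documents):
        # Build the new index next to the old one, then switch in one step.
        self.indices[new_name] = dict(documents)
        old, self.active = self.active, new_name
        if old is not None and old != new_name:
            del self.indices[old]  # drop the replaced index

    def get(self, doc_id):
        return self.indices[self.active].get(doc_id)
```

In this model, queries keep hitting the old index until the single assignment to `active`, which mirrors the "replaced in one go" behaviour of a rebuild.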
+
+## Autogeneration of ES-mappings, UI-settings and sitemap by annotation
+Alongside the pure data, other resources are required to make the KG Search work:
+- Elasticsearch requires mappings for proper indexing. These mappings are autogenerated by the KG Search service based on the annotations of the target models.
+- The UI needs additional directives to properly visualize and lay out the metadata (e.g. where on the screen an item should be shown, which widget to use for visualization, etc.).
+- The KG Search service also generates a sitemap automatically, which helps optimize the appearance in search engines.
+
+## Multi-run indexing
+Note that some features require multiple runs of the indexing process. For example, the removal of "dead links" (internal links to non-existing resources) relies on information from a previous run of the indexing mechanism. For the index to settle, it is therefore required to execute at least two indexing runs in sequence.
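
Why two runs are needed can be shown with a simplified model. The function `indexing_run` and the way the previous run's information is passed in are assumptions made for this sketch; the real mechanism may store and use that information differently.

```python
def indexing_run(instances, ids_from_previous_run=None):
    """Index instances and drop links whose target was absent in the previous run.

    `instances` maps an instance id to the list of ids it links to.
    Returns the indexed links per instance and the set of ids seen in this run.
    """
    indexed = {}
    for instance_id, links in instances.items():
        if ids_from_previous_run is None:
            kept = list(links)  # first run: nothing to check the links against yet
        else:
            kept = [link for link in links if link in ids_from_previous_run]
        indexed[instance_id] = kept
    return indexed, set(instances)
```

In this model the first run still contains the dead link, and only the second run, which receives the ids collected by the first, removes it.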
+
diff --git a/setting_up_the_dev_environment.md b/setting_up_the_dev_environment.md
@@ -0,0 +1,25 @@
+# Setting up the development environment
+
+To set up the development environment, you need to have the following elements:
+
+- An Elasticsearch instance (running either on your local machine or on a server of your choice, possibly accessed through SSH port forwarding)
+- A development environment for Java (Spring Boot) and JavaScript (React)
+
+
+## Running the services
+The services are located at `service/services`. There you can find two different Maven projects which you should be able to integrate
+into your IDE. Additionally, there is a `kg-common` library in the `service/libs` directory. The two main projects are
+stand-alone Spring Boot services and can therefore run alongside each other.
+
+Note that, depending on what you want to do, you don't need both services to be running. If you want to work on the UI, it is often
+enough to run only the kg-search service (unless you want to improve or debug the actual indexing process).
+
+
+## Running the UI
+To run the UI, you can simply execute
+```shell
+npm install
+npm run start
+```
+to launch the UI in a development server. If you want to change the configuration (e.g. to use a different server endpoint), have
+a look at `setupProxy.js` in the `src` directory.