Merge pull request #256 from grycap/dev-gmolto

Update script to prevent overwrite and Documentation Improvements
grycap · Oct 24, 2024 · 7316c7c
2 parents 81e92d4 + a1fc547
commit 7316c7c

Showing 8 changed files with 44 additions and 11 deletions.
diff --git a/docs/api.md b/docs/api.md
@@ -4,4 +4,8 @@ OSCAR exposes a secure REST API available at the Kubernetes master's node IP
 through an Ingress Controller. This API has been described following the
 [OpenAPI Specification](https://www.openapis.org/) and it is available below.
 
+> ℹ️
+>
+> The bearer token used to run a service can be either the OSCAR [service access token](invoking-sync.md#service-access-tokens) or the [user's Access Token](integration-egi.md#obtaining-an-access-token) if the OSCAR cluster is integrated with EGI Check-in.
+
 !!swagger api.yaml!!
diff --git a/docs/images/oidc/egi-checkin-token-portal.png b/docs/images/oidc/egi-checkin-token-portal.png
diff --git a/docs/integration-egi.md b/docs/integration-egi.md
@@ -67,7 +67,7 @@ grant access for all users from that VO.
 
 The static web interface of OSCAR has been integrated with EGI Check-in and
 published in [ui.oscar.grycap.net](https://ui.oscar.grycap.net) to facilitate
-the authorization of users. To login through EGI Checkín using OIDC tokens,
+the authorization of users. To log in through EGI Check-In using OIDC tokens,
 users only have to put the endpoint of its OSCAR cluster and click on the
 "EGI CHECK-IN" button.
 
@@ -87,3 +87,17 @@ create a new account configuration for the
 After that, clusters can be
 added with the command [`oscar-cli cluster add`](oscar-cli.md#add) specifying
 the oidc-agent account name with the `--oidc-account-name` flag.
+
+### Obtaining an Access Token
+
+Once logged in via EGI Check-In, you can obtain an Access Token with one of these approaches:
+
+* From the command-line, using `oidc-agent` with the following command:
+
+    ```sh
+    oidc-token <account-short-name>
+    ```
+    where `account-short-name` is the name of your account configuration.
+* From the EGI Check-In Token Portal: [https://aai.egi.eu/token](https://aai.egi.eu/token)
+
+![egi-checkin-token-portal.png](images/oidc/egi-checkin-token-portal.png)
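
A token obtained either way can be sent as the bearer token when invoking a service. The following is only a minimal sketch under stated assumptions, not the project's official procedure: `auth_header` is a hypothetical helper, `my-account`, `OSCAR_ENDPOINT` and `SERVICE_NAME` in the commented usage are placeholders, and the `/run/<service>` path is assumed from the synchronous invocation documentation.

```sh
#!/bin/sh

# Hypothetical helper: build the HTTP Authorization header for a bearer token.
# Fails with a non-zero status if the token is empty.
auth_header() {
    if [ -z "$1" ]; then
        echo "error: empty token" >&2
        return 1
    fi
    printf 'Authorization: Bearer %s\n' "$1"
}

# Usage (placeholders; needs oidc-agent and a reachable OSCAR cluster):
#   TOKEN=$(oidc-token my-account)
#   base64 input.png | curl -X POST -H "$(auth_header "$TOKEN")" \
#       -d @- "https://$OSCAR_ENDPOINT/run/$SERVICE_NAME"
```

Building the header in one place keeps the same call usable with either a service access token or a user's Access Token.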
diff --git a/docs/invoking-async.md b/docs/invoking-async.md
@@ -2,11 +2,11 @@
 
 For event-driven file processing, OSCAR automatically manages the creation
 and [notification system](https://docs.min.io/minio/baremetal/monitoring/bucket-notifications/bucket-notifications.html#minio-bucket-notifications)
-of MinIO buckets in order to allow the event-driven invocation of services
-using asynchronous requests, generating a Kubernetes job for every file to be
-processed.
-
+of MinIO buckets. This allows the event-driven invocation of services
+using asynchronous requests: every file uploaded to the bucket generates a Kubernetes job to process it.
 
 ![oscar-async.png](images/oscar-async.png)
 
+These jobs will be queued up in the Kubernetes scheduler and will be processed whenever there are resources available. If the OSCAR cluster has been deployed as an elastic Kubernetes cluster (see [Deployment with IM](https://docs.oscar.grycap.net/deploy-im-dashboard/)), then new Virtual Machines will be provisioned (up to the maximum number of nodes defined) in the underlying Cloud platform and seamlessly integrated in the Kubernetes cluster to proceed with the execution of jobs. These nodes will be terminated as the workload is reduced. Notice that the output files can be stored in MinIO or in any other storage back-end supported by the [FaaS supervisor](oscar-service.md#faas-supervisor).
 
+If you want to process a large number of data files, consider using [OSCAR Batch](https://github.com/grycap/oscar-batch), a tool designed to perform batch-based processing in OSCAR clusters. It includes a coordinator tool where the user provides a MinIO bucket containing files for processing. This service calculates the optimal number of parallel service invocations that can be accommodated within the cluster, according to its current status, and distributes the image processing workload accordingly among the service invocations. This is mainly intended to process large amounts of files, for example, historical data.
diff --git a/docs/invoking-sync.md b/docs/invoking-sync.md
@@ -83,8 +83,8 @@ base64 input.png | curl -X POST -H "Authorization: Bearer <TOKEN>" \
 
 ## Service access tokens
 
-As detailed in the [API specification](api.md), invocation paths require the
-service access token in the request header for authentication. Service access
+As detailed in the [API specification](api.md), invocation paths require either the
+service access token or, when the cluster is integrated with EGI Check-in, the user's Access Token in the request header for authentication. Service access
 tokens are auto-generated in service creation and update, and MinIO eventing
 system is automatically configured to use them for event-driven file
 processing. Tokens can be obtained through the API, using the

diff --git a/docs/invoking.md b/docs/invoking.md
@@ -2,7 +2,16 @@
 
 OSCAR services can be executed:
 
-  - [Synchronously](invoking-sync.md), so that the invocation to the service blocks the client until the response is obtained. Useful for short-lived service invocations.
+  - [Synchronously](invoking-sync.md), so that the invocation to the service blocks the client until the response is obtained. 
   - [Asynchronously](invoking-async.md), typically in response to a file upload to MinIO or via the OSCAR API.
-  - As an [exposed service](exposed-services.md), where the application executed already provides its own API or user interface (e.g. a Jupyter Notebook)
+  - As an [exposed service](exposed-services.md), where the application executed already provides its own API or user interface (e.g. Jupyter Notebook)
+
+
+After reading the different service execution types, take into account the following considerations to better decide the most appropriate execution type for your use case:
+
+* **Scalability**: Asynchronous invocations provide the best throughput when dealing with multiple concurrent data processing requests, since these are processed by independent jobs which are managed by the Kubernetes scheduler. A two-level elasticity approach is used (increase in the number of pods and increase in the number of Virtual Machines, if the OSCAR cluster was configured to be elastic). This is the recommended approach when each processing request exceeds the order of tens of seconds. 
+
+* **Reduced Latency**: Synchronous invocations are oriented to short-lived (< tens of seconds), bursty requests. A certain number of containers can be configured to be kept alive to avoid the performance penalty of spawning new ones, while an upper bound is also enforced (see [`min_scale` and `max_scale` in the FDL](fdl.md#synchronoussettings)), at the expense of always consuming resources in the OSCAR cluster. If the file to be processed is in the order of several MBytes, it may not fit in the payload of the HTTP request.
+
+* **Easy Access**: For services that provide their own user interface or their own API, exposed services make it possible to execute them in OSCAR and benefit from an auto-scaled configuration in case they are [stateless](https://en.wikipedia.org/wiki/Service_statelessness_principle). This way, users can directly access the service through its well-known interfaces.
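
The considerations above can be read as a simple decision rule. The sketch below is purely illustrative: `suggest_execution_type` is a hypothetical helper, and the default threshold of 30 seconds is an assumption, since the text only says "tens of seconds". It also ignores the payload-size caveat for synchronous requests.

```sh
#!/bin/sh

# Hypothetical helper: suggest an execution type from the considerations above.
# $1: "yes" if the application ships its own API or UI, otherwise "no"
# $2: expected processing time per request, in whole seconds
# $3: threshold in seconds between short and long requests
#     (assumed default of 30; the docs only say "tens of seconds")
suggest_execution_type() {
    own_interface="$1"
    duration="$2"
    threshold="${3:-30}"

    if [ "$own_interface" = "yes" ]; then
        echo "exposed"
    elif [ "$duration" -gt "$threshold" ]; then
        echo "asynchronous"
    else
        echo "synchronous"
    fi
}

suggest_execution_type no 5      # prints "synchronous"
suggest_execution_type no 120    # prints "asynchronous"
suggest_execution_type yes 5     # prints "exposed"
```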
 
diff --git a/docs/oscar-service.md b/docs/oscar-service.md
@@ -15,7 +15,7 @@ is in charge of:
 
 
 
-### Input/Output
+### FaaS Supervisor
 
 [FaaS Supervisor](https://github.com/grycap/faas-supervisor), the component in
 charge of managing the input and output of services, allows JSON or base64
@@ -37,6 +37,12 @@ The output of synchronous invocations will depend on the application itself:
 
 This way users can adapt OSCAR's services to their own needs.
 
+The FaaS Supervisor supports the following storage back-ends:
+* [MinIO](https://min.io)
+* [Amazon S3](https://aws.amazon.com/s3/)
+* WebDAV (and, therefore, [dCache](https://dcache.org))
+* Onedata (and, therefore, [EGI DataHub](https://www.egi.eu/service/datahub/))
+
 ### Container images
 
 Container images on asynchronous services use the tag `imagePullPolicy: Always`, which means that Kubernetes will check for the image digest on the image registry and download it if it is not present.

diff --git a/examples/plants-classification-tensorflow/script.sh b/examples/plants-classification-tensorflow/script.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-IMAGE_NAME=`basename "$INPUT_FILE_PATH"`
+IMAGE_NAME=`basename "$INPUT_FILE_PATH" | cut -d. -f1`
 OUTPUT_FILE="$TMP_OUTPUT_DIR/output.json"
 
 deepaas-cli predict --files "$INPUT_FILE_PATH" 2>&1 | grep -Po '{.*}' > "$OUTPUT_FILE"
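
To illustrate what the changed `IMAGE_NAME` line computes, the sketch below compares the `basename | cut -d. -f1` approach from the script with a parameter-expansion alternative. The function names and the sample paths are hypothetical, and the alternative is not part of this PR; it is shown only because `cut -d. -f1` keeps the text before the first dot, which matters for file names containing several dots.

```sh
#!/bin/sh

# Approach used in script.sh: keep everything before the FIRST dot.
name_before_first_dot() {
    basename "$1" | cut -d. -f1
}

# Alternative (assumption, not in the PR): strip only the LAST extension.
name_without_last_ext() {
    file_name=$(basename "$1")
    echo "${file_name%.*}"
}

name_before_first_dot "/tmp/input/rose.jpg"          # prints "rose"
name_before_first_dot "/tmp/input/rose.sample.jpg"   # prints "rose"
name_without_last_ext "/tmp/input/rose.sample.jpg"   # prints "rose.sample"
```

Both forms give the same result for names with a single extension, so the choice only matters if input files may contain extra dots.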