Add more examples in the mission doc
JohnGarbutt committed Jan 31, 2024
1 parent eba0c74 commit b6edb68
Showing 1 changed file with 133 additions and 14 deletions.
147 changes: 133 additions & 14 deletions docs/mission/index.md
@@ -8,9 +8,7 @@ Coral credits are focused on how to support sharing of resources
using multiple interfaces such as:
Azimuth, OpenStack Blazar and Slurm

## User Experience

### On-boarding Accounts, Projects and Users
## On-boarding Accounts, Projects and Users

We are assuming clouds are trying to follow the
[AARC blueprint](https://aarc-project.eu/architecture/),
@@ -26,7 +24,7 @@ Coral Credit Accounts are associated with a particular group
defined in the central AAAI proxy. This group typically
has access to many different resource providers.

### Resource Class and Resource Class Hours
## Resource Class and Resource Class Hours

A coral credits operator is responsible for defining
the list of available resource classes.
@@ -57,7 +55,7 @@ The hope is to add this support in the future,
under the assumption any resource consumption
is only from a single credit pool.

### Resource Providers
## Resource Providers

These are the places where an account gets to
consume its allocated credits.
@@ -67,7 +65,7 @@ onboarding a particular resource provider
and giving them a token to access the
resource consumption API.

### Resource Consumption Requests
## Resource Consumption Request

Cloud credits are consumed at a specific Resource
Provider. The units are resource class hours.
Expand All @@ -91,18 +89,139 @@ A resource consumtion request has the following properties:
all credits are used or when all credits have
expired

#### Example: Azimuth platform
### Example: Azimuth short lived platform

Azimuth platforms are now forced to pick an end date,
such that we can make a credit consumption request
for a platform we are about to create.

If there are not enough credits, it will be clear
what credits are required to create the platform,
possibly including which platforms could be
stopped early to free up credits for the requested
platform.

When a platform is stopped before the originally
agreed time, the consumption record should be
updated with the new end date, returning the credits
back to the user.
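
A minimal sketch of the arithmetic behind stopping a platform early, assuming credits are counted in resource class hours (units of one resource class multiplied by the hours they are held). The function names and the single-resource-class simplification are illustrative assumptions, not part of the Coral Credits API.

```python
from datetime import datetime, timedelta


def resource_class_hours(units: int, start: datetime, end: datetime) -> float:
    """Credits consumed: units of one resource class times the hours held."""
    if end < start:
        raise ValueError("end must not be before start")
    return units * (end - start).total_seconds() / 3600


def credits_returned(
    units: int, start: datetime, agreed_end: datetime, actual_end: datetime
) -> float:
    """Credits freed when the consumption record's end date is moved earlier."""
    # Stopping later than agreed is a separate (extension) case; clamp here.
    actual_end = min(actual_end, agreed_end)
    return resource_class_hours(units, start, agreed_end) - resource_class_hours(
        units, start, actual_end
    )


# Hypothetical platform: 8 units booked for 10 hours, stopped after 4 hours.
start = datetime(2024, 2, 1, 9, 0)
returned = credits_returned(
    8, start, start + timedelta(hours=10), start + timedelta(hours=4)
)
```

Here `returned` is 48.0 resource class hours: the 80 originally reserved minus the 32 actually held.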

### Example: Azimuth long lived platform

Where platforms are long lived, the scheduled end
date needs to be either when their current credits
expire, or possibly sooner if the proposed
platform will consume all remaining credits before
those credits expire.
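
The rule above can be sketched as taking the earlier of two dates: when the credits expire, and when the platform would exhaust them. This is only an illustration under the assumption of a single resource class and a constant number of units; `latest_end_date` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def latest_end_date(
    start: datetime,
    credits_expire: datetime,
    remaining_hours: float,
    units: int,
) -> datetime:
    """Latest end date a platform can be scheduled for.

    remaining_hours is the account's remaining resource class hours and
    units is how many units of that resource class the platform holds.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    # Point in time at which the platform would have used every credit.
    exhausted_at = start + timedelta(hours=remaining_hours / units)
    return min(credits_expire, exhausted_at)


start = datetime(2024, 2, 1)
expiry = datetime(2024, 3, 1)
# 240 hours of credit at 4 units lasts 60 hours, well before expiry.
end_when_short = latest_end_date(start, expiry, remaining_hours=240, units=4)
# With ample credit the expiry date is the limit instead.
end_when_ample = latest_end_date(start, expiry, remaining_hours=100_000, units=4)
```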

Users need to be warned when platforms are about
to be automatically deleted, so they can get
additional credits allocated.

When credits are allocated "back to back" with no
gap, the user is able to request a change to the
end date for the existing credit consumption
request, with the option to extend to the
maximum date allowed given the current credit
allocation for the associated account.

### Example: Azimuth variable resource usage

All the platforms so far have assumed a uniform
resource usage throughout the lifetime of the
platform.

While not supported in the initial implementation,
we need to support a variety of increases
and decreases in resource usage during the lifetime
of the cluster.
We likely need the option for the resource footprint
records of a resource consumption request to have
a start and end date that are independent of the
overall resource consumption request.
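
One way to picture this, as a sketch only: each footprint record carries its own time window, and the total for a request is the sum over its records. The `FootprintRecord` class, its fields, and the resource class names `VCPU` and `MEMORY_GB` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FootprintRecord:
    """Hypothetical footprint record with its own start and end date."""

    resource_class: str
    units: int
    start: datetime
    end: datetime


def hours_by_resource_class(records: list[FootprintRecord]) -> dict[str, float]:
    """Total resource class hours per resource class across all records."""
    totals: dict[str, float] = {}
    for record in records:
        held_hours = (record.end - record.start).total_seconds() / 3600
        totals[record.resource_class] = (
            totals.get(record.resource_class, 0.0) + record.units * held_hours
        )
    return totals


t0 = datetime(2024, 2, 1)
# A cluster that doubles its CPU count for its last 5 hours.
records = [
    FootprintRecord("VCPU", 4, t0, t0 + timedelta(hours=10)),
    FootprintRecord("VCPU", 8, t0 + timedelta(hours=10), t0 + timedelta(hours=15)),
    FootprintRecord("MEMORY_GB", 16, t0, t0 + timedelta(hours=15)),
]
totals = hours_by_resource_class(records)
```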

### Example: OpenStack Blazar reservation

This is very similar to the Azimuth case,
except it's for an arbitrary reservation via
the Blazar API.

To help reservations line up nicely,
and reduce resource fragmentation,
we could enforce that we round up credits
to the nearest time window (e.g. 1 hour,
or 1 working day).
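
A small sketch of rounding a reservation's duration up to a whole number of windows. The helper name and the choice to round durations (rather than absolute start and end times) are assumptions for illustration; nothing here is taken from the Blazar API.

```python
from datetime import timedelta


def round_up_duration(duration: timedelta, window: timedelta) -> timedelta:
    """Round a duration up to a whole number of windows."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    # Ceiling division: timedelta // timedelta floors to an int,
    # so negate, floor, and negate again.
    windows = -(-duration // window)
    return windows * window


one_hour = timedelta(hours=1)
rounded = round_up_duration(timedelta(minutes=90), one_hour)
exact = round_up_duration(timedelta(hours=2), one_hour)
```

A 90 minute reservation is charged as 2 hours, while a reservation that already fits the window is unchanged.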

### Example: Slurm batch job credits

You could have a single pool of credits,
where you could self-service request that
some amount of Coral Credits are given to
your Slurm account, such that you can submit
some jobs to your chosen Slurm cluster.

### Example: Slurm reservations

Similar to Blazar, you could imagine building the
option to self-service Slurm reservations against
a shared resource pool.

### Example: JupyterHub (idea)

When a user logs into JupyterHub, and their container
is spun up, maybe this could be blocked (using a custom
Authorization plugin or jupyterhub-singleuser wrapper)
if the user doesn't have any credits left,
alongside matching configuration in the idle-culling system.

### Example: Seedcorn allocation

One thing not possible with quota is being
able to hand out a very small amount of resource
for people to try things out. You could say
all members of an institution automatically get
a seedcorn allocation they could use.
This could become a default allocation amount
for any automatically created accounts.

## Audit logs

TODO
All changes should be recorded in an audit log
that can be queried via the API.

## Visibility for Account holders

There should be a clear view of:

* all active resource allocations for the account
* all consumers associated with each resource allocation,
so it's clear how the credits are being consumed
* a prediction of how many credits will be left
at the end of the allocation
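
The prediction in the last item could be sketched as below, assuming a single resource class and that each consumer's scheduled start and end dates are known. `PlannedConsumer` and `predicted_credits_left` are hypothetical names, and clipping consumers to the allocation window is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PlannedConsumer:
    """Hypothetical consumer of an allocation: units held between two dates."""

    units: int
    start: datetime
    end: datetime


def predicted_credits_left(
    allocated_hours: float,
    allocation_start: datetime,
    allocation_end: datetime,
    consumers: list[PlannedConsumer],
) -> float:
    """Resource class hours expected to remain when the allocation ends."""
    used = 0.0
    for consumer in consumers:
        # Only count the part of each consumer inside the allocation window.
        start = max(consumer.start, allocation_start)
        end = min(consumer.end, allocation_end)
        if end > start:
            used += consumer.units * (end - start).total_seconds() / 3600
    return allocated_hours - used


left = predicted_credits_left(
    1000.0,
    datetime(2024, 2, 1),
    datetime(2024, 3, 1),
    [
        PlannedConsumer(4, datetime(2024, 2, 1), datetime(2024, 2, 3)),
        # Starts before the allocation, so only 1 day of it is counted.
        PlannedConsumer(2, datetime(2024, 1, 30), datetime(2024, 2, 2)),
    ],
)
```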

#### Example: OpenStack Blazar reservation
## Prometheus metrics for operators

Various stats should be made available via a Prometheus
metrics endpoint, including these per-account metrics:

TODO
* size of currently allocated credits
* size of any non-current credits
* remaining amount for current active credit allocations
* any active resource consumption records,
including user and account details

#### Example: Slurm job credits
## Periodic reconciliation

TODO
Each resource provider is responsible for regularly checking
if there is any drift between the current resource consumption
requests, and the current state of resource consumption records.
Only the service knows how to map the records in coral credits
back to the real resources in that service.
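
A minimal sketch of such a drift check, assuming the provider can reduce both sides to sets of comparable identifiers. How records are mapped to real resources is provider specific, so the identifiers and the `find_drift` helper here are purely illustrative.

```python
def find_drift(recorded: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Compare identifiers known to coral credits with those in the service."""
    return {
        # Credits are still being held for something that no longer exists.
        "records_without_resource": recorded - actual,
        # Something is running that coral credits has no record of.
        "resources_without_record": actual - recorded,
    }


drift = find_drift(
    recorded={"platform-a", "platform-b", "platform-c"},
    actual={"platform-b", "platform-c", "platform-d"},
)
```

Both result sets being empty means the provider and coral credits agree.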

#### Example: Slurm reservations
## No tracking of usage or efficiency

TODO
Coral credits focuses on credit allocations and consumption records
per account, not the current usage in each service.
Coral credits does not track whether the resources are being fully
utilized (e.g. job efficiency).
