diff --git a/docs/mission/index.md b/docs/mission/index.md
index 5d3e61a..8261ffe 100644
--- a/docs/mission/index.md
+++ b/docs/mission/index.md
@@ -8,9 +8,7 @@
 Coral credits are focused on how to support
 sharing of resources using multiple interfaces
 such as: Azimuth, OpenStack Blazar and Slurm
 
-## User Experience
-
-### On-boarding Accounts, Projects and Users
+## On-boarding Accounts, Projects and Users
 
 We are assuming clouds are trying to follow the
 [AARC blueprint](https://aarc-project.eu/architecture/),
@@ -26,7 +24,7 @@
 Coral Credit Accounts are associated with a
 particular group defined in the central AAAI proxy.
 This group typically has access to many different
 resource providers.
 
-### Resource Class and Resource Class Hours
+## Resource Class and Resource Class Hours
 
 A coral credits operator is responsible for
 defining the list of available resource classes.
@@ -57,7 +55,7 @@
 The hope is to add this support in the future,
 under the assumption that any resource consumption
 is only from a single credit pool.
 
-### Resource Providers
+## Resource Providers
 
 These are the places where an account gets to
 consume its allocated credits.
@@ -67,7 +65,7 @@
 onboarding a particular resource provider
 and giving them a token to access the
 resource consumption API.
 
-### Resource Consumption Requests
+## Resource Consumption Requests
 
 Cloud credits are consumed at a specific
 Resource Provider. The units are
 resource class hours.
@@ -91,18 +89,139 @@
 A resource consumption request has the following
 properties:
 all credits are used or when all credits
 have expired
 
-#### Example: Azimuth platform
+### Example: Azimuth short lived platform
+
+Azimuth platforms are now forced to pick an end date,
+so that we can make a credit consumption request
+for a platform we are about to create.
+
+If there are not enough credits, it will be clear
+what credits are required to create the platform,
+possibly including which platforms could be
+stopped early to free up credits for the requested
+platform.
+
+When a platform is stopped before the originally
+agreed time, the consumption record should be
+updated with the new end date, returning the credits
+back to the user.
+
+### Example: Azimuth long lived platform
+
+Where platforms are long lived, the scheduled end
+date needs to be either when their current credits
+expire, or possibly sooner if the proposed
+platform will consume all remaining credits before
+those credits expire.
+
+Users need to be warned when platforms are about
+to be automatically deleted, so they can get
+additional credits allocated.
+
+When credits are allocated "back to back" with no
+gap, the user is able to request a change to the
+end date for the existing credit consumption
+request, with the option to extend to the
+maximum date allowed given the current credit
+allocation for the associated account.
+
+### Example: Azimuth variable resource usage
+
+All the platforms so far have assumed uniform
+resource usage throughout the lifetime of the
+platform.
+
+While not supported in the initial implementation,
+we need to support a variety of increases
+and decreases in resources during the lifetime
+of the cluster.
+We likely need the option for the resource
+footprint records within a resource consumption
+request to have a start and end date that is
+independent of the overall resource consumption
+request.
+
+### Example: OpenStack Blazar reservation
+
+This is very similar to the Azimuth case,
+except it is for an arbitrary reservation via
+the Blazar API.
+
+To help reservations line up nicely,
+and reduce resource fragmentation,
+we could enforce that we round up credits
+to the nearest time window (e.g. 1 hour,
+or 1 working day).
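The rounding rule above can be sketched as follows. This is a minimal illustration only: the helper names, the one-hour default window, and the resource class names are assumptions, not part of any agreed coral credits API.

```python
from datetime import datetime, timedelta, timezone
from math import ceil

def round_up_to_window(end, start, window):
    """Round a reservation end time up so the duration is a whole number of windows."""
    windows = ceil((end - start) / window)
    return start + windows * window

def credits_required(start, end, resources, window=timedelta(hours=1)):
    """Credits (resource class hours) needed for a reservation, with the
    duration rounded up to the nearest time window to reduce fragmentation."""
    end = round_up_to_window(end, start, window)
    hours = (end - start) / timedelta(hours=1)
    return {cls: amount * hours for cls, amount in resources.items()}

# A 2.5 hour reservation rounds up to 3 resource class hours per unit.
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
print(credits_required(start, end, {"VCPU": 4, "MEMORY_MB": 8192}))
# → {'VCPU': 12.0, 'MEMORY_MB': 24576.0}
```

The same helper would work for a one-working-day window by passing, say, `window=timedelta(hours=8)`.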
+
+### Example: Slurm batch job credits
+
+You could have a single pool of credits,
+where you could self-service request that
+some amount of Coral Credits is given to
+your Slurm account, so that you can submit
+some jobs to your chosen Slurm cluster.
+
+### Example: Slurm reservations
+
+Similar to Blazar, you could imagine building the
+option to self-service Slurm reservations against
+a shared resource pool.
+
+### Example: Jupyter Hub (idea)
+
+When a user logs into JupyterHub, and their container
+is spun up, maybe this could be blocked (using a custom
+Authorization plugin or jupyterhub-singleuser wrapper)
+if the user doesn't have any credits left,
+alongside matching configuration in the idle-culling system.
+
+### Example: Seedcorn allocation
+
+One thing not possible with quotas is being
+able to hand out a very small amount of resource
+for people to try things out. You could say
+all members of an institution automatically get
+a seedcorn allocation they can use.
+This could become a default allocation amount
+for any automatically created accounts.
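A default seedcorn allocation on account creation could look something like the sketch below. The amount, resource class, and 90-day expiry are placeholder assumptions, as is the data model itself; nothing here is a committed coral credits schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical defaults, purely for illustration.
SEEDCORN_CREDITS = 100.0            # resource class hours
SEEDCORN_LIFETIME = timedelta(days=90)

@dataclass
class CreditAllocation:
    resource_class: str
    amount: float                   # resource class hours
    start: datetime
    end: datetime

@dataclass
class Account:
    name: str
    allocations: list = field(default_factory=list)

def create_account(name, now=None):
    """Automatically created accounts start with a small seedcorn allocation."""
    now = now or datetime.now(timezone.utc)
    account = Account(name=name)
    account.allocations.append(
        CreditAllocation("VCPU", SEEDCORN_CREDITS, now, now + SEEDCORN_LIFETIME))
    return account

acct = create_account("alice@example.org")
print(acct.allocations[0].amount)  # → 100.0
```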
+
+## Audit logs
 
-TODO
+All changes should be recorded in an audit log
+that can be queried via the API.
+
+## Visibility for Account holders
+
+There should be a clear view of:
+
+* all active resource allocations for the account
+* all consumers associated with each resource allocation,
+  so it's clear how the credits are being consumed
+* a prediction of how many credits will be left
+  at the end of the allocation
 
-#### Example: OpenStack Blazar reservation
+## Prometheus metrics for operators
+
+Various stats should be made available via a Prometheus
+metrics endpoint, including these per account metrics:
 
-TODO
+* size of currently allocated credits
+* size of any not-yet-current credits
+* remaining amount for current active credit allocations
+* any active resource consumption records,
+  including user and account details
 
-#### Example: Slurm job credits
+## Periodic reconciliation
 
-TODO
+Each resource provider is responsible for regularly checking
+whether there is any drift between the current resource consumption
+requests and the current state of resource consumption records.
+Only the service knows how to map the records in coral credits
+back to the real resources in that service.
 
-#### Example: Slurm reservations
+## No tracking of usage or efficiency
 
-TODO
+Coral credits focuses on credit allocations and consumption records
+per account, not the current usage in each service.
+Coral credits does not track whether resources are being fully
+utilized (e.g. job efficiency).
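The per-account gauges above could be served in the Prometheus text exposition format. A stdlib-only sketch follows; the metric and label names (`coral_credits_allocated`, `coral_credits_remaining`, `account`) are illustrative assumptions, not an agreed schema.

```python
def render_metrics(accounts):
    """Render per-account credit gauges in the Prometheus text exposition
    format. Metric and label names here are placeholders for illustration."""
    lines = [
        "# TYPE coral_credits_allocated gauge",
        "# TYPE coral_credits_remaining gauge",
    ]
    for acct in accounts:
        labels = f'{{account="{acct["name"]}"}}'
        lines.append(f'coral_credits_allocated{labels} {acct["allocated"]}')
        lines.append(f'coral_credits_remaining{labels} {acct["remaining"]}')
    return "\n".join(lines) + "\n"

print(render_metrics([{"name": "alice", "allocated": 200.0, "remaining": 40.0}]))
```

In a real deployment this would more likely be built on a metrics client library than hand-rendered strings; the sketch only shows the shape of the data an operator would scrape.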