The experience with using the metering infrastructure for HNSciCloud has revealed various limitations in the current implementation. These limitations should be discussed and a roadmap created to plan the evolution of the feature.
Some of the takeaways from the HNSciCloud experience are:
Billable/non-billable usage. Monitored resources may be in states where they are visible, but not (fully) billed. For example, a suspended virtual machine may be billed only for storage but not for CPU or RAM. The states and detailed billing policies probably will vary from provider to provider.
Inaccurate pricing. A single price is used to calculate the cost within a metering record. Consequently, variations in the price of a resource, for example when a VM is suspended, cannot be properly reflected in the calculated cost.
Inaccurate resource usage values. Similarly, the resource usage totals may not be correct when the resource is in an inactive state. For example, the CPU and RAM values should not be included in the totals for a suspended VM.
Workarounds for some of these problems have been added to the UI (e.g. the billable flag). This really goes against the spirit of the metering infrastructure where the client should simply be able to sum values of the metering records to receive correct totals. In general, the client should not need to know about the detailed pricing calculation from the providers.
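As a starting point for discussion, here is a minimal sketch of how state-dependent billing could be applied when a metering record is created, so that clients only need to sum record values. Everything in it is an assumption for illustration: the `MeteringRecord` shape, the `make_record` helper, the state names, and the prices are hypothetical and do not reflect the current SlipStream schema or any provider's policy.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical hourly unit prices; real prices would come from the provider.
PRICE_PER_HOUR = {
    "cpu": Decimal("0.02"),
    "ram_gb": Decimal("0.01"),
    "disk_gb": Decimal("0.001"),
}

# Hypothetical policy: which dimensions are billed in each resource state.
BILLED_DIMENSIONS = {
    "running": {"cpu", "ram_gb", "disk_gb"},
    "suspended": {"disk_gb"},
}


@dataclass
class MeteringRecord:
    resource_id: str
    state: str
    hours: Decimal
    usage: dict    # billed quantities only; unbilled dimensions are zero
    cost: Decimal  # already reflects the state-dependent policy


def make_record(resource_id, state, hours, allocated):
    """Build a record whose usage and cost already account for the state."""
    hours = Decimal(str(hours))
    # Unknown states are treated as fully unbilled in this sketch.
    billed = BILLED_DIMENSIONS.get(state, set())
    usage = {dim: (qty if dim in billed else 0) for dim, qty in allocated.items()}
    cost = sum(
        (PRICE_PER_HOUR[dim] * qty * hours for dim, qty in usage.items()),
        Decimal(0),
    )
    return MeteringRecord(resource_id, state, hours, usage, cost)


vm = {"cpu": 2, "ram_gb": 4, "disk_gb": 50}
running = make_record("vm-1", "running", 10, vm)
suspended = make_record("vm-1", "suspended", 5, vm)

# The client sums the records without knowing the billing policy.
total_cost = running.cost + suspended.cost
```

With this shape, the billable flag in the UI would no longer be needed, because the policy is applied once on the server side when the record is written.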
The current solution produces a large volume of documents within the database. This requires that a retention and/or consolidation policy be put in place (or an acceptance that the storage costs will continuously increase).
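One possible consolidation policy is to roll fine-grained records up into one summary per resource per day and then delete the originals. The sketch below assumes hypothetical record fields (`resource_id`, `timestamp`, `cost`) and integer costs in some minor unit; it is not the existing document schema.

```python
from collections import defaultdict
from datetime import datetime


def consolidate_daily(records):
    """Roll records up into one summary per (resource, day).

    Each input record is a dict with hypothetical keys:
    'resource_id' (str), 'timestamp' (datetime), 'cost' (int, minor units).
    """
    totals = defaultdict(lambda: {"cost": 0, "samples": 0})
    for record in records:
        key = (record["resource_id"], record["timestamp"].date())
        totals[key]["cost"] += record["cost"]
        totals[key]["samples"] += 1
    # Keys are unique, so sorting compares only (resource_id, day) tuples.
    return [
        {"resource_id": rid, "day": day, **values}
        for (rid, day), values in sorted(totals.items())
    ]


records = [
    {"resource_id": "vm-1", "timestamp": datetime(2018, 5, 1, 10, 0), "cost": 3},
    {"resource_id": "vm-1", "timestamp": datetime(2018, 5, 1, 10, 1), "cost": 3},
    {"resource_id": "vm-1", "timestamp": datetime(2018, 5, 2, 9, 0), "cost": 4},
    {"resource_id": "vm-2", "timestamp": datetime(2018, 5, 1, 12, 0), "cost": 7},
]
summaries = consolidate_daily(records)
```

Keeping a `samples` count in each summary preserves some evidence of how complete the underlying data was after the raw records are removed.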
Outages of SlipStream or the underlying providers directly affect the accuracy of the metering, as metering records are not produced (or not correctly produced) in these situations. "Backfilling" is possible, but this has never been done in practice.
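A first step toward backfilling would be detecting where records are missing. The sketch below assumes that records for a resource are expected at a fixed sampling interval, which is an assumption for illustration rather than a statement about the current implementation; `missing_samples` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def missing_samples(timestamps, interval):
    """Return the expected sample times that have no metering record.

    Assumes records are expected every `interval` between the first and
    last observed timestamps.
    """
    ordered = sorted(timestamps)
    missing = []
    for previous, following in zip(ordered, ordered[1:]):
        expected = previous + interval
        while expected < following:
            missing.append(expected)
            expected += interval
    return missing


start = datetime(2018, 5, 1, 10, 0)
minute = timedelta(minutes=1)
# Records exist for minutes 0, 1, 2, 5 and 6; an outage hid minutes 3 and 4.
observed = [start + n * minute for n in (0, 1, 2, 5, 6)]
gaps = missing_samples(observed, minute)
```

Whether the gaps can then be filled depends on the provider being able to report the state of the resource during the outage, which this sketch does not address.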
Tying resource usage to a particular user, group, or role has been done with ad hoc changes to the system. A general mechanism for doing this needs to be developed.
The monitoring system collects usage information through active probing of the cloud resources. This works reasonably well for virtual machines, even though it puts a large load on the job execution framework. It does not work for S3 resources, as collecting bucket size information requires scanning all objects within the bucket, and the resulting latency is too large to be useful. For S3, exclusive use of ExternalObject resources would avoid these issues, but that requires buy-in from users and avoiding direct use of the underlying S3 cloud services/APIs. Other resources may have similar problems.
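To illustrate why routing all object operations through a single layer avoids the scan, here is a minimal sketch that keeps a running bucket total updated on each write and delete. The `BucketUsage` class is hypothetical and is not the ExternalObject implementation; it only holds if no object is changed directly through the S3 API.

```python
class BucketUsage:
    """Running size total for one bucket, updated per object operation."""

    def __init__(self):
        self._sizes = {}      # object key -> size in bytes
        self.total_bytes = 0

    def put(self, key, size):
        # An overwrite replaces the previous size instead of adding to it.
        self.total_bytes += size - self._sizes.get(key, 0)
        self._sizes[key] = size

    def delete(self, key):
        # Deleting an unknown key leaves the total unchanged.
        self.total_bytes -= self._sizes.pop(key, 0)


usage = BucketUsage()
usage.put("a", 100)
usage.put("b", 50)
usage.put("a", 30)   # overwrite: total becomes 80
usage.delete("b")    # total becomes 30
```

Reading `total_bytes` is constant time, whereas a scan grows with the number of objects in the bucket.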