Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming telemetry #63

Merged
merged 6 commits into from
Nov 17, 2022
Merged

Conversation

noguchiko
Copy link
Contributor

This PR adds the streaming telemetry feature to the Goldstone management framework. It adds:

  • goldstone-telemetry model and its server component xlate-telemetry
  • Subscribe PRC support for the gNMI server (north-gnmi)
  • a data cache mechanism option for the OpenConfig translator (xlate-oc)

I think the cache for xlate-oc is optional. Streaming telemetry subscriptions to OpenConfig model data will introduce lot of read access to operational state data of Goldstone primitive models. The cache mechanism may balances the load.

@ishidawataru
Copy link
Contributor

@noguchiko Is xlate-telemetry translating any model to the goldstone primitive models?
Is the idea behind implementing xlate-telemetry to provide a generic way for north daemons to provide telemetry service?

@noguchiko
Copy link
Contributor Author

@ishidawataru

Is xlate-telemetry translating any model to the goldstone primitive models?

No, it isn't.

Is the idea behind implementing xlate-telemetry to provide a generic way for north daemons to provide telemetry service?

Yes.

There are two types of telemetry: dial-in (dynamic) telemetry and dial-out (persistent) telemetry. And gNMI Subscribe RPC is a kind of dial-in telemetry. xlate-telemetry can also support dial-out telemetry in the future. This is why xlate-telemetry is an independent application instead of a part of north-gnmi.

@ishidawataru
Copy link
Contributor

ishidawataru commented Nov 11, 2022

@noguchiko

No, it isn't.

In that case, how about creating a new kind of daemon instead of treating it as an xlate daemon?
Maybe system daemon?

definition: system daemons provide service to north daemons by only interacting with south or xlate daemons. (no interaction with hardware)
system daemons optionally subscribe to the Goldstone primitive models.

so renaming xlate-telemetry to system-telemetry?

It might be helpful to use system-telemetry for north-snmp. (no need to handle this in this PR)

@noguchiko
Copy link
Contributor Author

@ishidawataru Yes. New system daemons and system-telemetry sound good to me.

Shall I rename xlate-telemetry to system-telemetry in this PR? If so, I'll add the definition of system daemons in the management framework first.

@ishidawataru
Copy link
Contributor

@noguchiko

Shall I rename xlate-telemetry to system-telemetry in this PR? If so, I'll add the definition of system daemons in the management framework first.

Yes, please. Thank you!
Please clean up the git commits so that they look like we named the daemon as system-telemetry from the beginning.

@noguchiko
Copy link
Contributor Author

@ishidawataru

Please clean up the git commits so that they look like we named the daemon as system-telemetry from the beginning.

done

@ishidawataru
Copy link
Contributor

@noguchiko How about curving out the openconfig potion of system-telemetry and put it to xlate-oc?

@noguchiko
Copy link
Contributor Author

@ishidawataru It makes sense. I'll do it.

@noguchiko
Copy link
Contributor Author

@ishidawataru I have moved the openconfig-telemetry service into xlate-oc.

@noguchiko
Copy link
Contributor Author

I added two fix commits 0574679 and 74430a2. I'll squash them before merge.

@ishidawataru
Copy link
Contributor

ishidawataru commented Nov 16, 2022

@noguchiko Thank you. I'm fixing this CI failure. #64 Once that is fixed, I'll work on this PR.

@ishidawataru
Copy link
Contributor

@noguchiko What do you think of adding system-operational-cache to serve the following YANG model?
It provides an operational info-caching mechanism for all internal components.

Will this be useful for system-telemetry?

Currently, north-snmp is periodically fetching the operational information. Having one central cache for operational info can reduce the load on the system.

// goldstone-operational-cache.yang

rpc get-operational-cache {
  input {
    leaf xpath {
      type string;
    }
    leaf acceptable-cache-age {
      ...
    }
  }
  output {
    anydata data {
      ...
    }
    leaf cache-hit {
      type boolean;
    }
    leaf cache-age {
      ...
    }
  }
}

@noguchiko
Copy link
Contributor Author

@ishidawataru The general idea of having single central cache is great and useful for system-telemetry (and other north|xlate|system daemons). But I think we should consider and study some points:

  • Should it get operational states for a cache miss?
    • Yes: It is useful for uses. But get-operational-cache may be called recursively. Can the framework handle it?
      • e.g. system-telemetry -get-operational-cache-> xlate-oc -get-operational-cache-> south-tai
    • No: It should provide set-operational-cache RPC and users call it to save retrieved operational states in the cache.
  • Should it return subtree data specified by xpath? e.g. If data for /openconfig-platform:components is cached, cached data is returned for request with /openconfig-platform:components/component/name
    • Yes: It is useful for uses and reduces load on the system. But, I think, it is difficult. Now we are dealing it by using the central datastore.
    • No: Uses may find data inconsistency.

@ishidawataru
Copy link
Contributor

ishidawataru commented Nov 17, 2022

@noguchiko The changes look good to me. Please rebase on master and clean up the commits.

This change adds a new goldstone primitive model goldstone-telemetry.
The model defines configuration parameters for telemetry subscriptions
and a notification for telemetry updates. The model allows a software
serves it to manage subscriptions and send telemetry updates.
This change adds an application for the streaming telemetry feature.
It allows users to manage telemetry subscriptions and receive
notifications about telemetry updates via the goldstone-telemetry
model.
This change enables the CI pipelien to build and push the streaming
telemetry server container image (system-telemetry).
This change adds support for openconfig-telemetry model to the
OpenConfig translator. Users can retrieve operational state of
dynamic-subscriptions (non-persistent dial-in telemetry).
This change adds openconfig-telemetry and openconfig-telemetry-types
to supported models of the gNMI server.
This change adds a support for the Subscribe RPC to the gNMI server.
It uses the goldstone-telemetry model to provide streaming telemetry
updates to an external telemetry collector. It allows users to
synchronize the states between a device and a controller/collector.
@noguchiko
Copy link
Contributor Author

@ishidawataru Thank you for your review. I have done rebase and squash.

@ishidawataru
Copy link
Contributor

Thank you. Do we need the last commit? If the cache is disabled by default, I don't think we need to include it in this PR.

@noguchiko
Copy link
Contributor Author

@ishidawataru I agree with you. dc72ca0 is not necessary for functionality. It is for performance like #65. May I move the commit to another branch?

@ishidawataru
Copy link
Contributor

@noguchiko Yes, please remove the commit from this PR and open a new draft PR for with the cache commit for reference.

@noguchiko
Copy link
Contributor Author

@ishidawataru done

@ishidawataru ishidawataru merged commit 50f4267 into oopt-goldstone:master Nov 17, 2022
@noguchiko noguchiko deleted the telemetry branch November 17, 2022 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants