-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: wait for metrics model to be available locally #3265
Conversation
packages/core/src/ceramic.ts
Outdated
} | ||
|
||
// eslint-disable-next-line no-constant-condition | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note we will wait forever if the node config says it wants to publish node metrics but the model doesn't ever show up (e.g. because the node is isolated from the rest of the network, or connected to a local network where the Model isn't available)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that is bad we should just continue with our work
there are way too many ways for ceramic nodes to fail, i would strongly prefer NodeMetrics and observability NOT be a way that nodes error out and don't start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's an argument to be made here that if the user asked for metrics in the config and we fail, they should learn that and choose to turn off metrics or to fix the underlying error.
But that said I don't feel strongly and since we plan to turn this on by default, I see the argument to minimize it's impact. I'll update to make this time out without failing startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we be more forgiving please if we don't find the nodemetrics model
i really don't want to make more error paths for new devs, lets be forgiving in this until we are sure its rock solid and always should succeed in the world
also they might be in a local network without access to the metrics model, or on a new network they made
i would like NOT to error out if node metrics are not working - its not a requirement - we can test it more ourselves to make it work more often
Okay, new version up that waits for up to 10 seconds for the model to become available and then proceeds with startup anyway if the model hasn't been loaded in time. New output if the model is unable to be synced:
|
Increased the timeout to 15 seconds because in my local testing the node was taking more than 10 seconds to sync the model fairly regularly. Interestingly, how long it takes the node to sync the model on an brand new clean node seems to vary quite wildly. I saw it sync within milliseconds, the data already being present by the time js-ceramic tried to load it the first time, and I also saw it take as long as almost 20 seconds at least once. FYI @dav1do @nathanielc |
*/ | ||
async _waitForMetricsModel(model: StreamID): Promise<boolean> { | ||
const maxWaitDuration = 1000 * 15 // 15 seconds | ||
const retryInterval = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think its going to take more than .1 sec? a cascading backoff might be nice. but this is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. the retry interval might be a bit short, an incremental backoff might be better? but this is fine for now, at worst it will just fail to do node metrics and have a 15 sec delay so its fine - thx for the error handling!
Closing in favor of #3268 |
Builds on #3260
When starting up a clean js-ceramic node for the first time with node metrics publishing enabled, the node needs to wait for the NodeMetrics model to be synced via recon and made available locally before it tries to index it. Example out from a clean node as it waits for this to happen below: