ZK operations are not very frequent. One kind is CSS workers updating their status; the other is the create/delete operations issued when a shuffle is registered or unregistered. Currently we run 7 ZK nodes for a CSS cluster with hundreds of workers.
The ZK pressure (memory) mainly comes from the large number of ZK watches, which we use to track the lifetime of each shuffleId so that its data can be cleaned up on the CSS workers. We are working on an optimization for this.
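To make the memory argument concrete, here is a minimal in-memory sketch of the watch-based cleanup pattern, assuming each worker that holds data for a shuffle places one watch on that shuffle's znode and drops its local data when the znode is deleted. `FakeZk`, `Worker`, and the `/css/app-1/shuffle-N` paths are hypothetical stand-ins, not the CSS or ZooKeeper client API; the model only captures znodes and one-shot watches, not sessions or replication.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class FakeZk:
    """Hypothetical stand-in for a ZooKeeper ensemble (znodes + one-shot watches only)."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        # znode path -> callbacks registered by watching workers
        self.watches: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def create(self, path: str) -> None:
        self.nodes.add(path)

    def exists(self, path: str, watcher: Callable[[str], None]) -> bool:
        # Every call adds one more entry to the server-side watch table
        self.watches[path].append(watcher)
        return path in self.nodes

    def delete(self, path: str) -> None:
        self.nodes.discard(path)
        # Watches fire once and are then dropped, like ZooKeeper watches
        for watcher in self.watches.pop(path, []):
            watcher(path)

    def watch_count(self) -> int:
        return sum(len(ws) for ws in self.watches.values())


class Worker:
    """Hypothetical CSS worker that cleans local shuffle data when the znode goes away."""

    def __init__(self, name: str, zk: FakeZk) -> None:
        self.name = name
        self.zk = zk
        self.local_shuffles: Set[str] = set()

    def on_shuffle_data(self, shuffle_path: str) -> None:
        # First data for this shuffle: remember it and watch its lifetime znode
        if shuffle_path not in self.local_shuffles:
            self.local_shuffles.add(shuffle_path)
            self.zk.exists(shuffle_path, self.on_deleted)

    def on_deleted(self, shuffle_path: str) -> None:
        # Unregister observed: drop the local data for this shuffle
        self.local_shuffles.discard(shuffle_path)


zk = FakeZk()
workers = [Worker(f"worker-{i}", zk) for i in range(3)]
for shuffle_id in range(4):
    path = f"/css/app-1/shuffle-{shuffle_id}"
    zk.create(path)  # register shuffle
    for w in workers:
        w.on_shuffle_data(path)

print(zk.watch_count())  # 12 = 3 workers x 4 shuffles
zk.delete("/css/app-1/shuffle-0")  # unregister shuffle
print(zk.watch_count())  # 9
```

In this model the number of live watches grows as (workers holding data) × (live shuffles), so with hundreds of workers the watch table, rather than the create/delete rate, is what dominates ZK memory.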
@bdyx123, have you seen the following exceptions in the Spark application logs? It seems the CSS worker had already deleted the shuffleId before it tried to update it. Is this behavior expected?