-
Notifications
You must be signed in to change notification settings - Fork 23
Home
This is a forked repository of Facebook's Scribe, a logging system. This branch differs from the Facebook's in a number of important ways.
-
ZooKeeper-based network store discovery.
Sites that require high-availability logging may wish to discover remote scribes at runtime instead of hard-coding a remote scribe. Sites without hardware load balancers can provide significantly better logging system uptime, and operational flexibility, by discovering a downstream scribe at connection time.
Discovery is achieved with ephemeral ZooKeeper registrations. Scribes who wish to be discovered (typically an aggregator tier) register themselves at a given znode and discovered by (typically by frontend) scribes at connection time.
Global discovery options:
zk_server
- ZooKeeper cluster connection string. Any scribe wishing to discover, or be discovered, must set this option.zk_registration_prefix
- Path to registration znode, such as /scribe/aggregator. To reduce ZooKeeper load, only scribes that need to be discovered should register themselves.Upstream scribes discover aggregators by specifying a znode as the network store remote host. Note remote_port is specified, however, that value is overridden by the discovered port.
<store> category=travis type=network remote_host=zk://zk.foo.com/scribe/aggregator remote_port=1463 </store>
-
Load Balancing
Over time scribe aggregators processes accumulate connections, leading to large connection count imbalances. This "rich get richer" scenario leads to painful failure modes when a long-running process ends (through planned maintenance or other means) as a large number of connections must be reestablished at other aggregators. Should a large number of connections land on an already connection-rich aggregator its likely to exceed its memory limit and cause a cascading failure.
Two network store options have been added to balance connections through periodic disconnects. A feedback-based system was implemented (via counters published to the registration znode) but did not perform as well as this periodic disconnect method.
Network store configuration options:
default_max_msg_before_reconnect
- Number of messages to send before reconnecting. If connection pools are used, the lowest value will be used.allowable_delta_before_reconnect
- Avoid "thundering herds" by staggering disconnects by this amount.