Skip to content

wardn/nats-streaming-server

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NATS Streaming Server

NATS Streaming is an extremely performant, lightweight reliable streaming platform built on NATS.

License ReportCard Build Coverage

NATS Streaming provides the following high-level feature set.

  • Log based.
  • At-Least-Once Delivery model, giving reliable message delivery.
  • Rate matched on a per subscription basis.
  • Replay/Restart
  • Last Value Semantics

Table of Contents

Important Changes

Version 0.10.0

The server needs to persist more state for a client connection. Therefore, the Store interface has been changed:

  • Changed AddClient(clientID, hbInbox string) to AddClient(info *spb.ClientInfo)

For SQL Stores, the Clients table has been altered to add a proto column.
You can update the SQL table manually or run the provided scripts that create the tables if they don't exists and alter the Clients table adding the new column. For instance, with MySQL, you would run something similar to:

mysql -u root nss_db < scripts/mysql.db.sql

The above assumes you are in the NATS Streaming Server directory, and the streaming database is called nss_db.

Otherwise, from the mysql CLI, you can run the command:

mysql> alter table Clients add proto blob;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

For Postgres, it would be:

nss_db=# alter table Clients add proto bytea;
ALTER TABLE

If you run the server version with 0.10.0 a database that has not been updated, you would get the following error:

[FTL] STREAM: Failed to start: unable to prepare statement "INSERT INTO Clients (id, hbinbox, proto) VALUES (?, ?, ?)": Error 1054: Unknown column 'proto' in 'field list'

Version 0.9.0

Additions to the Store interface to support deletion of channels.

  • Added Store.GetChannelLimits() API to return the store limits for a given channel.
  • Added Store.DeleteChannel() API to delete a channel.

Protocol was added to support replication of deletion of a channel in the cluster.

Version 0.8.0-beta

The Store interface has been slightly changed to accommodate the clustering feature.

  • Changed MstStore.Store() API to accept a *pb.MsgProto instead of a byte array. This is because the server is now assigning the sequence number. The store implementation should ignore the call if the given sequence number is below or equal to what has been already stored.
  • Added MsgStore.Empty() API to empty a given channel message store.

Version 0.6.0

The Store interface has been heavily modified. Some of the responsibilities have been moved into the server resulting on deletion of some Store APIs and removal of UserData fields in Client and ChannelStore (renamed Channel) objects.

NOTE: Although the interface has changed, the file format of the FileStore implementation has not, which means that there is backward/forward compatibility between this and previous releases.

The Store interface was updated:

  • Added error ErrAlreadyExists that CreateChannel() should return if channel already exists.
  • RecoveredState has now Channels (instead of Subs) and is a map of *RecoveredChannel keyed by channel name.
  • RecoveredChannel has a pointer to a Channel (formely ChannelStore) and an array of pointers to RecoveredSubscription objects.
  • RecoveredSubscription replaces RecoveredSubState.
  • Client no longer stores a UserData field.
  • Channel (formely ChannelStore) no longer stores a UserData field.
  • CreateChannel() no longer accepts a userData interface{} parameter. It returns a *Channel and an error. If the channel already exists, the error ErrAlreadyExists is returned.
  • LookupChannel(), HasChannel(), GetChannels(), GetChannelsCount(), GetClient(), GetClients, GetClientsCount() and MsgsState() APIs have all been removed. The server keeps track of clients and channels and therefore does not need those APIs.
  • AddClient() is now simply returning a *Client and error. It no longer accepts a userData interface{} parameter.
  • DeleteClient() now returns an error instead of returning the deleted *Client. This will allow the server to report possible errors.

The SubStore interface was updated:

  • DeleteSub() has been modified to return an error. This allows the server to report possible errors during deletion of a subscription.

The MsgStore interface was updated:

  • Lookup(), FirstSequence(), LastSequence(), FirstAndLastSequence(), GetSequenceFromTimestamp(), FirstMsg() and LastMsg() have all been modified to return an error. This is so that implementations that may fail to lookup, get the first sequence, etc... have a way to report the error to the caller.

Version 0.5.0

The Store interface was updated. There are 2 news APIs:

  • GetChannels(): Returns a map of *ChannelStore, keyed by channel names.
    The implementation needs to return a copy to make it safe for the caller to manipulate the map without a risk of concurrent access.
  • GetChannelsCount(): Returns the number of channels currently stored.

Version 0.4.0

The Store interface was updated. There are 2 news APIs:

  • Recover(): The recovery of persistent state was previously done in the constructor of the store implementation.
    It is now separate and specified with this API. The server will first instantiate the store, in which some initialization or checks can be made.
    If no error is reported, the server will then proceed with calling Recover(), which will returned the recovered state.
  • GetExclusiveLock(): In Fault Tolerance mode, when a server is elected leader, it will attempt to get an exclusive lock to the shared storage before proceeding.

Check the Store interface for more information.

Concepts

Relation to NATS

NATS Streaming Server by default embeds a NATS server. That is, the Streaming server is not a server per-se, but instead, a client to a NATS Server.
It means that Streaming clients are not directly connected to the streaming server, but instead communicate with the streaming server through NATS Server.

This detail is important when it comes to Streaming clients connections to the Streaming server. Indeed, since there is no direct connection, the server knows if a client is connected based on heartbeats.

It is therefore strongly recommended for clients to close their connection when the application exit, otherwise the server will consider these clients connected (sending data, etc...) until it detects missing heartbeats.

The streaming server creates internal subscriptions on specific subjects to communicate with its clients and/or other servers.

Note that NATS clients and NATS Streaming clients cannot exchange data between each other. That is, if a streaming client publishes on foo, a NATS client subscribing on that same subject will not receive the messages. Streaming messages are NATS messages made of a protobuf. The streaming server is expected to send ACKs back to producers and receive ACKs from consumers. If messages were freely exchanged with the NATS clients, this would cause problems.

Client Connections

As described, clients are not directly connected to the streaming server. Instead, they send connection requests. The request includes a client ID which is used by the server to uniquely identify, and restrict, a given client. That is, no two connections with the same client ID will be able to run concurrently.

This client ID links a given connection to its published messages, subscriptions, especially durable subscriptions. Indeed, durable subscriptions are stored as a combination of the client ID and durable name. More on durable subscriptions later.

It is also used to resolve the issue of not having direct client connections to the server. For instance, say that a client crashes without closing the connection. It later restarts with the same client ID. The server will detect that this client ID is already in-use. It will try to contact that known client to its original private inbox. If the server does not receive a response - which would be the case if the client crashed - it will replace the old client with this new one.
Otherwise, the server would reject the connection request since the client ID is already in-use.

Channels

Channels are at the heart of the NATS Streaming Server. Channels are subjects clients send data to and consume from.

Note: NATS Streaming server does not support wildcard for channels, that is, one cannot subscribe on foo.*, or >, etc...

The number of channels can be limited (and is by default) through configuration. Messages produced to a channel are stored in a message log inside this channel.

Message Log

You can view a message log as a ring buffer. Messages are appended to the end of the log. If a limit is set globally for all channels, or specifically for this channel, when the limit is reached, older messages are removed to make room for the new ones.

But except for the administrative size/age limit set for a message log, messages are not removed due to consumers consuming them. In fact, messages are stored regardless of the presence of subscriptions on that channel.

Subscriptions

A client creates a subscription on a given channel. Remember, there is no support for wildcards, so a subscription is really tied to one and only one channel. The server will maintain the subscription state on behalf of the client until the later closes the subscription (or its connection).

If there are messages in the log for this channel, messages will be sent to the consumer when the subscription is created. The server will send up to the maximum number of inflight messages as given by the client when creating the subscription.

When receiving ACKs from the consumer, the server will then deliver more messages, if more are available.

A subscription can be created to start at any point in the message log, either by message sequence, or by time.

There are several type of subscriptions:

Regular

The state of these subscriptions is removed when they are unsubscribed or closed (which is equivalent for this type of subscription) or the client connection is closed (explicitly by the client, or closed by the server due to timeout). They do, however, survive a server failure (if running with a persistent store).

Durable

If an application wishes to resume message consumption from where it previously stopped, it needs to create a durable subscription. It does so by providing a durable name, which is combined with the client ID provided when the client created its connection. The server then maintain the state for this subscription even after the client connection is closed.

Note: The starting position given by the client when restarting a durable subscription is ignored.

When the application wants to stop receiving messages on a durable subscription, it should close - but not unsubscribe- this subscription. If a given client library does not have the option to close a subscription, the application should close the connection instead.

When the application wants to delete the subscription, it must unsubscribe it. Once unsubscribed, the state is removed and it is then possible to re-use the durable name, but it will be considered a brand new durable subscription, with the start position being the one given by the client when creating the durable subscription.

Queue Group

When consumers want to consume from the same channel but each receive a different message, as opposed to all receiving the same messages, they need to create a queue subscription. When a queue group name is specified, the server will send each messages from the log to a single consumer in the group. The distribution of these messages is not specified, therefore applications should not rely on an expected delivery scheme.

After the first queue member is created, any other member joining the group will receive messages based on where the server is in the message log for that particular group. That means that starting position given by joining members is ignored by the server.

When the last member of the group leaves (subscription unsubscribed/closed/or connection closed), the group is removed from the server. The next application creating a subscription with the same name will create a new group, starting at the start position given in the subscription request.

A queue subscription can also be durable. For that, the client needs to provide a queue and durable name. The behavior is, as you would expect, a combination of queue and durable subscription. Unlike a durable subscription, though, the client ID is not part of the queue group name. It makes sense, because since client ID must be unique, it would prevent more than one connection to participate in the queue group. The main difference between a queue subscription and a durable one, is that when the last member leaves the group, the state of the group will be maintained by the server. Later, when a member rejoins the group, the delivery will resume.

Note: For a durable queue subscription, the last member to * unsubscribe * (not simply close) causes the group to be removed from the server.

Redelivery

When the server sends a message to a consumer, it expects to receive an ACK from this consumer. The consumer is the one specifying how long the server should wait before resending all unacknowledged messages to the consumer.

When the server restarts and recovers unacknowledged messages for a subscription, it will first attempt to redelivery those messages before sending new messages. However, if during the initial redelivery some messages don't make it to the client, the server cannot know that and will enable delivery of new messages.

So it is possible for an application to receive redelivered messages mixed with new messages. This is typically what happens outside of the server restart scenario.

For queue subscriptions, if a member has unacknowledged messages, when this member AckWait (which is the duration given to the server before the server should attempt to redeliver unacknowledged messages) time elapses, the messages are redelivered to any other member in the group (including itself).

If a queue member leaves the group, its unacknowledged messages are redistributed to other queue members.

Store Interface

Every store implementation follows the Store interface.

On startup, the server creates a unique instance of the Store. The constructor of a store implementation can do some initialization and configuration check, but must not access, or attempt to recover, the storage at this point. This is important because when the server runs on Fault Tolerance mode, the storage must be shared across many servers but only one server can be using it.

After instantiating the store, the server will then call Recover() in order to recover the persisted state. For implementations that do not support persistence, such as the provided MemoryStore, this call will simply return nil (without error) to indicate that no state was recovered.

The Store is used to add/delete clients, create/lookup channels, etc...

Creating/looking up a channel will return a ChannelStore, which points to two other interfaces, the SubStore and MsgStore. These stores, for a given channel, handle subscriptions and messages respectively.

If you wish to contribute to a new store type, your implementation must include all these interfaces. For stores that allow recovery (such as file store as opposed to memory store), there are additional structures that have been defined and should be returned by Recover().

The memory and the provided file store implementations both use a generic store implementation to avoid code duplication. When writing your own store implementation, you can do the same for APIs that don't need to do more than what the generic implementation provides. You can check MemStore and FileStore implementations for more details.

Store Encryption

The server can be configured to encrypt a message's payload when storing them, providing encryption at rest. This can be done from the command line or from the configuration file. Check encrypt and encryption_key in the Configuring section.

It is recommended to provide the encryption key through the environment variable NATS_STREAMING_ENCRYPTION_KEY instead of encryption_key. If encryption is enabled and NATS_STREAMING_ENCRYPTION_KEY is found, this will take precedence over encryption_key value.

You can pass this from the command line this way:

$ env NATS_STREAMING_ENCRYPTION_KEY="mykey" nats-streaming-server -store file -dir datastore -encrypt

We currently support two ciphers for encryption: AES and CHACHA. The default selected cipher depends on the platform. For ARM, we use CHACHA, otherwise we default to AES. You can always override that decision by explicitly specifying the cipher like this:

$ env NATS_STREAMING_ENCRYPTION_KEY="mykey" nats-streaming-server -store file -dir datastore -encrypt -encryption_cipher "CHACHA"

or, to select AES:

$ env NATS_STREAMING_ENCRYPTION_KEY="mykey" nats-streaming-server -store file -dir datastore -encrypt -encryption_cipher "AES"

Note that only message payload is encrypted, all other data stored by NATS Streaming server is not.

When running in clustering mode (see below), the server uses RAFT, which uses its own log files. Those will be encrypted too.

Starting a server with encrypt against a datastore that was not encrypted may result in failures when it comes to decrypt a message, which may not happen immediately upon startup. Instead, it will happen when attempting to deliver messages to consumers. However, when possible, the server will detect if the data was not encrypted and return the data without attempting to decrypt it. The server will also detect which cipher was used to encrypt the data and use the proper cipher to decrypt, even if this is not the currently selected cipher.

If the data is encrypted with a key and the server is restarted with a different key, the server will fail to decrypt messages when attempting to load them from the store.

Performance considerations: As expected, encryption is likely to decrease performance, but by how much is hard to define. In some performance tests on a MacbookPro 2.8 GHz Intel Core i7 with SSD, we have observed as little as 1% decrease to more than 30%. In addition to CPU cycles required for encryption, the encrypted payload is bigger, which result in more data being stored or read.

Clustering

NATS Streaming Server supports clustering and data replication, implemented with the Raft consensus algorithm, for the purposes of high availability.

There are two ways to bootstrap a cluster: with an explicit cluster configuration or with "auto" configuration using a seed node. With the first, we provide the IDs of the nodes participating in the cluster. In this case, the participating nodes will elect a leader. With the second, we start one server as a seed node, which will elect itself as leader, and subsequent servers will automatically join the seed (note that this also works with the explicit cluster configuration once the leader has been established). With the second method, we need to be careful to avoid starting multiple servers as seed as this will result in a split-brain. Both of these configuration methods are shown in the sections below.

It is recommended to run an odd number of servers in a cluster with a minimum of three servers to avoid split-brain scenarios. Note that if less than a majority of servers are available, the cluster cannot make progress, e.g. if two nodes go down in a cluster of three, the cluster is unavailable until at least one node comes back.

Note about Channels Partitioning and Clustering. These two features are mutually exclusive. Trying to start a server with channels Partitioning and Clustering enabled will result in a startup error. Clustering requires all channels to be replicated in the cluster.

Supported Stores

In order to run NATS Streaming Server in clustered mode, you need to specify a persistent store. At this time you have the choice between FILE and SQL

The NATS Streaming stores server meta information, messages and subscriptions to the storage you configure using the --store option.

However, in clustered mode, we use RAFT for leader election. The raft layer uses its own stores which are currently necessarily file based. The location of the RAFT stores defaults to the current directory under a sub-directory named after the cluster ID, or you can configure it using --cluster_log_path.

This means that even if you select an SQL Store, there will still be a need for storing data on the file system.

Clustering Configuration

We can bootstrap a NATS Streaming cluster by providing the cluster topology using the -cluster_peers flag. This is simply the set of node IDs participating in the cluster. Note that once a leader is established, we can start subsequent servers without providing this configuration as they will automatically join the leader. If the server is recovering, it will use the recovered cluster configuration.

Here is an example of starting three servers in a cluster. For this example, we run a separate NATS server which the Streaming servers connect to.

nats-streaming-server -store file -dir store-a -clustered -cluster_node_id a -cluster_peers b,c -nats_server nats://localhost:4222

nats-streaming-server -store file -dir store-b -clustered -cluster_node_id b -cluster_peers a,c -nats_server nats://localhost:4222

nats-streaming-server -store file -dir store-c -clustered -cluster_node_id c -cluster_peers a,b -nats_server nats://localhost:4222

Note that once a leader is elected, subsequent servers can be started without providing the cluster configuration. They will automatically join the cluster. Similarly, the cluster node ID does not need to be provided as one will be automatically assigned. As long as the file store is used, this ID will be recovered on restart.

nats-streaming-server -store file -dir store-d -clustered -nats_server nats://localhost:4222

The equivalent clustering configurations can be specified in a configuration file under the cluster group. See the Configuring section for more information.

Here is an example of a cluster of 3 nodes using the following configuration files. The nodes are running on host1, host2 and host3 respectively.

NOTE If you have an existing NATS cluster and want to run NATS Streaming Cluster on top of that, see details at the end of this section.

On host1, this configuration indicates that the server will accept client connections on port 4222. It will accept route connections on port 6222. It creates 2 routes, to host2 and host3 cluster port.

It defines the NATS Streaming cluster name as mycluster, uses a store file that points to the store directory. The cluster section inside streaming makes the NATS Streaming server run in cluster mode. This configuration explicitly define each node id (a for host1) and list its peers.

# NATS specific configuration
port: 4222
cluster {
  listen: 0.0.0.0:6222
  routes: ["nats://host2:6222", "nats://host3:6222"]
}

# NATS Streaming specific configuration
streaming {
  id: mycluster
  store: file
  dir: store
  cluster {
    node_id: "a"
    peers: ["b", "c"]
  }
}

Below is the configuration for the server running on host2. Notice how the routes are now to host1 and host3. The other thing that changed is the node id that is set to b and peers are updated accordingly to a and c.

Note that the dir configuration is also store but these are local directories and do not (actually must not) be shared. Each node will have its own copy of the datastore. You could have each configuration have a different value for dir if desired.

# NATS specific configuration
port: 4222
cluster {
  listen: 0.0.0.0:6222
  routes: ["nats://host1:6222", "nats://host3:6222"]
}

# NATS Streaming specific configuration
streaming {
  id: mycluster
  store: file
  dir: store
  cluster {
    node_id: "b"
    peers: ["a", "c"]
  }
}

As you would expect, for host3, the routes are now to host1 and host2 and the node id is c while its peers are a and b.

# NATS specific configuration
port: 4222
cluster {
  listen: 0.0.0.0:6222
  routes: ["nats://host1:6222", "nats://host2:6222"]
}

# NATS Streaming specific configuration
streaming {
  id: mycluster
  store: file
  dir: store
  cluster {
    node_id: "c"
    peers: ["a", "b"]
  }
}

In the example above, the configuration assumes no existing NATS Cluster and therefore configure the NATS routes between each node. Should you want to use an existing NATS cluster, do not include the "NATS specific configuration" section, instead, add nats_server_url in the streaming section to point to the NATS server you want.

Clustering Auto Configuration

We can also bootstrap a NATS Streaming cluster by starting one server as the seed node using the -cluster_bootstrap flag. This node will elect itself leader, so it's important to avoid starting multiple servers as seed. Once a seed node is started, other servers will automatically join the cluster. If the server is recovering, it will use the recovered cluster configuration.

Here is an example of starting three servers in a cluster by starting one as the seed and letting the others automatically join:

nats-streaming-server -store file -dir store-a -clustered -cluster_bootstrap -nats_server nats://localhost:4222

nats-streaming-server -store file -dir store-b -clustered -nats_server nats://localhost:4222

nats-streaming-server -store file -dir store-c -clustered -nats_server nats://localhost:4222

For a given cluster ID, if more than one server is started with cluster_bootstrap set to true, each server with this parameter will report the misconfiguration and exit.

The very first server that bootstrapped the cluster can be restarted, however, the operator must remove the datastores of the other servers that were incorrectly started with the bootstrap parameter before attempting to restart them. If they are restarted -even without the -cluster_bootstrap parameter- but with existing state, they will once again start as a leader.

Clustering And Containers

When running the docker image of NATS Streaming Server, you will want to specify a mounted volume so that the data can be recovered. Your -dir parameter then points to a directory inside that mounted volume. However, after a restart you may get a failure with a message similar to this:

[FTL] STREAM: Failed to start: streaming state was recovered but cluster log path "mycluster/a" is empty

This is because the server recovered the streaming state (as pointed by -dir and located in the mounted volume), but did not recover the RAFT specific state that is by default stored in a directory named after your cluster id, relative to the current directory starting the executable. In the context of a container, this data will be lost after the container is stopped.

In order to avoid this issue, you need to specify the -cluster_log_path and ensure that it points to the mounted volume so that the RAFT state can be recovered along with the Streaming state.

Fault Tolerance

To minimize the single point of failure, NATS Streaming server can be run in Fault Tolerance mode. It works by having a group of servers with one acting as the active server (accessing the store) and handling all communication with clients, and all others acting as standby servers.

It is important to note that is not possible to run Nats Stream as Fault Tolerance mode and Clustering mode at the same time.

To start a server in Fault Tolerance (FT) mode, you specify an FT group name.

Here is an example on how starting 2 servers in FT mode running on the same host and embedding the NATS servers:

nats-streaming-server -store file -dir datastore -ft_group "ft" -cluster nats://localhost:6222 -routes nats://localhost:6223 -p 4222

nats-streaming-server -store file -dir datastore -ft_group "ft" -cluster nats://localhost:6223 -routes nats://localhost:6222 -p 4223

Active Server

There is a single Active server in the group. This server was the first to obtain the exclusive lock for storage. For the FileStore implementation, it means trying to get an advisory lock for a file located in the shared datastore. For the SQLStore implementation, a special table is used in which the owner of the lock updates a column. Other instances will steal the lock if the column is not updated for a certain amount of time.

If the elected server fails to grab this lock because it is already locked, it will go back to standby.

Only the active server accesses the store and service all clients.

Standby servers

There can be as many as you want standby servers on the same group. These servers do not access the store and do not receive any data from the streaming clients. They are just running waiting for the detection of the active server failure.

Shared State

Actual file replication to multiple disks is not handled by the Streaming server. This - if required - needs to be handled by the user. For the FileStore implementation that we currently provide, the data store needs to be mounted by all servers in the FT group (e.g. an NFS Mount (Gluster in Google Cloud or EFS in Amazon).

Failover

When the active server fails, all standby servers will try to activate. The process consists of trying to get an exclusive lock on the storage.

The first server that succeeds will become active and go through the process of recovering the store and service clients. It is as if a server in standalone mode was automatically restarted.

All other servers that failed to get the store lock will go back to standby mode and stay in this mode until they stop receiving heartbeats from the current active server.

It is possible that a standby trying to activate is not able to immediately acquire the store lock. When that happens, it goes back into standby mode, but if it fails to receive heartbeats from an active server, it will try again to acquire the store lock. The interval is random but as of now set to a bit more than a second.

Partitioning

Note, this feature is incompatible with Clustering mode. Trying to start a server with Partitioning and Clustering enabled will result in a startup error.

It is possible to limit the list of channels a server can handle. This can be used to:

  • Prevent creation of unwanted channels
  • Share the load between several servers running with the same cluster ID

In order to do so, you need to enable the partitioning parameter in the configuration file, and also specify the list of allowed channels in the channels section of the store_limits configuration.

Channels don't need to override any limit, but they need to be specified for the server to service only these channels.

Here is an example:

partitioning: true
store_limits: {
    channels: {
        "foo": {}
        "bar": {}
        # Use of wildcards in configuration is allowed. However, applications cannot
        # publish to, or subscribe to, wildcard channels.
        "baz.*": {}
    }
}

When partitioning is enabled, multiple servers with the same cluster ID can coexist on the same NATS network, each server handling its own set of channels. Note however that in this mode, state is not replicated as it is in Clustering mode. The only communication between servers is to report if a given channel is handled in more than one serve.r

Wildcards

NATS Streaming does not support sending or subscribing to wildcard channels (such as foo.*).

However, it is possible to use wildcards to define the partition that a server can handle. For instance, with the following configuration:

partitioning: true
store_limits: {
    channels: {
        "foo.*": {}
        "bar.>": {}
    }
}

The streaming server would accept subscriptions or published messages to channels such as:

  1. foo.bar
  2. bar.baz
  3. bar.baz.bat
  4. ...

But would ignore messages or subscriptions on:

  1. foo
  2. foo.bar.baz
  3. bar
  4. some.other.channel
  5. ...

A given channel must be defined in a single server

When a server starts, it sends its list of channels to all other servers on the same cluster in an attempt to detect duplicate channels. When a server receives this list and finds that it has a channel in common, it will return an error to the emitting server, which will then fail to start.

However, on startup, it is possible that the underlying NATS cluster is not fully formed. The server would not get any response from the rest of the cluster and therefore start successfully and service clients. Anytime a Streaming server detects that a NATS server was added to the NATS cluster, it will resend its list of channels. It means that currently running servers may suddenly fail with a message regarding duplicate channels. Having the same channel on different servers means that a subscription would be created on all servers handling the channel, but only one server will receive and process message acknowledgements. Other servers would then redeliver messages (since they would not get the acknowledgements), which would cause duplicates.

In order to avoid issues with channels existing on several servers, it is ultimately the responsibility of the administrator to ensure that channels are unique.

Fault Tolerance and Partitioning

You can easily combine the Fault Tolerance and Partitioning feature.

To illustrate, suppose that we want two partitions, one for foo.> and one for bar.>.

The configuration for the first server foo.conf would look like:

partitioning: true
store_limits: {
    channels: {
        foo.>: {}
    }
}

The second configuration bar.conf would be:

partitioning: true
store_limits: {
    channels: {
        bar.>: {}
    }
}

If you remember, Fault Tolerance is configured by specifying a name (ft_group_name). Suppose there is an NFS mount called /nss/datastore on both host1 and host2.

Starting an FT pair for the partition foo could look like this:

host1$ nats-streaming-server -store file -dir /nss/datastore/foodata -sc foo.conf -ft_group_name foo -cluster nats://host1:6222 -routes nats://host2:6222,nats://host2:6223

host2$ nats-streaming-server -store file -dir /nss/datastore/foodata -sc foo.conf -ft_group_name foo -cluster nats://host2:6222 -routes nats://host1:6222,nats://host1:6223

Notice that each server on each note points to each other (the -routes parameter). The reason why we also point to 6223 will be explained later. They both listen for routes connections on their host's 6222 port.

We now start the FT pair for bar. Since we are running from the same machines (we don't have to), we need to use a different port:

host1$ nats-streaming-server -store file -dir /nss/datastore/bardata -sc bar.conf -ft_group_name bar -p 4223 -cluster nats://host1:6223 -routes nats://host2:6222,nats://host2:6223

host2$ nats-streaming-server -store file -dir /nss/datastore/bardata -sc bar.conf -ft_group_name bar -p 4223 -cluster nats://host2:6223 -routes nats://host1:6222,nats://host1:6223

You will notice that the -routes parameter points to both 6222 and 6223, this is so that both partitions belong to the same cluster and be view as "one" by a Streaming application connecting to this cluster. Effectively, we have created a full mesh of 4 NATS servers that can all communicate with each other. Two of these servers are backups for servers running on the same FT group.

Applications behavior

When an application connects, it specifies a cluster ID. If several servers are running with that same cluster ID, the application will be able to publish/subscribe to any channel handled by the cluster (as long as those servers are all connected to the NATS network).

A published message will be received by only the server that has that channel defined. If no server is handling this channel, no specific error is returned, instead the publish call will timeout. Same goes for message acknowledgements. Only the server handling the subscription on this channel should receive those.

However, other client requests (such as connection and subscription requests) are received by all servers. For connections, all servers handle them and the client library will receive a response from all servers in the cluster, but use the first one that it received.
For subscriptions, a server receiving the request for a channel that it does not handle will simply ignore the request. Again, if no server handle this channel, the client's subscription request will simply time out.

Monitoring

To monitor the NATS streaming system, a lightweight HTTP server is used on a dedicated monitoring port. The monitoring server provides several endpoints, all returning a JSON object.

The NATS monitoring endpoints support JSONP, making it easy to create single page monitoring web applications. Simply pass callback query parameter to any endpoint.

For example:

// JQuery example
$.getJSON('http://localhost:8222/streaming/serverz?callback=?', function(data) {
  console.log(data);
});

Enabling

To enable the monitoring server, start the NATS Streaming Server with the monitoring flag -m (or -ms) and specify the monitoring port.

Monitoring options

-m, --http_port PORT             HTTP PORT for monitoring
-ms,--https_port PORT            Use HTTPS PORT for monitoring (requires TLS cert and key)

To enable monitoring via the configuration file, use http: "host:port" or https: "host:port" (there is no explicit configuration flag for the monitoring interface).

For example, after running this:

nats-streaming-server -m 8222

you should see that the NATS Streaming server starts with the HTTP monitoring port enabled:

(...)
[53359] 2017/12/18 17:44:31.594407 [INF] Starting http monitor on 0.0.0.0:8222
[53359] 2017/12/18 17:44:31.594462 [INF] Listening for client connections on 0.0.0.0:4222
(...)

You can then point your browser (or curl) to http://localhost:8222/streaming

Endpoints

The following sections describe each supported monitoring endpoint: serverz, storez, clientsz, and channelsz.

/serverz

The endpoint http://localhost:8222/streaming/serverz reports various general statistics.

{
  "cluster_id": "test-cluster",
  "server_id": "JZnM3Yj3DSXc8a4StffjI0",
  "version": "0.16.0",
  "go": "go1.11.13",
  "state": "STANDALONE",
  "now": "2019-08-14T09:23:31.15467-06:00",
  "start_time": "2019-08-14T09:23:19.94433-06:00",
  "uptime": "11s",
  "clients": 0,
  "subscriptions": 0,
  "channels": 0,
  "total_msgs": 0,
  "total_bytes": 0
}

In clustering mode, there is an additional field that indicates the RAFT role of the given node. Here is an example:

{
  "cluster_id": "test-cluster",
  "server_id": "wmlNqEpVE8lBgCYb6j05ri",
  "version": "0.16.0",
  "go": "go1.11.13",
  "state": "CLUSTERED",
  "role": "Follower",
  "now": "2019-08-14T09:24:02.780595-06:00",
  "start_time": "2019-08-14T09:23:59.452994-06:00",
  "uptime": "3s",
  "clients": 0,
  "subscriptions": 0,
  "channels": 0,
  "total_msgs": 0,
  "total_bytes": 0
}

The possible values are: Leader, Follower or Candidate.

/storez

The endpoint http://localhost:8222/streaming/storez reports information about the store.

{
  "cluster_id": "test-cluster",
  "server_id": "8AjZq57k4JY7cfKEvuZ8iF",
  "now": "2019-04-16T09:57:32.857406-06:00",
  "type": "MEMORY",
  "limits": {
    "max_channels": 100,
    "max_msgs": 1000000,
    "max_bytes": 1024000000,
    "max_age": 0,
    "max_subscriptions": 1000,
    "max_inactivity": 0
  },
  "total_msgs": 130691,
  "total_bytes": 19587140
}

/clientsz

The endpoint http://localhost:8222/streaming/clientsz reports more detailed information about the connected clients.

It uses a paging mechanism which defaults to 1024 clients.

You can control these via URL arguments (limit and offset). For example: http://localhost:8222/streaming/clientsz?limit=1&offset=1.

{
  "cluster_id": "test-cluster",
  "server_id": "J3Odi0wXYKWKFWz5D5uhH9",
  "now": "2017-06-07T14:47:44.495254605+02:00",
  "offset": 1,
  "limit": 1,
  "count": 1,
  "total": 11,
  "clients": [
    {
      "id": "benchmark-sub-0",
      "hb_inbox": "_INBOX.jAHSY3hcL5EGFQGYmfayQK"
    }
  ]
}

You can also report detailed subscription information on a per client basis using subs=1. For example: http://localhost:8222/streaming/clientsz?limit=1&offset=1&subs=1.

{
  "cluster_id": "test-cluster",
  "server_id": "J3Odi0wXYKWKFWz5D5uhH9",
  "now": "2017-06-07T14:48:06.157468748+02:00",
  "offset": 1,
  "limit": 1,
  "count": 1,
  "total": 11,
  "clients": [
    {
      "id": "benchmark-sub-0",
      "hb_inbox": "_INBOX.jAHSY3hcL5EGFQGYmfayQK",
      "subscriptions": {
        "foo": [
          {
            "client_id": "benchmark-sub-0",
            "inbox": "_INBOX.jAHSY3hcL5EGFQGYmfayvC",
            "ack_inbox": "_INBOX.J3Odi0wXYKWKFWz5D5uhem",
            "is_durable": false,
            "is_offline": false,
            "max_inflight": 1024,
            "ack_wait": 30,
            "last_sent": 505597,
            "pending_count": 0,
            "is_stalled": false
          }
        ]
      }
    }
  ]
}

You can select a specific client based on its client ID with client=<id>, and get also get detailed statistics with subs=1. For example: http://localhost:8222/streaming/clientsz?client=me&subs=1.

{
  "id": "me",
  "hb_inbox": "_INBOX.HG0uDuNtAPxJQ1lVjIC2sr",
  "subscriptions": {
    "foo": [
      {
        "client_id": "me",
        "inbox": "_INBOX.HG0uDuNtAPxJQ1lVjIC389",
        "ack_inbox": "_INBOX.Q9iH2gsDPN57ZEvqswiYSL",
        "is_durable": false,
        "is_offline": false,
        "max_inflight": 1024,
        "ack_wait": 30,
        "last_sent": 0,
        "pending_count": 0,
        "is_stalled": false
      }
    ]
  }
}

/channelsz

The endpoint http://localhost:8222/streaming/channelsz reports the list of channels.

{
  "cluster_id": "test-cluster",
  "server_id": "J3Odi0wXYKWKFWz5D5uhH9",
  "now": "2017-06-07T14:48:41.680592041+02:00",
  "offset": 0,
  "limit": 1024,
  "count": 2,
  "total": 2,
  "names": [
    "bar"
    "foo"
  ]
}

It uses a paging mechanism which defaults to 1024 channels.

You can control these via URL arguments (limit and offset). For example: http://localhost:8222/streaming/channelsz?limit=1&offset=1.

{
  "cluster_id": "test-cluster",
  "server_id": "J3Odi0wXYKWKFWz5D5uhH9",
  "now": "2017-06-07T14:48:41.680592041+02:00",
  "offset": 1,
  "limit": 1,
  "count": 1,
  "total": 2,
  "names": [
    "foo"
  ]
}

You can also get the list of subscriptions with subs=1. For example: http://localhost:8222/streaming/channelsz?limit=1&offset=0&subs=1.

{
  "cluster_id": "test-cluster",
  "server_id": "J3Odi0wXYKWKFWz5D5uhH9",
  "now": "2017-06-07T15:01:02.166116959+02:00",
  "offset": 0,
  "limit": 1,
  "count": 1,
  "total": 2,
  "channels": [
    {
      "name": "bar",
      "msgs": 0,
      "bytes": 0,
      "first_seq": 0,
      "last_seq": 0,
      "subscriptions": [
        {
          "client_id": "me",
          "inbox": "_INBOX.S7kTJjOcToXiJAzGWgINit",
          "ack_inbox": "_INBOX.Y04G5pZxlint3yPXrSTjTV",
          "is_durable": false,
          "is_offline": false,
          "max_inflight": 1024,
          "ack_wait": 30,
          "last_sent": 0,
          "pending_count": 0,
          "is_stalled": false
        }
      ]
    }
  ]
}

You can select a specific channel based on its name with channel=name. For example: http://localhost:8222/streaming/channelsz?channel=foo.

{
  "name": "foo",
  "msgs": 649234,
  "bytes": 97368590,
  "first_seq": 1,
  "last_seq": 649234
}

And again, you can get detailed subscriptions with subs=1. For example: http://localhost:8222/streaming/channelsz?channel=foo&subs=1.

{
  "name": "foo",
  "msgs": 704770,
  "bytes": 105698990,
  "first_seq": 1,
  "last_seq": 704770,
  "subscriptions": [
    {
      "client_id": "me",
      "inbox": "_INBOX.jAHSY3hcL5EGFQGYmfayvC",
      "ack_inbox": "_INBOX.J3Odi0wXYKWKFWz5D5uhem",
      "is_durable": false,
      "is_offline": false,
      "max_inflight": 1024,
      "ack_wait": 30,
      "last_sent": 704770,
      "pending_count": 0,
      "is_stalled": false
    },
    {
      "client_id": "me2",
      "inbox": "_INBOX.jAHSY3hcL5EGFQGYmfaywG",
      "ack_inbox": "_INBOX.J3Odi0wXYKWKFWz5D5uhjV",
      "is_durable": false,
      "is_offline": false,
      "max_inflight": 1024,
      "ack_wait": 30,
      "last_sent": 704770,
      "pending_count": 0,
      "is_stalled": false
    },
    (...)
  ]
}

For durables that are currently running, the is_offline field is set to false. Here is an example:

{
  "name": "foo",
  "msgs": 0,
  "bytes": 0,
  "first_seq": 0,
  "last_seq": 0,
  "subscriptions": [
    {
      "client_id": "me",
      "inbox": "_INBOX.P23kNGFnwC7KRg3jIMB3IL",
      "ack_inbox": "_STAN.ack.pLyMpEyg7dgGZBS7jGXC02.foo.pLyMpEyg7dgGZBS7jGXCaw",
      "durable_name": "dur",
      "is_durable": true,
      "is_offline": false,
      "max_inflight": 1024,
      "ack_wait": 30,
      "last_sent": 0,
      "pending_count": 0,
      "is_stalled": false
    }
  ]
}

When that same durable goes offline, is_offline is be set to true. Although the client is possibly no longer connected (and would not appear in the clientsz endpoint), the client_id field is still displayed here.

{
  "name": "foo",
  "msgs": 0,
  "bytes": 0,
  "first_seq": 0,
  "last_seq": 0,
  "subscriptions": [
    {
      "client_id": "me",
      "inbox": "_INBOX.P23kNGFnwC7KRg3jIMB3IL",
      "ack_inbox": "_STAN.ack.pLyMpEyg7dgGZBS7jGXC02.foo.pLyMpEyg7dgGZBS7jGXCaw",
      "durable_name": "dur",
      "is_durable": true,
      "is_offline": true,
      "max_inflight": 1024,
      "ack_wait": 30,
      "last_sent": 0,
      "pending_count": 0,
      "is_stalled": false
    }
  ]
}

Getting Started

The best way to get the NATS Streaming Server is to use one of the pre-built release binaries which are available for OSX, Linux (x86-64/ARM), Windows. Instructions for using these binaries are on the GitHub releases page.

Of course you can build the latest version of the server from the master branch. The master branch will always build and pass tests, but may not work correctly in your environment. You will first need Go installed on your machine (version 1.5+ is required) to build the NATS server.

See also the NATS Streaming Quickstart tutorial.

Building

Building the NATS Streaming Server from source requires at least version 1.6 of Go, but we encourage the use of the latest stable release. Information on installation, including pre-built binaries, is available at http://golang.org/doc/install. Stable branches of operating system packagers provided by your OS vendor may not be sufficient.

Run go version to see the version of Go which you have installed.

Run go build inside the directory to build.

Run go test ./... to run the unit regression tests.

A successful build produces no messages and creates an executable called nats-streaming-server in the current directory. You can invoke that binary, with no options and no configuration file, to start a server with acceptable standalone defaults (no authentication, memory store).

Run go help for more guidance, and visit http://golang.org/ for tutorials, presentations, references and more.

Running

The NATS Streaming Server embeds a NATS Server. Starting the server with no argument will give you a server with default settings and a memory based store.

> ./nats-streaming-server
[72851] 2019/08/14 09:24:50.717761 [INF] STREAM: Starting nats-streaming-server[test-cluster] version 0.16.0
[72851] 2019/08/14 09:24:50.717813 [INF] STREAM: ServerID: AKvHMILjPtSohL5ZlmW8dk
[72851] 2019/08/14 09:24:50.717815 [INF] STREAM: Go version: go1.11.13
[72851] 2019/08/14 09:24:50.717817 [INF] STREAM: Git commit: [not set]
[72851] 2019/08/14 09:24:50.718389 [INF] Starting nats-server version 2.0.2
[72851] 2019/08/14 09:24:50.718395 [INF] Git commit [not set]
[72851] 2019/08/14 09:24:50.718595 [INF] Listening for client connections on 0.0.0.0:4222
[72851] 2019/08/14 09:24:50.718600 [INF] Server id is NDRKDJUODU5NBKZUCXDYF4PEZU3KA3KAMTFCFZ4VSUREUWQZAQAEOH36
[72851] 2019/08/14 09:24:50.718602 [INF] Server is ready
[72851] 2019/08/14 09:24:50.746557 [INF] STREAM: Recovering the state...
[72851] 2019/08/14 09:24:50.746567 [INF] STREAM: No recovered state
[72851] 2019/08/14 09:24:50.998824 [INF] STREAM: Message store is MEMORY
[72851] 2019/08/14 09:24:50.998869 [INF] STREAM: ---------- Store Limits ----------
[72851] 2019/08/14 09:24:50.998874 [INF] STREAM: Channels:                  100 *
[72851] 2019/08/14 09:24:50.998879 [INF] STREAM: --------- Channels Limits --------
[72851] 2019/08/14 09:24:50.998882 [INF] STREAM:   Subscriptions:          1000 *
[72851] 2019/08/14 09:24:50.998886 [INF] STREAM:   Messages     :       1000000 *
[72851] 2019/08/14 09:24:50.998889 [INF] STREAM:   Bytes        :     976.56 MB *
[72851] 2019/08/14 09:24:50.998893 [INF] STREAM:   Age          :     unlimited *
[72851] 2019/08/14 09:24:50.998897 [INF] STREAM:   Inactivity   :     unlimited *
[72851] 2019/08/14 09:24:50.998900 [INF] STREAM: ----------------------------------

The server will be started and listening for client connections on port 4222 (the default) from all available interfaces. The logs will be displayed to stderr as shown above.

Note that you do not need to start the embedded NATS Server. It is started automatically when you run the NATS Streaming Server. See below for details on how you secure the embedded NATS Server.

Process Signaling

On Unix systems, the NATS Streaming Server responds to the following signals:

Signal Result
SIGKILL Kills the process immediately
SIGINT, SIGTERM Stops the server gracefully
SIGUSR1 Reopens the log file for log rotation

The nats-streaming-server binary can be used to send these signals to running NATS Streaming Servers using the -sl flag:

# Reopen log file for log rotation
nats-streaming-server -sl reopen

# Stop the server
nats-streaming-server -sl quit

If there are multiple nats-streaming-server processes running, specify a PID:

nats-streaming-server -sl quit=<pid>

See the Windows Service section for information on signaling the NATS Streaming Server on Windows.

Windows Service

The NATS Streaming Server supports running as a Windows service. There is currently no installer and instead users should use sc.exe to install the service:

Here is how to create and start a NATS Streaming Server named nats-streaming-server. Note that the server flags should be passed in when creating the service.

sc.exe create nats-streaming-server binPath="\"<streaming server path>\nats-streaming-server.exe\" [NATS Streaming flags]"
sc.exe start nats-streaming-server

You can create several instances, giving it a unique name. For instance, this is how you would create two services, named nss1 and nss2, each one with its own set of parameters.

sc.exe create nss1 binPath="\"c:\nats-io\nats-streaming\nats-streaming-server.exe\" --syslog --syslog_name=nss1 -p 4222"

sc.exe create nss2 binPath="\"c:\nats-io\nats-streaming\nats-streaming-server.exe\" --syslog --syslog_name=nss2 -p 4223"

By default, when no logfile is specified, the server will use the system log. The default event source name is NATS-Streaming-Server. However, you can specify the name you want, which is especially useful when installing more than one service as described above.

Once the service is running, it can be controlled using sc.exe or nats-streaming-server.exe -sl:

REM Stop the server
nats-streaming-server.exe -sl quit

The above commands will default to controlling the service named nats-streaming-server. If the service has another name, it can be specified like this:

nats-streaming-server.exe -sl quit=<service name>

Embedding NATS Streaming

Embedding a NATS Streaming Server in your own code is easy. Simply import:

  stand "github.com/nats-io/nats-streaming-server/server"

(Note: we chose stand here, but you don't have to use that name)

Then if you want to use default options, it is as simple as doing:

  s, err := stand.RunServer("mystreamingserver")

If you want a more advance configuration, then you need to pass options. For instance, let's start the server with a file store instead of memory.

First import the stores package so we have access to the store type.

  stores "github.com/nats-io/nats-streaming-server/stores"

Then get the default options and override some of them:

  opts := stand.GetDefaultOptions()
  opts.StoreType = stores.TypeFile
  opts.FilestoreDir = "datastore"
  s, err := stand.RunServerWithOpts(opts, nil)

However, since the NATS Streaming Server project vendors NATS Server (that it uses as the communication layer with its clients and other servers in the cluster), there are some limitations.

If you were to import github.com/nats-io/gnatsd/server, instantiate a NATS Options structure, configure it and pass it to the second argument of RunServerWithOpts, you would get a compiler error. For instance doing this does not work:

import (
  natsd "github.com/nats-io/gnatsd/server"
  stand "github.com/nats-io/nats-streaming-server/server"
  stores "github.com/nats-io/nats-streaming-server/stores"
)

(...)

  nopts := &natsd.Options{}
  nopts.Port = 4223

  s, err := stand.RunServerWithOpts(nil, nopts)

You would get:

./myapp.go:36:35: cannot use nopts (type *"myapp/vendor/github.com/nats-io/gnatsd/server".Options) as type *"myapp/vendor/github.com/nats-io/nats-streaming-server/vendor/github.com/nats-io/gnatsd/server".Options in argument to "myapp/vendor/github.com/nats-io/nats-streaming-server/server".RunServerWithOpts

To workaround this issue, the NATS Streaming Server package provides a function NewNATSOptions() that is suitable for this approach:

  nopts := stand.NewNATSOptions()
  nopts.Port = 4223

  s, err := stand.RunServerWithOpts(nil, nopts)

That will work.

But, if you want to do advanced NATS configuration that requires types or interfaces that belong to the NATS Server package, then this approach won't work. In this case you need to run the NATS Server indepently and have the NATS Streaming Server connects to it. Here is how:

  // This configure the NATS Server using natsd package
  nopts := &natsd.Options{}
  nopts.HTTPPort = 8222
  nopts.Port = 4223

  // Setting a customer client authentication requires the NATS Server Authentication interface.
  nopts.CustomClientAuthentication = &myCustomClientAuth{}

  // Create the NATS Server
  ns := natsd.New(nopts)

  // Start it as a go routine
  go ns.Start()

  // Wait for it to be able to accept connections
  if !ns.ReadyForConnections(10 * time.Second) {
    panic("not able to start")
  }

  // Get NATS Streaming Server default options
  opts := stand.GetDefaultOptions()

  // Point to the NATS Server with host/port used above
  opts.NATSServerURL = "nats://localhost:4223"

  // Now we want to setup the monitoring port for NATS Streaming.
  // We still need NATS Options to do so, so create NATS Options
  // using the NewNATSOptions() from the streaming server package.
  snopts := stand.NewNATSOptions()
  snopts.HTTPPort = 8223

  // Now run the server with the streaming and streaming/nats options.
  s, err := stand.RunServerWithOpts(opts, snopts)
  if err != nil {
    panic(err)
  }

The above seem involved, but it really only if you use very advanced NATS Server options.

Configuring

Command line arguments

The NATS Streaming Server accepts command line arguments to control its behavior. There is a set of parameters specific to the NATS Streaming Server and some to the embedded NATS Server.

Note about parameters types

Type Remark
<bool> For booleans, either simply specify the parameter with value to enable (e.g -SD), or specify =false to disable
<size> You can specify as a number 1024 or as a size 1KB
<duration> Values must be expressed in the form _h_m_s, such as 1h or 20s or 1h30m, or 1.5h, etc...
Usage: nats-streaming-server [options]

Streaming Server Options:
    -cid, --cluster_id  <string>         Cluster ID (default: test-cluster)
    -st,  --store <string>               Store type: MEMORY|FILE|SQL (default: MEMORY)
          --dir <string>                 For FILE store type, this is the root directory
    -mc,  --max_channels <int>           Max number of channels (0 for unlimited)
    -msu, --max_subs <int>               Max number of subscriptions per channel (0 for unlimited)
    -mm,  --max_msgs <int>               Max number of messages per channel (0 for unlimited)
    -mb,  --max_bytes <size>             Max messages total size per channel (0 for unlimited)
    -ma,  --max_age <duration>           Max duration a message can be stored ("0s" for unlimited)
    -mi,  --max_inactivity <duration>    Max inactivity (no new message, no subscription) after which a channel can be garbage collected (0 for unlimited)
    -ns,  --nats_server <string>         Connect to this external NATS Server URL (embedded otherwise)
    -sc,  --stan_config <string>         Streaming server configuration file
    -hbi, --hb_interval <duration>       Interval at which server sends heartbeat to a client
    -hbt, --hb_timeout <duration>        How long server waits for a heartbeat response
    -hbf, --hb_fail_count <int>          Number of failed heartbeats before server closes the client connection
          --ft_group <string>            Name of the FT Group. A group can be 2 or more servers with a single active server and all sharing the same datastore
    -sl,  --signal <signal>[=<pid>]      Send signal to nats-streaming-server process (stop, quit, reopen)
          --encrypt <bool>               Specify if server should use encryption at rest
          --encryption_cipher <string>   Cipher to use for encryption. Currently support AES and CHAHA (ChaChaPoly). Defaults to AES
          --encryption_key <sting>       Encryption Key. It is recommended to specify it through the NATS_STREAMING_ENCRYPTION_KEY environment variable instead

Streaming Server Clustering Options:
    --clustered <bool>                   Run the server in a clustered configuration (default: false)
    --cluster_node_id <string>           ID of the node within the cluster if there is no stored ID (default: random UUID)
    --cluster_bootstrap <bool>           Bootstrap the cluster if there is no existing state by electing self as leader (default: false)
    --cluster_peers <string, ...>        Comma separated list of cluster peer node IDs to bootstrap cluster state
    --cluster_log_path <string>          Directory to store log replication data
    --cluster_log_cache_size <int>       Number of log entries to cache in memory to reduce disk IO (default: 512)
    --cluster_log_snapshots <int>        Number of log snapshots to retain (default: 2)
    --cluster_trailing_logs <int>        Number of log entries to leave after a snapshot and compaction
    --cluster_sync <bool>                Do a file sync after every write to the replication log and message store
    --cluster_raft_logging <bool>        Enable logging from the Raft library (disabled by default)

Streaming Server File Store Options:
    --file_compact_enabled <bool>        Enable file compaction
    --file_compact_frag <int>            File fragmentation threshold for compaction
    --file_compact_interval <int>        Minimum interval (in seconds) between file compactions
    --file_compact_min_size <size>       Minimum file size for compaction
    --file_buffer_size <size>            File buffer size (in bytes)
    --file_crc <bool>                    Enable file CRC-32 checksum
    --file_crc_poly <int>                Polynomial used to make the table used for CRC-32 checksum
    --file_sync <bool>                   Enable File.Sync on Flush
    --file_slice_max_msgs <int>          Maximum number of messages per file slice (subject to channel limits)
    --file_slice_max_bytes <size>        Maximum file slice size - including index file (subject to channel limits)
    --file_slice_max_age <duration>      Maximum file slice duration starting when the first message is stored (subject to channel limits)
    --file_slice_archive_script <string> Path to script to use if you want to archive a file slice being removed
    --file_fds_limit <int>               Store will try to use no more file descriptors than this given limit
    --file_parallel_recovery <int>       On startup, number of channels that can be recovered in parallel
    --file_truncate_bad_eof <bool>       Truncate files for which there is an unexpected EOF on recovery, dataloss may occur
    --file_read_buffer_size <size>       Size of messages read ahead buffer (0 to disable)
    --file_auto_sync <duration>          Interval at which the store should be automatically flushed and sync'ed on disk (<= 0 to disable)

Streaming Server SQL Store Options:
    --sql_driver <string>            Name of the SQL Driver ("mysql" or "postgres")
    --sql_source <string>            Datasource used when opening an SQL connection to the database
    --sql_no_caching <bool>          Enable/Disable caching for improved performance
    --sql_max_open_conns <int>       Maximum number of opened connections to the database

Streaming Server TLS Options:
    -secure <bool>                   Use a TLS connection to the NATS server without
                                     verification; weaker than specifying certificates.
    -tls_client_key <string>         Client key for the streaming server
    -tls_client_cert <string>        Client certificate for the streaming server
    -tls_client_cacert <string>      Client certificate CA for the streaming server

Streaming Server Logging Options:
    -SD, --stan_debug=<bool>         Enable STAN debugging output
    -SV, --stan_trace=<bool>         Trace the raw STAN protocol
    -SDV                             Debug and trace STAN
         --syslog_name               On Windows, when running several servers as a service, use this name for the event source
    (See additional NATS logging options below)

Embedded NATS Server Options:
    -a, --addr <string>              Bind to host address (default: 0.0.0.0)
    -p, --port <int>                 Use port for clients (default: 4222)
    -P, --pid <string>               File to store PID
    -m, --http_port <int>            Use port for http monitoring
    -ms,--https_port <int>           Use port for https monitoring
    -c, --config <string>            Configuration file

Logging Options:
    -l, --log <string>               File to redirect log output
    -T, --logtime=<bool>             Timestamp log entries (default: true)
    -s, --syslog <string>            Enable syslog as log method
    -r, --remote_syslog <string>     Syslog server addr (udp://localhost:514)
    -D, --debug=<bool>               Enable debugging output
    -V, --trace=<bool>               Trace the raw protocol
    -DV                              Debug and trace

Authorization Options:
        --user <string>              User required for connections
        --pass <string>              Password required for connections
        --auth <string>              Authorization token required for connections

TLS Options:
        --tls=<bool>                 Enable TLS, do not verify clients (default: false)
        --tlscert <string>           Server certificate file
        --tlskey <string>            Private key for server certificate
        --tlsverify=<bool>           Enable TLS, verify client certificates
        --tlscacert <string>         Client certificate CA for verification

NATS Clustering Options:
        --routes <string, ...>       Routes to solicit and connect
        --cluster <string>           Cluster URL for solicited routes

Common Options:
    -h, --help                       Show this message
    -v, --version                    Show version
        --help_tls                   TLS help.

Configuration file

You can use a configuration file to configure the options specific to the NATS Streaming server.

Use the -sc or -stan_config command line parameter to specify the file to use.

For the embedded NATS Server, you can use another configuration file and pass it to the Streaming server using -c or --config command line parameters.

Since most options do not overlap, it is possible to combine all options into a single file and specify this file using either the -sc or -c command line parameter.

However, the option named tls is common to NATS Server and NATS Streaming Server. If you plan to use a single configuration file and configure TLS, you should have all the streaming configuration included in a streaming map. This is actually a good practice regardless if you use TLS or not, to protect against possible addition of new options in NATS Server that would conflict with the names of NATS Streaming options.

For instance, you could use a single configuration file with such content:

# Some NATS Server TLS Configuration
listen: localhost:5222
tls: {
    cert_file: "/path/to/server/cert_file"
    key_file: "/path/to/server/key_file"
    verify: true
    timeout: 2
}

# NATS Streaming Configuration
streaming: {
    cluster_id: my_cluster

    tls: {
        client_cert: "/path/to/client/cert_file"
        client_key: "/path/to/client/key_file"
    }
}

However, if you want to avoid any possible conflict, simply use two different configuration files!

Note the order in which options are applied during the start of a NATS Streaming server:

  1. Start with some reasonable default options.
  2. If a configuration file is specified, override those options with all options defined in the file. This include options that are defined but have no value specified. In this case, the zero value for the type of the option will be used.
  3. Any command line parameter override all of the previous set options.

In general the configuration parameters are the same as the command line arguments. Below is the list of NATS Streaming parameters:

Parameter Meaning Possible values Usage example
cluster_id Cluster name String, underscore possible cluster_id: "my_cluster_name"
discover_prefix Subject prefix for server discovery by clients NATS Subject discover_prefix: "_STAN.Discovery"
store Store type memory, file or sql store: "file"
dir When using a file store, this is the root directory File path dir: "/path/to/storage
sd Enable debug logging true or false sd: true
sv Enable trace logging true or false sv: true
nats_server_url If specified, connects to an external NATS Server, otherwise starts an embedded one NATS URL nats_server_url: "nats://localhost:4222"
secure If true, creates a TLS connection to the server but without the need to use TLS configuration (no NATS Server certificate verification) true or false secure: true
tls TLS Configuration Map: tls: { ... } See details below
store_limits Store Limits Map: store_limits: { ... } See details below
file_options File Store specific options Map: file_options: { ... } See details below
sql_options SQL Store specific options Map: sql_options: { ... } See details below
hb_interval Interval at which the server sends an heartbeat to a client Duration hb_interval: "10s"
hb_timeout How long the server waits for a heartbeat response from the client before considering it a failed heartbeat Duration hb_timeout: "10s"
hb_fail_count Count of failed heartbeats before server closes the client connection. The actual total wait is: (fail count + 1) * (hb interval + hb timeout) Number hb_fail_count: 2
ft_group In Fault Tolerance mode, you can start a group of streaming servers with only one server being active while others are running in standby mode. This is the name of this FT group String ft_group: "my_ft_group"
partitioning If set to true, a list of channels must be defined in store_limits/channels section. This section then serves two purposes, overriding limits for a given channel or adding it to the partition true or false partitioning: true
cluster Cluster Configuration Map: cluster: { ... } See details below
encrypt Specify if server should encrypt messages (only the payload) when storing them true or false encrypt: true
encryption_cipher Cipher to use for encryption. Currently support AES and CHAHA (ChaChaPoly). Defaults to AES AES or CHACHA encryption_cipher: "AES"
encryption_key Encryption key. It is recommended to specify the key through the NATS_STREAMING_ENCRYPTION_KEY environment variable instead String encryption_key: "mykey"

TLS Configuration:

Note that the Streaming server uses a connection to a NATS Server, and so the NATS Streaming TLS Configuration is in fact a client-side TLS configuration.

Parameter Meaning Possible values Usage example
client_cert Client key for the streaming server File path client_cert: "/path/to/client/cert_file"
client_key Client certificate for the streaming server File path client_key: "/path/to/client/key_file"
client_ca Client certificate CA for the streaming server File path client_ca: "/path/to/client/ca_file"

Store Limits Configuration:

Parameter Meaning Possible values Usage example
max_channels Maximum number of channels, 0 means unlimited Number >= 0 max_channels: 100
max_subs Maximum number of subscriptions per channel, 0 means unlimited Number >= 0 max_subs: 100
max_msgs Maximum number of messages per channel, 0 means unlimited Number >= 0 max_msgs: 10000
max_bytes Total size of messages per channel, 0 means unlimited Number >= 0 max_bytes: 1GB
max_age How long messages can stay in the log Duration max_age: "24h"
max_inactivity How long without any subscription and any new message before a channel can be automatically deleted Duration max_inactivity: "24h"
channels A map of channel names with specific limits Map: channels: { ... } See details below

The channels section is a map with the key being the channel name. For instance:

   channels: {
       "foo": {
           max_msgs: 100
       }
   }

For a given channel, the possible parameters are:

Parameter Meaning Possible values Usage example
max_subs Maximum number of subscriptions per channel, 0 means unlimited Number >= 0 max_subs: 100
max_msgs Maximum number of messages per channel, 0 means unlimited Number >= 0 max_msgs: 10000
max_bytes Total size of messages per channel, 0 means unlimited Bytes max_bytes: 1GB
max_age How long messages can stay in the log Duration max_age: "24h"
max_inactivity How long without any subscription and any new message before a channel can be automatically deleted Duration max_inactivity: "24h"

File Options Configuration:

Parameter Meaning Possible values Usage example
compact Enable/disable file compaction. Only some of the files (clients.dat and subs.dat) are subject to compaction true or false compact: true
compact_fragmentation Compaction threshold (in percentage) Number >= 0 compact_fragmentation: 50
compact_interval Minimum interval between attempts to compact files Expressed in seconds compact_interval: 300
compact_min_size Minimum size of a file before compaction can be attempted Bytes compact_min_size: 1GB
buffer_size Size of buffers that can be used to buffer write operations Bytes buffer_size: 2MB
crc Define if CRC of records should be computed on reads true or false crc: true
crc_poly You can select the CRC polynomial. Note that changing the value after records have been persisted would result in server failing to start complaining about data corruption Number >= 0 crc_poly: 3988292384
sync_on_flush Define if server should perform "file sync" operations during a flush true or false sync_on_flush: true
slice_max_msgs Define the file slice maximum number of messages. If set to 0 and a channel count limit is set, then the server will set a slice count limit automatically Number >= 0 slice_max_msgs: 10000
slice_max_bytes Define the file slice maximum size (including the size of index file). If set to 0 and a channel size limit is set, then the server will set a slice bytes limit automatically Bytes slice_max_bytes: 64MB
slice_max_age Define the period of time covered by a file slice, starting at when the first message is stored. If set to 0 and a channel age limit is set, then the server will set a slice age limit automatically Duration slice_max_age: "24h"
slice_archive_script Define the location and name of a script to be invoked when the server discards a file slice due to limits. The script is invoked with the name of the channel, the name of data and index files. It is the responsibility of the script to then remove the unused files File path slice_archive_script: "/home/nats-streaming/archive/script.sh"
file_descriptors_limit Channels translate to sub-directories under the file store's root directory. Each channel needs several files to maintain the state so the need for file descriptors increase with the number of channels. This option instructs the store to limit the concurrent use of file descriptors. Note that this is a soft limit and there may be cases when the store will use more than this number. A value of 0 means no limit. Setting a limit will probably have a performance impact Number >= 0 file_descriptors_limit: 100
parallel_recovery When the server starts, the recovery of channels (directories) is done sequentially. However, when using SSDs, it may be worth setting this value to something higher than 1 to perform channels recovery in parallel Number >= 1 parallel_recovery: 4
read_buffer_size Size of buffers used to read ahead from message stores. This can significantly speed up sending messages to consumers after messages have been published. Default is 2MB. Set to 0 to disable Bytes read_buffer_size: 2MB
auto_sync Interval at which the store should be automatically flushed and sync'ed on disk. Default is every minute. Set to <=0 to disable Duration auto_sync: "2m"

Cluster Configuration:

Parameter Meaning Possible values Usage example
node_id ID of the node within the cluster if there is no stored ID String (no whitespace) node_id: "node-a"
bootstrap Bootstrap the cluster if there is no existing state by electing self as leader true or false bootstrap: true
peers List of cluster peer node IDs to bootstrap cluster state List of node IDs peers: ["node-b", "node-c"]
log_path Directory to store log replication data File path log_path: "/path/to/storage"
log_cache_size Number of log entries to cache in memory to reduce disk IO Number >= 0 log_cache_size: 1024
log_snapshots Number of log snapshots to retain Number >= 0 log_snapshots: 1
trailing_logs Number of log entries to leave after a snapshot and compaction Number >= 0 trailing_logs: 256
sync Do a file sync after every write to the replication log and message store true or false sync: true
raft_logging Enable logging from the Raft library (disabled by default) true or false raft_logging: true
raft_heartbeat_timeout Specifies the time in follower state without a leader before attempting an election Duration raft_heartbeat_timeout: "2s"
raft_election_timeout Specifies the time in candidate state without a leader before attempting an election Duration raft_election_timeout: "2s"
raft_lease_timeout Specifies how long a leader waits without being able to contact a quorum of nodes before stepping down as leader Duration raft_lease_timeout: "1s"
raft_commit_timeout Specifies the time without an Apply() operation before sending an heartbeat to ensure timely commit. Due to random staggering, may be delayed as much as 2x this value Duration raft_commit_timeout: "100ms"

SQL Options Configuration:

Parameter Meaning Possible values Usage example
driver Name of the SQL driver to use mysql or postgres driver: "mysql"
source How to connect to the database. This is driver specific String source: "ivan:pwd@/nss_db"
no_caching Enable/Disable caching for messages and subscriptions operations. The default is false, which means that caching is enabled true or false no_caching: false
max_open_conns Maximum number of opened connections to the database. Value <= 0 means no limit. The default is 0 (unlimited) Number max_open_conns: 5

Store Limits

The store_limits section in the configuration file (or the command line parameters -mc, -mm, etc..) allow you to configure the global limits.

These limits somewhat offer some upper bound on the size of the storage. By multiplying the limits per channel with the maximum number of channels, you will get a total limit.

It is not the case, though, if you override limits of some channels. Indeed, it is possible to define specific limits per channel. Here is how:

...
store_limits: {
    # Override some global limits
    max_channels: 10
    max_msgs: 10000
    max_bytes: 10MB
    max_age: "1h"

    # Per channel configuration.
    # Can be channels, channels_limits, per_channel, per_channel_limits or ChannelsLimits
    channels: {
        "foo": {
            # Possible options are the same than in the store_limits section, except
            # for max_channels. Not all limits need to be specified.
            max_msgs: 300
            max_subs: 50
        }
        "bar": {
            max_msgs:50
            max_bytes:1KB
        }
        "baz": {
            # Set to 0 for ignored (or unlimited)
            max_msgs: 0
            # Override with a lower limit
            max_bytes: 1MB
            # Override with a higher limit
            max_age: "2h"
        }
        # When using partitioning, channels need to be listed.
        # They don't have to override any limit.
        "bozo": {}

        # Wildcards are possible in configuration. This says that any channel
        # that will start with "foo" but with at least 2 tokens, will be
        # able to store 400 messages. Other limits are inherited from global.
        "foo.>": {
            max_msgs:400
        }
        # This one says that if the channel name starts with "foo.bar" but has
        # at least one more token, the sever will hold it for 2 hours instead
        # of one. The max number of messages is inherited from "foo.>", so the
        # limit will be 400. All other limits are inherited from global.
        "foo.bar.>": {
            max_age: "2h"
        }
        # Delete channels with this prefix once they don't have any
        # subscription and no new message for more than 1 hour
        "temp.>": {
            max_inactivity: "1h"
        }
    }
}
...

Note that the number of defined channels cannot be greater than the stores' maximum number of channels. This is true only for channels without wildcards.

Channels limits can override global limits by being either higher, lower or even set to unlimited.

An unlimited value applies to the specified limit, not to the whole channel

That is, in the configuration above, baz has the maximum number of messages set to 0, which means ignored or unlimited. Yet, other limits such as max bytes, max age and max subscriptions (inherited in this case) still apply. What that means is that the store will not check the number of messages but still check the other limits.

For a truly unlimited channel all limits need to be set to 0.

Limits inheritance

When starting the server from the command line, global limits that are not specified (configuration file or command line parameters) are inherited from default limits selected by the server.

Per-channel limits that are not explicitly configured inherit from the corresponding global limit (which can itself be inherited from default limit).

If a per-channel limit is set to 0 in the configuration file (or negative value programmatically), then it becomes unlimited, regardless of the corresponding global limit.

On startup the server displays the store limits. Notice the * at the right of a limit to indicate that the limit was inherited from the default store limits.

For channels that have been configured, their name is displayed and only the limits being specifically set are displayed to minimize the output.

Wildcards

Wildcards are allowed for channels configuration. Limits for foo.> will apply to any channel that starts with foo (but has at least one more token). If foo.bar.> is specified, it will inherit from foo.> and from global limits.

Below is what would be displayed with the above store limits configuration. Notice how foo.bar.> is indented compared to foo.> to show the inheritance.

[INF] STREAM: Starting nats-streaming-server[test-cluster] version 0.16.0
[INF] STREAM: ServerID: bFHdJP9hesjHIR0RheCl7W
[INF] STREAM: Go version: go1.11.13
[INF] STREAM: Git commit: [not set]
[INF] Starting nats-server version 2.0.2
[INF] Git commit [not set]
[INF] Listening for client connections on 0.0.0.0:4222
[INF] Server id is NDMRMUBKS37GDEPOZP4YVMTRLS6ROZS4O2JQVFOGDRJTGIY44OUV7ZSD
[INF] Server is ready
[INF] STREAM: Recovering the state...
[INF] STREAM: No recovered state
[INF] STREAM: Message store is MEMORY
[INF] STREAM: ---------- Store Limits ----------
[INF] STREAM: Channels:                   10
[INF] STREAM: --------- Channels Limits --------
[INF] STREAM:   Subscriptions:          1000 *
[INF] STREAM:   Messages     :         10000
[INF] STREAM:   Bytes        :      10.00 MB
[INF] STREAM:   Age          :        1h0m0s
[INF] STREAM:   Inactivity   :     unlimited *
[INF] STREAM: -------- List of Channels ---------
[INF] STREAM: baz
[INF] STREAM:  |-> Messages             unlimited
[INF] STREAM:  |-> Bytes                  1.00 MB
[INF] STREAM:  |-> Age                     2h0m0s
[INF] STREAM: bozo
[INF] STREAM: temp.>
[INF] STREAM:  |-> Inactivity              1h0m0s
[INF] STREAM: foo.>
[INF] STREAM:  |-> Messages                   400
[INF] STREAM:  foo.bar.>
[INF] STREAM:   |-> Age                    2h0m0s
[INF] STREAM: bar
[INF] STREAM:  |-> Messages                    50
[INF] STREAM:  |-> Bytes                  1.00 KB
[INF] STREAM: -----------------------------------

Securing

Authorization

If you use basic authorization, that is a single user defined through the configuration file or command line parameters, nothing else is required. The embedded NATS Server is started with those credentials and the NATS Streaming Server uses them.

However, if you define multiple users, then you must specify the username and password (or token) corresponding to the NATS Streaming Server's user.

For instance, suppose that your configuration file server.cfg has the following content:

listen: 127.0.0.1:4233
http: 127.0.0.1:8233

authorization {
    users = [
      {user: alice, password: foo}
      {user: bob,   password: bar}
    ]
}

and you start the NATS Streaming Server this way:

nats-streaming-server -config server.cfg

then the server would fail to start. You must specify the user used by the streaming server. For instance:

nats-streaming-server -config server.cfg -user alice -pass foo

TLS

While there are several TLS related parameters to the streaming server, securing the NATS Streaming server's connection is straightforward when you bear in mind that the relationship between the NATS Streaming server and the embedded NATS server is a client server relationship. To state simply, the streaming server is a client of it's embedded NATS server.

That means two sets of TLS configuration parameters must be used: TLS server parameters for the embedded NATS server, and TLS client parameters for the streaming server itself.

The streaming server specifies it's TLS client certificates with the following three parameters:

    -tls_client_key              Client key for the streaming server

    -tls_client_cert             Client certificate for the streaming server

    -tls_client_cacert           Client certificate CA for the streaming server

These could be the same certificates used with your NATS streaming clients.

The embedded NATS server specifies TLS server certificates with these:

        --tlscert <file>             Server certificate file

        --tlskey <file>              Private key for server certificate

        --tlscacert <file>           Client certificate CA for verification

The server parameters are used the same way you would secure a typical NATS server. See here.

Proper usage of the NATS Streaming Server requires the use of both client and server parameters.

e.g.:

nats-streaming-server -tls_client_cert client-cert.pem -tls_client_key client-key.pem -tls_client_cacert ca.pem -tlscert server-cert.pem -tlskey server-key.pem -tlscacert ca.pem

Further TLS related functionality can be found in usage, and should specifying cipher suites be required, a configuration file for the embedded NATS server can be passed through the -config command line parameter.

Persistence

By default, the NATS Streaming Server stores its state in memory, which means that if the streaming server is stopped, all state is lost. Still, this level of persistence allows applications to stop and later resume the stream of messages, and protect against applications disconnect (network or applications crash).

File Store

For a higher level of message delivery, the server should be configured with a file store. NATS Streaming Server comes with a basic file store implementation. Various file store implementations may be added in the future.

To start the server with a file store, you need to provide two parameters:

nats-streaming-server -store file -dir datastore

The parameter -store indicates what type of store to use, in this case file. The other (-dir) indicates in which directory the state should be stored.

The first time the server is started, it will create two files in this directory, one containing some server related information (server.dat) another to record clients information (clients.dat).

When a streaming client connects, it uses a client identification, which the server registers in this file. When the client disconnects, the client is cleared from this file.

When the client publishes or subscribe to a new subject (also called channel), the server creates a sub-directory whose name is the subject. For instance, if the client subscribes to foo, and assuming that you started the server with -dir datastore, then you will find a directory called datastore/foo. In this directory you will find several files: one to record subscriptions information (subs.dat), and a series of files that logs the messages msgs.1.dat, etc...

The number of sub-directories, which again correspond to channels, can be limited by the configuration parameter -max_channels. When the limit is reached, any new subscription or message published on a new channel will produce an error.

On a given channel, the number of subscriptions can also be limited with the configuration parameter -max_subs. A client that tries to create a subscription on a given channel (subject) for which the limit is reached will receive an error.

Finally, the number of stored messages for a given channel can also be limited with the parameter -max_msgs and/or -max_bytes. However, for messages, the client does not get an error when the limit is reached. The oldest messages are discarded to make room for the new messages.

File Store Options

As described in the Configuring section, there are several options that you can use to configure a file store.

Regardless of channel limits, you can configure message logs to be split in individual files (called file slices). You can configure those slices by number of messages it can contain (--file_slice_max_msgs), the size of the file - including the corresponding index file (--file_slice_max_bytes), or the period of time that a file slice should cover - starting at the time the first message is stored in that slice (--file_slice_max_age). The default file store options are defined such that only the slice size is configured to 64MB.

Note: If you don't configure any slice limit but you do configure channel limits, then the server will automatically set some limits for file slices.

When messages accumulate in a channel, and limits are reached, older messages are removed. When the first file slice becomes empty, the server removes this file slice (and corresponding index file).

However, if you specify a script (--file_slice_archive_script), then the server will rename the slice files (data and index) with a .bak extension and invoke the script with the channel name, data and index file names.
The files are left in the channel's directory and therefore it is the script responsibility to delete those files when done. At any rate, those files will not be recovered on a server restart, but having lots of unused files in the directory may slow down the server restart.

For instance, suppose the server is about to delete file slice datastore/foo/msgs.1.dat (and datastore/foo/msgs.1.idx), and you have configured the script /home/nats-streaming/archive_script.sh. The server will invoke:

/home/nats-streaming/archive_script.sh foo datastore/foo/msgs.1.dat.bak datastore/foo/msgs.2.idx.bak

Notice how the files have been renamed with the .bak extension so that they are not going to be recovered if the script leave those files in place.

As previously described, each channel corresponds to a sub-directory that contains several files. It means that the need for file descriptors increase with the number of channels. In order to scale to ten or hundred thousands of channels, the option fds_limit (or command line parameter --file_fds_limit) may be considered to limit the total use of file descriptors.

Note that this is a soft limit. It is possible for the store to use more file descriptors than the given limit if the number of concurrent read/writes to different channels is more than the said limit. It is also understood that this may affect performance since files may need to be closed/re-opened as needed.

File Store Recovery Errors

We have added the ability for the server to truncate any file that may otherwise report an unexpected EOF error during the recovery process.

Since dataloss is likely to occur, the default behavior for the server on startup is to report recovery error and stop. It will now print the content of the first corrupted record before exiting.

With the -file_truncate_bad_eof parameter, the server will still print those bad records but truncate each file at the position of the first corrupted record in order to successfully start.

To prevent the use of this parameter as the default value, this option is not available in the configuration file. Moreover, the server will fail to start if started more than once with that parameter.
This flag may help recover from a store failure, but since data may be lost in that process, we think that the operator needs to be aware and make an informed decision.

Note that this flag will not help with file corruption due to bad CRC for instance. You have the option to disable CRC on recovery with the -file_crc=false option.

Let's review the impact and suggested steps for each of the server's corrupted files:

  • server.dat: This file contains meta data and NATS subjects used to communicate with client applications. If a corruption is reported with this file, we would suggest that you stop all your clients, stop the server, remove this file, restart the server. This will create a new server.dat file, but will not attempt to recover the rest of the channels because the server assumes that there is no state. So you should stop and restart the server once more. Then, you can restart all your clients.

  • clients.dat: This contains information about client connections. If the file is truncated to move past an unexpected EOF error, this can result in no issue at all, or in client connections not being recovered, which means that the server will not know about possible running clients, and therefore it will not try to deliver any message to those non recovered clients, or reject incoming published messages from those clients. It is also possible that the server recovers a client connection that was actually closed. In this case, the server may attempt to deliver or redeliver messages unnecessarily.

  • subs.dat: This is a channel's subscriptions file (under the channel's directory). If this file is truncated and some records are lost, it may result in no issue at all, or in client applications not receiving their messages since the server will not know about them. It is also possible that acknowledged messages get redelivered (since their ack may have been lost).

  • msgs.<n>.dat: This is a channel's message log (several per channel). If one of those files is truncated, then message loss occurs. With the unexpected EOF errors, it is likely that only the last "file slice" of a channel will be affected. Nevertheless, if a lower sequence file slice is truncated, then gaps in message sequence will occur. So it would be possible for a channel to have now messages 1..100, 110..300 for instance, with messages 101 to 109 missing. Again, this is unlikely since we expect the unexpected end-of-file errors to occur on the last slice.

For Clustered mode, this flag would work only for the NATS Streaming specific store files. As you know, NATS Streaming uses RAFT for consensus, and RAFT uses its own logs. You could try the option if the server reports unexpected EOF errors for NATS Streaming file stores, however, you may want to simply delete all NATS Streaming and RAFT stores for the failed node and restart it. By design, the other nodes in the cluster have replicated the data, so this node will become a follower and catchup with the rest of the cluster, getting the data from the current leader and recreating its local stores.

SQL Store

Using a SQL Database for persistence is another option.

In order to do so, -store simply needs to be set to sql and -sql_driver set to mysql or postgres (the two drivers supported at the moment). The parameter -sql_source is driver specific, but generally contains the information required to connect to a specific database on the given SQL database server.

Note that the NATS Streaming Server does not need root privileges to connect to the database since it does not create the database, tables or indexes. This has to be done by the Database Administrator.

We provide 2 files (scripts/mysql.db.sql and scripts/postgres.db.sql) that can be used to create the tables and indexes to the database of your choice. However, administrators are free to configure and optimize the database as long as the name of tables and columns are preserved, since the NATS Streaming Server is going to issue SQL statements based on those.

Here is an example of creating an user nss with password password for the MySQL database:

mysql -u root -e "CREATE USER 'nss'@'localhost' IDENTIFIED BY 'password'; GRANT ALL PRIVILEGES ON *.* TO 'nss'@'localhost'; CREATE DATABASE nss_db;"

The above has gives all permissions to user nss. Once this user is created, we can then create the tables using this user and selecting the nss_db database. We then execute all the SQL statements creating the tables from the sql file that is provided in this repo:

mysql -u nss -p -D nss_db -e "$(cat ./scripts/mysql.db.sql)"

SQL Store Example - Postgres

Run a local dockerized instance of postgres if you do not already have one:

ID=$(docker run -d -e POSTGRES_PASSWORD=password -p 5432:5432 postgres)

[Optional] Drop any previous tables to clear data from previous sessions:

cat scripts/drop_postgres.db.sql | docker exec -i $ID psql -h 127.0.1.1 -U postgres

Run the appropriate database migrations for Postgres:

cat scripts/postgres.db.sql | docker exec -i $ID psql -h 127.0.1.1 -U postgres

Run the nats streaming server with postgres at the sql_source:

DOCKER_BRIDGE_IP=$(docker inspect --format '{{(index .IPAM.Config 0).Gateway}}' bridge) docker run -d --name nats-streaming -p 4222:4222 -p 8222:32768 nats-streaming-local -SDV --store sql --sql_driver postgres --sql_source="user=postgres password=postgres host=$DOCKER_BRIDGE_IP port=5432 sslmode=disable"

SQL Read and Write Timeouts

Sometimes, it is possible that a DB connection between the streaming server and the DB server is stale but the connection is not dropped. This would cause the server to block while trying to store or lookup a message, or any other operation involving the database. Because of internal locking in the store implementation, this could cause the server to seemingly be unresponsive.

To mitigate that, you can pass readTimeout and writeTimeout options to the sql_source when starting the server. The MySQL driver had always had those options, but we have extended the Postgres driver that we use to provide those options in NATS Streaming v0.16.0. You pass the values as a duration, for instance 5s for 5 seconds.

Here is what a sql_source would look like for MySQL driver:

nats-streaming-server -store sql -sql_driver mysql -sql_source "nss:password@/nss_db?readTimeout=5s&writeTimeout=5s" ..

Or, for Postgres driver:

nats-streaming-server -store sql -sql_driver postgres -sql_source "dbname=nss_db readTimeout=5s writeTimeout=5s sslmode=disable" ..

Be careful to not make those values too small otherwise you could cause unwanted failures.

SQL Store Options

Aside from the driver and datasource, the available options are the maximum number of opened connections to the database (max_open_conns) that you may need to set to avoid errors due to too many opened files.

The other option is no_caching which is a boolean that enables/disables caching. By default caching is enabled. It means that some operations are buffered in memory before being sent to the database. For storing messages, this still offers the guarantee that if a producer gets an OK ack back, the message will be successfully persisted in the database.

For subscriptions, the optimization may lead to messages possibly redelivered if the server were to be restarted before some of the operations were "flushed" to the database. The performance improvement is significant to justify the risk of getting redelivered messages (which is always possible with NATS Streaming regardless of this option). Still, if you want to ensure that each operation is immediately committed to the database, you should set no_caching to true.

Clients

You can find here the list of NATS Streaming clients supported by Synadia. There are also links to community-contributed clients.

License

Unless otherwise noted, the NATS source files are distributed under the Apache Version 2.0 license found in the LICENSE file.

About

NATS Streaming System Server

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 99.6%
  • Other 0.4%