Previous change logs can be found at CHANGELOG-3.1.
v3.2.19 (2018-04-24)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix etcd_debugging_server_lease_expired_total Prometheus metric.
- Fix race conditions in v2 server stat collecting.
- Add etcd_server_is_leader Prometheus metric.
- Fix TLS reload when certificate SAN field only includes IP addresses but no domain names.
  - In Go, server calls (*tls.Config).GetCertificate for TLS reload if and only if server's (*tls.Config).Certificates field is empty, or (*tls.ClientHelloInfo).ServerName is not empty with a valid SNI from the client. Previously, etcd always populated (*tls.Config).Certificates on the initial client TLS handshake, as non-empty. Thus, client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger (*tls.Config).GetCertificate to reload TLS assets.
  - However, a certificate whose SAN field does not include any domain names but only IP addresses would request *tls.ClientHelloInfo with an empty ServerName field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
  - Now, (*tls.Config).Certificates is created empty on initial TLS client handshake, first to trigger (*tls.Config).GetCertificate, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
- Add --initial-election-tick-advance flag to configure initial election tick fast-forward.
  - By default, --initial-election-tick-advance=true, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
  - This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10 seconds. If true, local node does not need to wait up to 10 seconds. Instead, it forwards its election ticks to 8 seconds, and has only 2 seconds left before leader election.
  - Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
  - However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within the remaining election ticks, disruptive election has to happen thus affecting cluster availability.
  - Now, this can be disabled by setting --initial-election-tick-advance=false.
  - Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring --initial-election-tick-advance at the cost of slow initial bootstrap.
  - If single-node, it advances ticks regardless.
  - Address disruptive rejoining follower node.
- Compile with Go 1.8.7.
v3.2.18 (2018-03-29)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Adjust election timeout on server restart to reduce disruptive rejoining servers.
  - Previously, etcd fast-forwarded election ticks on server start, with only one tick left for leader election. This was to speed up the start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it affected cluster availability if the last tick elapsed before the leader contacted the restarted node.
  - Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus giving the leader more time to prevent a disruptive restart.
- Add missing etcd_network_peer_sent_failures_total count.
- Compile with Go 1.8.7.
v3.2.17 (2018-03-08)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix server panic on invalid Election Proclaim/Resign HTTP(S) requests.
  - Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
  - e.g. curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}', curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'.
- Prevent overflow by large TTL values for Lease Grant.
  - TTL parameter to Grant request is in units of seconds.
  - Leases with too large TTL values exceeding math.MaxInt64 expire in unexpected ways.
  - Server now returns rpctypes.ErrLeaseTTLTooLarge to client, when the requested TTL is larger than 9,000,000,000 seconds (which is >285 years).
  - Again, etcd Lease is meant for short-periodic keepalives or sessions, in the range of seconds or minutes. Not for hours or days!
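The TTL bound above can be illustrated with a small check. This is a sketch under assumptions: checkTTL and errTTLTooLarge are hypothetical names standing in for the server-side validation and rpctypes.ErrLeaseTTLTooLarge, and only the 9,000,000,000-second limit comes from this entry.

```go
package main

import (
	"errors"
	"fmt"
)

// maxLeaseTTL mirrors the 9,000,000,000-second limit stated above.
const maxLeaseTTL int64 = 9000000000

// errTTLTooLarge is a hypothetical stand-in for rpctypes.ErrLeaseTTLTooLarge.
var errTTLTooLarge = errors.New("lease TTL too large")

// checkTTL rejects TTL values (in seconds) above the limit.
func checkTTL(ttlSeconds int64) error {
	if ttlSeconds > maxLeaseTTL {
		return errTTLTooLarge
	}
	return nil
}

func main() {
	const secondsPerYear = 365 * 24 * 60 * 60
	fmt.Println(maxLeaseTTL / secondsPerYear) // whole 365-day years covered by the limit
	fmt.Println(checkTTL(60), checkTTL(maxLeaseTTL+1))
}
```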
- Enable etcd server raft.Config.CheckQuorum when starting with ForceNewCluster.
- Compile with Go 1.8.7.
v3.2.16 (2018-02-12)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix mvcc "unsynced" watcher restore operation.
  - "unsynced" watcher is watcher that needs to be in sync with events that have happened.
  - That is, "unsynced" watcher is the slow watcher that was requested on old revision.
  - "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
  - Which possibly causes missing events from "unsynced" watchers.
- Compile with Go 1.8.5.
v3.2.15 (2018-01-22)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Prevent server panic from member update/add with wrong scheme URLs.
- Log user context cancel errors on stream APIs in debug level with TLS.
- Compile with Go 1.8.5.
v3.2.14 (2018-01-11)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Compile with Go 1.8.5.
v3.2.13 (2018-01-02)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Remove verbose error messages on stream cancel and gRPC info-level logs in server-side.
- Fix gRPC server panic on GracefulStop TLS-enabled server.
- Compile with Go 1.8.5.
v3.2.12 (2017-12-20)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Upgrade google.golang.org/grpc from v1.7.4 to v1.7.5.
- Upgrade github.com/grpc-ecosystem/grpc-gateway from v1.3 to v1.3.0.
- Fix error message of Revision compactor in server-side.
- Add MaxCallSendMsgSize and MaxCallRecvMsgSize fields to clientv3.Config.
  - Fix exceeded response size limit error in client-side.
  - Address kubernetes#51099.
  - In previous versions (v3.2.10, v3.2.11), client response size was limited to only 4 MiB.
  - MaxCallSendMsgSize default value is 2 MiB, if not configured.
  - MaxCallRecvMsgSize default value is math.MaxInt32, if not configured.
- Compile with Go 1.8.5.
v3.2.11 (2017-12-05)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Upgrade google.golang.org/grpc from v1.7.3 to v1.7.4.
See security doc for more details.
- Fix racy grpc-go server handler transport WriteStatus call to prevent TLS-enabled etcd server crash.
- Add gRPC RPC failure warnings to help debug such issues in the future.
- Remove --listen-metrics-urls flag in monitoring document (non-released in v3.2.x, planned for v3.3.x).
- Compile with Go 1.8.5.
v3.2.10 (2017-11-16)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Upgrade google.golang.org/grpc from v1.2.1 to v1.7.3.
- Upgrade github.com/grpc-ecosystem/grpc-gateway from v1.2.0 to v1.3.
See security doc for more details.
- Revert discovery SRV auth ServerName with *.{ROOT_DOMAIN} to support non-wildcard subject alternative names in the certs (see issue #8445 for more contexts).
  - For instance, etcd --discovery-srv=etcd.local will only authenticate peers/clients when the provided certs have root domain etcd.local (not *.etcd.local) as an entry in Subject Alternative Name (SAN) field.
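Why a wildcard entry alone does not cover the root domain can be seen with Go's standard hostname matching. The sketch below assumes the check reduces to (*x509.Certificate).VerifyHostname against the root domain; the sanCovers helper is hypothetical and the certificates are bare structs, not parsed certs.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// sanCovers reports whether a certificate with the given DNS SAN entries
// would pass Go's standard hostname verification for serverName.
func sanCovers(dnsNames []string, serverName string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(serverName) == nil
}

func main() {
	// Root domain listed explicitly in the SAN field.
	fmt.Println(sanCovers([]string{"etcd.local"}, "etcd.local"))
	// Wildcard only: covers sub-domains, not the root domain itself.
	fmt.Println(sanCovers([]string{"*.etcd.local"}, "etcd.local"))
	fmt.Println(sanCovers([]string{"*.etcd.local"}, "member1.etcd.local"))
}
```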
- Replace backend key-value database boltdb/bolt with coreos/bbolt to address backend database size issue.
- Rewrite balancer to handle network partitions.
- Compile with Go 1.8.5.
v3.2.9 (2017-10-06)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
See security doc for more details.
- Update golang.org/x/crypto/bcrypt (see golang/crypto@6c586e1).
- Fix discovery SRV bootstrapping to authenticate ServerName with *.{ROOT_DOMAIN}, in order to support sub-domain wildcard matching (see issue #8445 for more contexts).
  - For instance, etcd --discovery-srv=etcd.local will only authenticate peers/clients when the provided certs have root domain *.etcd.local as an entry in Subject Alternative Name (SAN) field.
- Compile with Go 1.8.4.
v3.2.8 (2017-09-29)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix v2 client failover to next endpoint on mutable operation.
- Handle KeysOnly flag.
- Compile with Go 1.8.3.
v3.2.7 (2017-09-01)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix concurrency/stm Put with serializable snapshot.
  - Use store revision from first fetch to resolve write conflicts instead of modified revision.
- Compile with Go 1.8.3.
v3.2.6 (2017-08-21)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix watch restore from snapshot.
- Fix multiple URLs for --listen-peer-urls flag.
- Add --enable-pprof flag to etcd configuration file format.
- Fix etcd_debugging_mvcc_keys_total inconsistency.
- Compile with Go 1.8.3.
v3.2.5 (2017-08-04)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Return non-zero exit code on unhealthy endpoint health.
See security doc for more details.
- Server supports reverse-lookup on wildcard DNS SAN. For instance, if peer cert contains only DNS names (no IP addresses) in Subject Alternative Name (SAN) field, server first reverse-lookups the remote IP address to get a list of names mapping to that address (e.g. nslookup IPADDR). It then accepts the connection if one of those names matches one of the peer cert's DNS names (either by exact or wildcard match). If none is matched, server forward-lookups each DNS entry in peer cert (e.g. look up example.default.svc when the entry is *.example.default.svc), and accepts the connection only when the host's resolved addresses include the peer's remote IP address. For example, peer B's CSR (with cfssl) SAN field is ["*.example.default.svc", "*.example.default.svc.cluster.local"] when peer B's remote IP address is 10.138.0.2. When peer B tries to join the cluster, peer A reverse-lookups the IP 10.138.0.2 to get the list of host names, and matches those host names, either exactly or by wildcard, against peer B's cert DNS names in Subject Alternative Name (SAN) field. If none of the reverse/forward lookups worked, it returns an error "tls: "10.138.0.2" does not match any of DNSNames ["*.example.default.svc","*.example.default.svc.cluster.local"]". See issue #8268 for more detail.
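The lookup order described in this entry can be sketched roughly as follows. This is an illustration, not etcd's implementation: the resolver type, matchName, and checkDNSSAN are hypothetical, DNS is injected through plain functions so the sketch runs offline, and the wildcard rule (a single leading "*." label) is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matchName reports whether host matches pattern exactly or through a
// single leading "*." wildcard label.
func matchName(pattern, host string) bool {
	if pattern == host {
		return true
	}
	if !strings.HasPrefix(pattern, "*.") {
		return false
	}
	i := strings.Index(host, ".")
	return i > 0 && host[i:] == pattern[1:]
}

// resolver abstracts DNS; a real server could back these with
// net.LookupAddr and net.LookupHost.
type resolver struct {
	reverse func(ip string) []string   // IP -> host names
	forward func(host string) []string // host name -> IPs
}

// checkDNSSAN tries the reverse lookup first, then a forward lookup of
// each SAN entry with any "*." prefix stripped.
func checkDNSSAN(r resolver, remoteIP string, dnsNames []string) error {
	for _, host := range r.reverse(remoteIP) {
		host = strings.TrimSuffix(host, ".") // reverse answers may be fully qualified
		for _, pattern := range dnsNames {
			if matchName(pattern, host) {
				return nil
			}
		}
	}
	for _, pattern := range dnsNames {
		for _, ip := range r.forward(strings.TrimPrefix(pattern, "*.")) {
			if ip == remoteIP {
				return nil
			}
		}
	}
	return fmt.Errorf("tls: %q does not match any of DNSNames %q", remoteIP, dnsNames)
}

func main() {
	r := resolver{
		reverse: func(string) []string { return []string{"b.example.default.svc."} },
		forward: func(string) []string { return nil },
	}
	san := []string{"*.example.default.svc", "*.example.default.svc.cluster.local"}
	fmt.Println(checkDNSSAN(r, "10.138.0.2", san))
}
```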
- Fix unreachable /metrics endpoint when --enable-v2=false.
- Handle PrevKv flag.
- Add container registry gcr.io/etcd-development/etcd.
- Compile with Go 1.8.3.
v3.2.4 (2017-07-19)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Do not block on active client stream when stopping server
- Fix gRPC proxy Snapshot RPC error handling
- Compile with Go 1.8.3.
v3.2.3 (2017-07-14)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Let clients establish unlimited streams
- Tag docker images with minor versions
  - e.g. docker pull quay.io/coreos/etcd:v3.2 to fetch latest v3.2 versions
- Compile with Go 1.8.3.
v3.2.2 (2017-07-07)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Rate-limit lease revoke on expiration.
- Extend leases on promote to avoid queueing effect on lease expiration.
See security doc for more details.
- Server accepts connections if IP matches, without checking DNS entries. For instance, if peer cert contains IP addresses and DNS names in Subject Alternative Name (SAN) field, and the remote IP address matches one of those IP addresses, server just accepts connection without further checking the DNS names. For example, peer B's CSR (with cfssl) SAN field is ["invalid.domain", "10.138.0.2"] when peer B's remote IP address is 10.138.0.2 and invalid.domain is an invalid host. When peer B tries to join the cluster, peer A successfully authenticates B, since Subject Alternative Name (SAN) field has a valid matching IP address. See issue #8206 for more detail.
  - Accept connection with matched IP SAN but no DNS match.
  - Don't check DNS entries in certs if there's a matching IP.
- Use user-provided listen address to connect to gRPC gateway.
  - net.Listener rewrites IPv4 0.0.0.0 to IPv6 [::], breaking IPv6 disabled hosts.
  - Only v3.2.0, v3.2.1 are affected.
- Compile with Go 1.8.3.
v3.2.1 (2017-06-23)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Fix backend database in-memory index corruption issue on restore (only 3.2.0 is affected).
- Fix Txn marshaling.
- Fix backend database size debugging metrics.
- Compile with Go 1.8.3.
v3.2.0 (2017-06-09)
See code changes and v3.2 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.2 upgrade guide.
- Improve backend read concurrency.
- Increased --snapshot-count default value from 10,000 to 100,000.
  - Higher snapshot count means it holds Raft entries in memory for longer before discarding old entries.
  - It is a trade-off between less frequent snapshotting and higher memory usage.
  - Use lower --snapshot-count value for lower memory usage.
  - Use higher --snapshot-count value for better availability of slow followers (less frequent snapshots from leader).
- clientv3.Lease.TimeToLive returns LeaseTimeToLiveResponse.TTL == -1 on lease not found.
- clientv3.NewFromConfigFile is moved to clientv3/yaml.NewConfig.
- embed.Etcd.Peers field is now []*peerListener.
- Rejects domain names for --listen-peer-urls and --listen-client-urls (3.1 only prints out warnings), since domain name is invalid for network interface binding.
- Upgrade google.golang.org/grpc from v1.0.4 to v1.2.1.
- Upgrade github.com/grpc-ecosystem/grpc-gateway to v1.2.0.
- Add etcd_debugging_server_lease_expired_total metrics.
See security doc for more details.
- TLS certificates get reloaded on every client connection. This is useful when replacing expiry certs without stopping etcd servers; it can be done by overwriting old certs with new ones. Refreshing certs for every connection should not have too much overhead, but can be improved in the future, with caching layer. Example tests can be found here.
- Server denies incoming peer certs with wrong IP SAN. For instance, if peer cert contains any IP addresses in Subject Alternative Name (SAN) field, server authenticates a peer only when the remote IP address matches one of those IP addresses. This is to prevent unauthorized endpoints from joining the cluster. For example, peer B's CSR (with cfssl) SAN field is ["*.example.default.svc", "*.example.default.svc.cluster.local", "10.138.0.27"] when peer B's actual IP address is 10.138.0.2, not 10.138.0.27. When peer B tries to join the cluster, peer A will reject B with the error x509: certificate is valid for 10.138.0.27, not 10.138.0.2, because B's remote IP address does not match the one in Subject Alternative Name (SAN) field.
- Server resolves TLS DNSNames when checking SAN. For instance, if peer cert contains only DNS names (no IP addresses) in Subject Alternative Name (SAN) field, server authenticates a peer only when forward-lookups (dig b.com) on those DNS names have matching IP with the remote IP address. For example, peer B's CSR (with cfssl) SAN field is ["b.com"] when peer B's remote IP address is 10.138.0.2. When peer B tries to join the cluster, peer A looks up the incoming host b.com to get the list of IP addresses (e.g. dig b.com). And rejects B if the list does not contain the IP 10.138.0.2, with the error tls: 10.138.0.2 does not match any of DNSNames ["b.com"].
. - Auth support JWT token.
- RPCs
  - Add Election, Lock service.
- Native client etcdserver/api/v3client
  - client "embedded" in the server.
- Logging, monitoring
  - Server warns large snapshot operations.
- Add --enable-v2 flag to enable v2 API server.
  - --enable-v2=true by default.
- Add --auth-token flag.
- v3.2 compactor runs every hour.
  - Compactor only supports periodic compaction.
  - Compactor continues to record latest revisions every 5-minute.
  - For every hour, it uses the last revision that was fetched before compaction period, from the revision records that were collected every 5-minute.
  - That is, for every hour, compactor discards historical data created before compaction period.
  - The retention window of compaction period moves to next hour.
  - For instance, when hourly writes are 100 and --auto-compaction-retention=10, v3.1 compacts revision 1000, 2000, and 3000 for every 10-hour, while v3.2 compacts revision 1000, 1100, and 1200 for every 1-hour.
  - If compaction succeeds or requested revision has already been compacted, it resets period timer and removes used compacted revision from historical revision records (e.g. start next revision collect and compaction from previously collected revisions).
  - If compaction fails, it retries in 5 minutes.
- STM prefetching.
- Add namespace feature.
- Add ErrOldCluster with server version checking.
- Translate WithPrefix() into WithFromKey() for empty key.
- Add check perf command.
- Add --from-key flag to role grant-permission command.
- lock command takes an optional command to execute.
- Allow snapshot over 512MB.
- Proxy endpoint discovery.
- Namespaces.
- Coalesce lease requests.
- Support DNS SRV priority for smart proxy routing.
- v3 client
- concurrency package's elections updated to match RPC interfaces.
- let client dial endpoints not in the balancer.
- Release
- Annotate acbuild with supports-systemd-notify.
- Add nsswitch.conf to Docker container image.
- Add ppc64le, arm64(experimental) builds.
- Compile with Go 1.8.3.