Documentation for Sharding rolling update (#29666)

Patrik Nordwall 2020-09-30 12:31:03 +02:00 committed by GitHub
parent 2caa560aab
commit 90b79144e5
6 changed files with 36 additions and 32 deletions


@@ -202,19 +202,6 @@ object ClusterEvent {
def getAllDataCenters: java.util.Set[String] =
scala.collection.JavaConverters.setAsJavaSetConverter(allDataCenters).asJava
/**
* INTERNAL API
* @return `true` if there is more than one `Version` among the members, which
* indicates that a rolling update is in progress
*/
@InternalApi private[akka] def hasMoreThanOneAppVersion: Boolean = {
if (members.isEmpty) false
else {
val v = members.head.appVersion
members.exists(_.appVersion != v)
}
}
/**
* Replace the set of unreachable datacenters with the given set
*/


@@ -72,7 +72,6 @@ abstract class JoinSeedNodeSpec extends MultiNodeSpec(JoinSeedNodeMultiJvmSpec)
List(address(ordinary1), address(ordinary2)).foreach { a =>
cluster.state.members.find(_.address == a).get.appVersion should ===(Version("2.0"))
}
cluster.state.hasMoreThanOneAppVersion should ===(true)
enterBarrier("after-2")
}


@@ -105,7 +105,6 @@ class ClusterSpec extends AkkaSpec(ClusterSpec.config) with ImplicitSender {
awaitAssert(clusterView.status should ===(MemberStatus.Up))
clusterView.self.appVersion should ===(Version("1.2.3"))
clusterView.members.find(_.address == selfAddress).get.appVersion should ===(Version("1.2.3"))
clusterView.state.hasMoreThanOneAppVersion should ===(false)
}
"reply with InitJoinAck for InitJoin after joining" in {


@@ -39,10 +39,34 @@ Additionally you can find advice on @ref:[Persistence - Schema Evolution](../per
## Cluster Sharding
During a rolling update, sharded entities receiving traffic may be moved during @ref:[shard rebalancing](../typed/cluster-sharding-concepts.md#shard-rebalancing),
to an old or new node in the cluster, based on the pluggable allocation strategy and settings.
When an old node is stopped, the shards that were running on it are moved to one of the
other old nodes remaining in the cluster. The `ShardCoordinator` is itself a cluster singleton.
During a rolling update, sharded entities receiving traffic may be moved, based on the pluggable allocation
strategy and settings. When an old node is stopped, the shards that were running on it are moved to one of the
other remaining nodes in the cluster when messages are sent to those shards.
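As a minimal sketch of where the pluggable allocation strategy and settings come into play (the `Counter` entity and type key name below are illustrative, not part of this commit), sharding is initialized per entity type and the configured allocation strategy then decides on which node each shard of that type is started:
```scala
import akka.actor.typed.{ ActorSystem, Behavior }
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ ClusterSharding, Entity, EntityTypeKey }

object Counter {
  sealed trait Command
  case object Increment extends Command

  // Placeholder behavior; a real entity would keep state per entityId
  def apply(entityId: String): Behavior[Command] = Behaviors.ignore
}

object ShardingInit {
  val CounterTypeKey: EntityTypeKey[Counter.Command] =
    EntityTypeKey[Counter.Command]("Counter")

  def init(system: ActorSystem[_]): Unit =
    // The allocation strategy (LeastShardAllocationStrategy by default) decides
    // on which node each shard of this entity type is started.
    ClusterSharding(system).init(Entity(CounterTypeKey)(entityContext => Counter(entityContext.entityId)))
}
```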
To make rolling updates as smooth as possible, there is a configuration property that defines the version of the
application. This is used by rolling update features to distinguish between old and new nodes. For example,
the default `LeastShardAllocationStrategy` avoids allocating shards to old nodes during a rolling update.
The `LeastShardAllocationStrategy` sees that a rolling update is in progress when there are members with
different configured `app-version` values.
To make use of this feature, define the `app-version` and increase it for each rolling update.
```
akka.cluster.app-version = 1.2.3
```
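The property does not need to be hard-coded; one option, sketched below under the assumption that the build pipeline exports a `BUILD_VERSION` environment variable (not something provided by Akka), is to set it programmatically when the actor system is started:
```scala
import akka.actor.ActorSystem
import com.typesafe.config.{ ConfigFactory, ConfigValueFactory }

object Main extends App {
  // Assumption: the CI/CD pipeline exports BUILD_VERSION for each rollout, e.g. "1.2.4"
  val appVersion = sys.env.getOrElse("BUILD_VERSION", "0.0.0")

  val config = ConfigFactory
    .load()
    .withValue("akka.cluster.app-version", ConfigValueFactory.fromAnyRef(appVersion))

  val system = ActorSystem("MyCluster", config)
}
```
Overriding the setting with a JVM option such as `-Dakka.cluster.app-version=1.2.4` at startup is another common approach.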
To determine which nodes are old and which are new, it compares the version numbers using normal version ordering conventions;
see @apidoc[akka.util.Version] for more details.
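As an illustration of that ordering (the version strings are examples only), `Version` values compare like ordinary version numbers:
```scala
import akka.util.Version

object VersionOrderingExample extends App {
  val oldVersion = Version("1.2.3")
  val newVersion = Version("1.2.4")

  // Ordinary version ordering: 1.2.3 is considered older than 1.2.4
  assert(oldVersion.compareTo(newVersion) < 0)
}
```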
Rebalance is also disabled during rolling updates, since shards from stopped nodes are expected to be started
on new nodes anyway. Messages to shards that were stopped on the old nodes will allocate the corresponding shards
on the new nodes, without waiting for rebalance actions.
You should also enable the @ref:[health check for Cluster Sharding](../typed/cluster-sharding.md#health-check) if
you use Akka Management. The readiness check will delay incoming traffic to the node until Sharding has been
initialized and can accept messages.
The `ShardCoordinator` is itself a cluster singleton.
To minimize downtime of the shard coordinator, see the strategies for @ref[ClusterSingleton](#cluster-singleton) rolling updates below.
A few specific changes to sharding configuration require @ref:[a full cluster restart](#cluster-sharding-configuration-change).
@@ -54,6 +78,9 @@ it is recommended to upgrade the oldest node last. This way cluster singletons a
Otherwise, in the worst case, cluster singletons may be migrated from node to node, which incurs coordination and initialization
overhead several times.
[Kubernetes Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) with `RollingUpdate`
strategy will roll out updates in this preferred order, from newest to oldest.
## Cluster Shutdown
### Graceful shutdown


@@ -82,20 +82,10 @@ persistent (durable), e.g. with @ref:[Persistence](persistence.md) (or see @ref:
location.
The logic that decides which shards to rebalance is defined in a pluggable shard
allocation strategy. The default implementation `ShardCoordinator.LeastShardAllocationStrategy`
picks shards for handoff from the `ShardRegion` with the largest number of previously allocated shards.
They will then be allocated to the `ShardRegion` with the least number of previously allocated shards,
i.e. new members in the cluster.
allocation strategy. The default implementation `LeastShardAllocationStrategy` allocates new shards
to the `ShardRegion` (node) with the least number of previously allocated shards.
For the `LeastShardAllocationStrategy` there is a configurable threshold (`rebalance-threshold`) of
how large the difference must be before rebalancing begins. The difference between the number of shards in
the region with the most shards and the region with the fewest shards must be greater than the `rebalance-threshold`
for a rebalance to occur.
A `rebalance-threshold` of 1 gives the best distribution and is therefore typically the best choice.
A higher threshold means that more shards can be rebalanced at the same time instead of one-by-one.
That has the advantage that the rebalance process can be quicker, but has the drawback that the
number of shards (and therefore the load) on different nodes may be significantly different.
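A simplified sketch of that threshold rule, not the actual Akka implementation, may make it concrete:
```scala
// Simplified sketch of the threshold rule described above, not the real allocation strategy.
object RebalanceRule {
  def shouldRebalance(shardsPerRegion: Map[String, Int], rebalanceThreshold: Int): Boolean =
    if (shardsPerRegion.isEmpty) false
    else {
      val counts = shardsPerRegion.values
      // rebalance only when the most and least loaded regions differ by more than the threshold
      (counts.max - counts.min) > rebalanceThreshold
    }

  // With a rebalance-threshold of 1, a difference of 2 shards triggers rebalancing,
  // while a difference of 1 does not:
  val example1 = shouldRebalance(Map("regionA" -> 3, "regionB" -> 1), rebalanceThreshold = 1) // true
  val example2 = shouldRebalance(Map("regionA" -> 2, "regionB" -> 1), rebalanceThreshold = 1) // false
}
```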
See also @ref:[Shard allocation](cluster-sharding.md#shard-allocation).
### ShardCoordinator state


@@ -486,6 +486,8 @@ Monitoring of each shard region is off by default. Add them by defining the enti
akka.cluster.sharding.healthcheck.names = ["counter-1", "HelloWorld"]
```
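The listed names are the entity type names that sharding was initialized with; as a hypothetical illustration (the `Counter` entity below is not from this commit):
```scala
import akka.cluster.sharding.typed.scaladsl.EntityTypeKey

object Counter {
  sealed trait Command

  // "counter-1" in the healthcheck configuration above must match this entity type name.
  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("counter-1")
}
```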
See also additional information about how to make @ref:[smooth rolling updates](../additional/rolling-updates.md#cluster-sharding).
## Inspecting cluster sharding state
Two requests to inspect the cluster state are available: