Documentation for Sharding rolling update (#29666)

Patrik Nordwall 2020-09-30 12:31:03 +02:00 committed by GitHub
parent 2caa560aab
commit 90b79144e5
6 changed files with 36 additions and 32 deletions


@@ -202,19 +202,6 @@ object ClusterEvent {
def getAllDataCenters: java.util.Set[String] =
scala.collection.JavaConverters.setAsJavaSetConverter(allDataCenters).asJava
/**
* INTERNAL API
* @return `true` if there is more than one `Version` among the members, which
* indicates that a rolling update is in progress
*/
@InternalApi private[akka] def hasMoreThanOneAppVersion: Boolean = {
if (members.isEmpty) false
else {
val v = members.head.appVersion
members.exists(_.appVersion != v)
}
}
/**
* Replace the set of unreachable datacenters with the given set
*/


@@ -72,7 +72,6 @@ abstract class JoinSeedNodeSpec extends MultiNodeSpec(JoinSeedNodeMultiJvmSpec)
List(address(ordinary1), address(ordinary2)).foreach { a =>
cluster.state.members.find(_.address == a).get.appVersion should ===(Version("2.0"))
}
cluster.state.hasMoreThanOneAppVersion should ===(true)
enterBarrier("after-2")
}


@@ -105,7 +105,6 @@ class ClusterSpec extends AkkaSpec(ClusterSpec.config) with ImplicitSender {
awaitAssert(clusterView.status should ===(MemberStatus.Up))
clusterView.self.appVersion should ===(Version("1.2.3"))
clusterView.members.find(_.address == selfAddress).get.appVersion should ===(Version("1.2.3"))
clusterView.state.hasMoreThanOneAppVersion should ===(false)
}
"reply with InitJoinAck for InitJoin after joining" in {


@@ -39,10 +39,34 @@ Additionally you can find advice on @ref:[Persistence - Schema Evolution](../per
## Cluster Sharding
During a rolling update, sharded entities receiving traffic may be moved during @ref:[shard rebalancing](../typed/cluster-sharding-concepts.md#shard-rebalancing),
to an old or new node in the cluster, based on the pluggable allocation strategy and settings.
When an old node is stopped, the shards that were running on it are moved to one of the
other old nodes remaining in the cluster. The `ShardCoordinator` is itself a cluster singleton.
During a rolling update, sharded entities receiving traffic may be moved, based on the pluggable allocation
strategy and settings. When an old node is stopped, the shards that were running on it are moved to one of the
other remaining nodes in the cluster when messages are sent to those shards.
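As a minimal sketch of where the pluggable allocation strategy and settings come into play (the `Counter` entity and type key name below are illustrative, not part of this commit), sharding is initialized per entity type and the configured allocation strategy then decides on which node each shard of that type is started:
```scala
import akka.actor.typed.{ ActorSystem, Behavior }
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ ClusterSharding, Entity, EntityTypeKey }

object Counter {
  sealed trait Command
  case object Increment extends Command

  // Placeholder behavior; a real entity would keep state per entityId
  def apply(entityId: String): Behavior[Command] = Behaviors.ignore
}

object ShardingInit {
  val CounterTypeKey: EntityTypeKey[Counter.Command] =
    EntityTypeKey[Counter.Command]("Counter")

  def init(system: ActorSystem[_]): Unit =
    // The allocation strategy (LeastShardAllocationStrategy by default) decides
    // on which node each shard of this entity type is started.
    ClusterSharding(system).init(Entity(CounterTypeKey)(entityContext => Counter(entityContext.entityId)))
}
```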
To make rolling updates as smooth as possible, there is a configuration property that defines the version of the
application. This is used by rolling update features to distinguish between old and new nodes. For example,
the default `LeastShardAllocationStrategy` avoids allocating shards to old nodes during a rolling update.
The `LeastShardAllocationStrategy` sees that a rolling update is in progress when there are members with
different configured `app-version` values.
To make use of this feature, define the `app-version` and increase it for each rolling update.
```
akka.cluster.app-version = 1.2.3
```
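The property does not need to be hard-coded; one option, sketched below under the assumption that the build pipeline exports a `BUILD_VERSION` environment variable (not something provided by Akka), is to set it programmatically when the actor system is started:
```scala
import akka.actor.ActorSystem
import com.typesafe.config.{ ConfigFactory, ConfigValueFactory }

object Main extends App {
  // Assumption: the CI/CD pipeline exports BUILD_VERSION for each rollout, e.g. "1.2.4"
  val appVersion = sys.env.getOrElse("BUILD_VERSION", "0.0.0")

  val config = ConfigFactory
    .load()
    .withValue("akka.cluster.app-version", ConfigValueFactory.fromAnyRef(appVersion))

  val system = ActorSystem("MyCluster", config)
}
```
Overriding the setting with a JVM option such as `-Dakka.cluster.app-version=1.2.4` at startup is another common approach.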
To determine which nodes are old and which are new, it compares the version numbers using normal version ordering conventions;
see @apidoc[akka.util.Version] for more details.
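As an illustration of that ordering (the version strings are examples only), `Version` values compare like ordinary version numbers:
```scala
import akka.util.Version

object VersionOrderingExample extends App {
  val oldVersion = Version("1.2.3")
  val newVersion = Version("1.2.4")

  // Ordinary version ordering: 1.2.3 is considered older than 1.2.4
  assert(oldVersion.compareTo(newVersion) < 0)
}
```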
Rebalance is also disabled during rolling updates, since shards from stopped nodes are expected to be started
on new nodes anyway. Messages to shards that were stopped on the old nodes will allocate the corresponding shards
on the new nodes, without waiting for rebalance actions.
You should also enable the @ref:[health check for Cluster Sharding](../typed/cluster-sharding.md#health-check) if
you use Akka Management. The readiness check will delay incoming traffic to the node until Sharding has been
initialized and can accept messages.
The `ShardCoordinator` is itself a cluster singleton.
To minimize downtime of the shard coordinator, see the strategies for @ref[ClusterSingleton](#cluster-singleton) rolling updates below.
A few specific changes to sharding configuration require @ref:[a full cluster restart](#cluster-sharding-configuration-change).
@@ -54,6 +78,9 @@ it is recommended to upgrade the oldest node last. This way cluster singletons a
Otherwise, in the worst case, cluster singletons may be migrated from node to node, which incurs coordination and initialization
overhead several times.
[Kubernetes Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) with `RollingUpdate`
strategy will roll out updates in this preferred order, from newest to oldest.
## Cluster Shutdown
### Graceful shutdown


@@ -82,20 +82,10 @@ persistent (durable), e.g. with @ref:[Persistence](persistence.md) (or see @ref:
location.
The logic that decides which shards to rebalance is defined in a pluggable shard
allocation strategy. The default implementation `ShardCoordinator.LeastShardAllocationStrategy`
picks shards for handoff from the `ShardRegion` with the largest number of previously allocated shards.
They will then be allocated to the `ShardRegion` with the least number of previously allocated shards,
i.e. new members in the cluster.
allocation strategy. The default implementation `LeastShardAllocationStrategy` allocates new shards
to the `ShardRegion` (node) with the least number of previously allocated shards.
For the `LeastShardAllocationStrategy` there is a configurable threshold (`rebalance-threshold`) of
how large the difference must be before rebalancing begins. The difference between the number of shards in
the region with the most shards and the region with the fewest shards must be greater than the `rebalance-threshold`
for a rebalance to occur.
A `rebalance-threshold` of 1 gives the best distribution and is therefore typically the best choice.
A higher threshold means that more shards can be rebalanced at the same time instead of one-by-one.
That has the advantage that the rebalance process can be quicker, but has the drawback that the
number of shards (and therefore the load) on different nodes may be significantly different.
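A simplified sketch of that threshold rule, not the actual Akka implementation, may make it concrete:
```scala
// Simplified sketch of the threshold rule described above, not the real allocation strategy.
object RebalanceRule {
  def shouldRebalance(shardsPerRegion: Map[String, Int], rebalanceThreshold: Int): Boolean =
    if (shardsPerRegion.isEmpty) false
    else {
      val counts = shardsPerRegion.values
      // rebalance only when the most and least loaded regions differ by more than the threshold
      (counts.max - counts.min) > rebalanceThreshold
    }

  // With a rebalance-threshold of 1, a difference of 2 shards triggers rebalancing,
  // while a difference of 1 does not:
  val example1 = shouldRebalance(Map("regionA" -> 3, "regionB" -> 1), rebalanceThreshold = 1) // true
  val example2 = shouldRebalance(Map("regionA" -> 2, "regionB" -> 1), rebalanceThreshold = 1) // false
}
```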
See also @ref:[Shard allocation](cluster-sharding.md#shard-allocation).
### ShardCoordinator state


@@ -486,6 +486,8 @@ Monitoring of each shard region is off by default. Add them by defining the enti
akka.cluster.sharding.healthcheck.names = ["counter-1", "HelloWorld"]
```
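The listed names are the entity type names that sharding was initialized with; as a hypothetical illustration (the `Counter` entity below is not from this commit):
```scala
import akka.cluster.sharding.typed.scaladsl.EntityTypeKey

object Counter {
  sealed trait Command

  // "counter-1" in the healthcheck configuration above must match this entity type name.
  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("counter-1")
}
```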
See also additional information about how to make @ref:[smooth rolling updates](../additional/rolling-updates.md#cluster-sharding).
## Inspecting cluster sharding state
Two requests to inspect the cluster state are available: