Add Lightbend's SBR to Akka Cluster, #29085 (#29099)

* change package name to akka.cluster.sbr * reference.conf has same config paths * akka.cluster.sbr.SplitBrainResolverProvider instead of com.lightbend.akka.sbr.SplitBrainResolverProvider * dependency from akka-cluster to akka-coordination, for lease strategy * move TestLease to akka-coordination and use that in SBR tests * remove keep-referee strategy * use keep-majority by default * review and adjust reference documentation Co-authored-by: Johan Andrén <johan@markatta.com> Co-authored-by: Johannes Rudolph <johannes.rudolph@gmail.com> Co-authored-by: Christopher Batey <christopher.batey@gmail.com> Co-authored-by: Arnout Engelen <github@bzzt.net>
2020-05-25 12:21:13 +02:00 · 2020-05-25 12:21:13 +02:00 · c45e6ef39b
commit c45e6ef39b
parent e0586e546c
37 changed files with 5612 additions and 67 deletions
--- a/akka-docs/src/main/paradox/additional/rolling-updates.md
+++ b/akka-docs/src/main/paradox/additional/rolling-updates.md
@ -67,7 +67,7 @@ Environments such as Kubernetes send a SIGTERM, however if the JVM is wrapped wi

 In case of network failures it may still be necessary to set the node's status to Down in order to complete the removal. 
@ref:[Cluster Downing](../typed/cluster.md#downing) details downing nodes and downing providers. 
-[Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html) can be used to ensure 
+@ref:[Split Brain Resolver](../split-brain-resolver.md) can be used to ensure 
 the cluster continues to function during network partitions and node failures. For example
 if there is an unreachability problem Split Brain Resolver would make a decision based on the configured downing strategy. 
  
--- a/akka-docs/src/main/paradox/common/other-modules.md
+++ b/akka-docs/src/main/paradox/common/other-modules.md
@ -26,6 +26,7 @@ An Akka Persistence journal and snapshot store backed by Couchbase.
 * [Akka Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/) helps bootstrapping an Akka cluster using Akka Discovery.
 * [Akka Management Cluster HTTP](https://doc.akka.io/docs/akka-management/current/cluster-http-management.html) provides HTTP endpoints for introspecting and managing Akka clusters.
 * [Akka Discovery for Kubernetes, Consul, Marathon, and AWS](https://doc.akka.io/docs/akka-management/current/discovery/)
+* [Kubernetes Lease](https://doc.akka.io/docs/akka-management/current/kubernetes-lease.html)

 ## [Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/)

@ -33,8 +34,6 @@ Akka gRPC provides support for building streaming gRPC servers and clients on to

 ## Akka Resilience Enhancements

-* [Akka Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html)
-* [Kubernetes Lease](https://doc.akka.io/docs/akka-enhancements/current/kubernetes-lease.html)
 * [Akka Thread Starvation Detector](https://doc.akka.io/docs/akka-enhancements/current/starvation-detector.html)
 * [Akka Configuration Checker](https://doc.akka.io/docs/akka-enhancements/current/config-checker.html)
 * [Akka Diagnostics Recorder](https://doc.akka.io/docs/akka-enhancements/current/diagnostics-recorder.html)
@ -44,7 +43,6 @@ Akka gRPC provides support for building streaming gRPC servers and clients on to
 * [Akka Multi-DC Persistence](https://doc.akka.io/docs/akka-enhancements/current/persistence-dc/index.html)
 * [Akka GDPR for Persistence](https://doc.akka.io/docs/akka-enhancements/current/gdpr/index.html)

-
 ## Community Projects

 Akka has a vibrant and passionate user community, the members of which have created many independent projects using Akka as well as extensions to it. See [Community Projects](https://akka.io/community/).
--- a/akka-docs/src/main/paradox/coordination.md
+++ b/akka-docs/src/main/paradox/coordination.md
@ -35,10 +35,10 @@ Any lease implementation should provide the following guarantees:
 To acquire a lease:

 Scala
-:  @@snip [LeaseDocSpec.scala](/akka-coordination/src/test/scala/docs/akka/coordination/LeaseDocSpec.scala) { #lease-usage }
+:  @@snip [LeaseDocSpec.scala](/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala) { #lease-usage }

 Java
-:  @@snip [LeaseDocTest.java](/akka-coordination/src/test/java/jdocs/akka/coordination/lease/LeaseDocTest.java) { #lease-usage }
+:  @@snip [LeaseDocTest.java](/akka-docs/src/test/java/jdocs/coordination/LeaseDocTest.java) { #lease-usage }

 Acquiring a lease returns a @scala[Future]@java[CompletionStage] as lease implementations typically are implemented 
 via a third party system such as the Kubernetes API server or Zookeeper.
@ -53,10 +53,10 @@ It is important to pick a lease name that will be unique for your use case. If a
 in a Cluster the cluster host port can be use:

 Scala
-:  @@snip [LeaseDocSpec.scala](/akka-coordination/src/test/scala/docs/akka/coordination/LeaseDocSpec.scala) { #cluster-owner }
+:  @@snip [LeaseDocSpec.scala](/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala) { #cluster-owner }

 Java
-:  @@snip [LeaseDocTest.scala](/akka-coordination/src/test/java/jdocs/akka/coordination/lease/LeaseDocTest.java) { #cluster-owner }
+:  @@snip [LeaseDocTest.scala](/akka-docs/src/test/java/jdocs/coordination/LeaseDocTest.java) { #cluster-owner }

 For use cases where multiple different leases on the same node then something unique must be added to the name. For example
 a lease can be used with Cluster Sharding and in this case the shard Id is included in the lease name for each shard.
@ -77,7 +77,7 @@ Leases can be used for @ref[Cluster Singletons](cluster-singleton.md#lease) and

 ## Lease implementations

-* [Kubernetes API](https://doc.akka.io/docs/akka-enhancements/current/kubernetes-lease.html)
+* [Kubernetes API](https://doc.akka.io/docs/akka-management/current/kubernetes-lease.html)

 ## Implementing a lease

@ -85,10 +85,10 @@ Implementations should extend
 the @scala[`akka.coordination.lease.scaladsl.Lease`]@java[`akka.coordination.lease.javadsl.Lease`] 

 Scala
-:  @@snip [LeaseDocSpec.scala](/akka-coordination/src/test/scala/docs/akka/coordination/LeaseDocSpec.scala) { #lease-example }
+:  @@snip [LeaseDocSpec.scala](/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala) { #lease-example }

 Java
-:  @@snip [LeaseDocTest.scala](/akka-coordination/src/test/java/jdocs/akka/coordination/lease/LeaseDocTest.java) { #lease-example }
+:  @@snip [LeaseDocTest.java](/akka-docs/src/test/java/jdocs/coordination/LeaseDocTest.java) { #lease-example }

 The methods should provide the following guarantees:

@ -109,10 +109,10 @@ The lease implementation should have support for the following properties where
 This configuration location is passed into `getLease`.

 Scala
-:  @@snip [LeaseDocSpec.scala](/akka-coordination/src/test/scala/docs/akka/coordination/LeaseDocSpec.scala) { #lease-config }
+:  @@snip [LeaseDocSpec.scala](/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala) { #lease-config }

 Java
-:  @@snip [LeaseDocSpec.scala](/akka-coordination/src/test/scala/docs/akka/coordination/LeaseDocSpec.scala) { #lease-config }
+:  @@snip [LeaseDocSpec.scala](/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala) { #lease-config }



--- a/akka-docs/src/main/paradox/split-brain-resolver.md
+++ b/akka-docs/src/main/paradox/split-brain-resolver.md
@ -0,0 +1,481 @@
+# Split Brain Resolver
+
+When operating an Akka cluster you must consider how to handle
+[network partitions](http://en.wikipedia.org/wiki/Network_partition) (a.k.a. split brain scenarios)
+and machine crashes (including JVM and hardware failures). This is crucial for correct behavior if
+you use @ref:[Cluster Singleton](typed/cluster-singleton.md) or @ref:[Cluster Sharding](typed/cluster-sharding.md),
+especially together with Akka Persistence.
+
+## Module info
+
+To use Akka Split Brain Resolver is part of `akka-cluster` and you probably already have that
+dependency included. Otherwise, add the following dependency in your project:
+
+@@dependency[sbt,Maven,Gradle] {
+  group=com.typesafe.akka
+  artifact=akka-cluster_$scala.binary.version$
+  version=$akka.version$
+}
+
+@@project-info{ projectId="akka-cluster" }
+
+## Enable the Split Brain Resolver
+
+You need to enable the Split Brain Resolver by configuring it as downing provider in the configuration of
+the `ActorSystem` (`application.conf`):
+
+```
+akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
+```
+
+You should also consider the different available @ref:[downing strategies](#strategies).
+
+## The Problem
+
+A fundamental problem in distributed systems is that network partitions (split brain scenarios) and
+machine crashes are indistinguishable for the observer, i.e. a node can observe that there is a problem
+with another node, but it cannot tell if it has crashed and will never be available again or if there is
+a network issue that might or might not heal again after a while. Temporary and permanent failures are
+indistinguishable because decisions must be made in finite time, and there always exists a temporary
+failure that lasts longer than the time limit for the decision.
+
+A third type of problem is if a process is unresponsive, e.g. because of overload, CPU starvation or
+long garbage collection pauses. This is also indistinguishable from network partitions and crashes.
+The only signal we have for decision is "no reply in given time for heartbeats" and this means that
+phenomena causing delays or lost heartbeats are indistinguishable from each other and must be
+handled in the same way.
+
+When there is a crash, we would like to remove the affected node immediately from the cluster membership.
+When there is a network partition or unresponsive process we would like to wait for a while in the hope
+that it is a transient problem that will heal again, but at some point, we must give up and continue with
+the nodes on one side of the partition and shut down nodes on the other side. Also, certain features are
+not fully available during partitions so it might not matter that the partition is transient or not if
+it just takes too long. Those two goals are in conflict with each other and there is a trade-off
+between how quickly we can remove a crashed node and premature action on transient network partitions.
+
+This is a difficult problem to solve given that the nodes on the different sides of the network partition
+cannot communicate with each other. We must ensure that both sides can make this decision by themselves and
+that they take the same decision about which part will keep running and which part will shut itself down.
+
+Another type of problem that makes it difficult to see the "right" picture is when some nodes are not fully
+connected and cannot communicate directly to each other but information can be disseminated between them via
+other nodes.
+
+The Akka cluster has a failure detector that will notice network partitions and machine crashes (but it
+cannot distinguish the two). It uses periodic heartbeat messages to check if other nodes are available
+and healthy. These observations by the failure detector are referred to as a node being *unreachable*
+and it may become *reachable* again if the failure detector observes that it can communicate with it again.  
+
+The failure detector in itself is not enough for making the right decision in all situations.
+The naive approach is to remove an unreachable node from the cluster membership after a timeout.
+This works great for crashes and short transient network partitions, but not for long network
+partitions. Both sides of the network partition will see the other side as unreachable and
+after a while remove it from its cluster membership. Since this happens on both sides the result
+is that two separate disconnected clusters have been created.
+This approach is provided by the opt-in (off by default) auto-down feature in the OSS version of
+Akka Cluster.
+
+If you use the timeout based auto-down feature in combination with Cluster Singleton or Cluster Sharding
+that would mean that two singleton instances or two sharded entities with the same identifier would be running.
+One would be running: one in each cluster.
+For example when used together with Akka Persistence that could result in that two instances of a
+persistent actor with the same `persistenceId` are running and writing concurrently to the
+same stream of persistent events, which will have fatal consequences when replaying these events.
+
+The default setting in Akka Cluster is to not remove unreachable nodes automatically and
+the recommendation is that the decision of what to
+do should be taken by a human operator or an external monitoring system. This is a valid solution,
+but not very convenient if you do not have this staff or external system for other reasons.
+
+If the unreachable nodes are not downed at all they will still be part of the cluster membership.
+Meaning that Cluster Singleton and Cluster Sharding will not failover to another node. While there
+are unreachable nodes new nodes that are joining the cluster will not be promoted to full worthy
+members (with status Up). Similarly, leaving members will not be removed until all unreachable
+nodes have been resolved. In other words, keeping unreachable members for an unbounded time is
+undesirable.
+
+With that introduction of the problem domain, it is time to look at the provided strategies for
+handling network partition, unresponsive nodes and crashed nodes.
+
+## Strategies
+
+By default the @ref:[Keep Majority](#keep-majority) strategy will be used because it works well for
+most systems. However, it's wort considering the other available strategies and pick a strategy that fits
+the characteristics of your system. For example, in a Kubernetes environment the @ref:[Lease](#lease) strategy
+can be a good choice. 
+
+Every strategy has a failure scenario where it makes a "wrong" decision. This section describes the different
+strategies and guidelines of when to use what.
+
+When there is uncertainty it selects to down more nodes than necessary, or even downing of all nodes.
+Therefore Split Brain Resolver should always be combined with a mechanism to automatically start up nodes that
+have been shutdown, and join them to the existing cluster or form a new cluster again.
+
+You enable a strategy with the configuration property `akka.cluster.split-brain-resolver.active-strategy`.
+
+### Stable after
+
+All strategies are inactive until the cluster membership and the information about unreachable nodes
+have been stable for a certain time period. Continuously adding more nodes while there is a network
+partition does not influence this timeout, since the status of those nodes will not be changed to Up
+while there are unreachable nodes. Joining nodes are not counted in the logic of the strategies. 
+
+@@snip [reference.conf](/akka-cluster/src/main/resources/reference.conf) { #split-brain-resolver }
+
+Set `akka.cluster.split-brain-resolver.stable-after` to a shorter duration to have quicker removal of crashed nodes,
+at the price of risking too early action on transient network partitions that otherwise would have healed. Do not
+set this to a shorter duration than the membership dissemination time in the cluster, which depends
+on the cluster size. Recommended minimum duration for different cluster sizes:
+
+|cluster size | stable-after|
+|-------------|-------------|
+|5    | 7 s |
+|10   | 10 s|
+|20   | 13 s|
+|50   | 17 s|
+|100  | 20 s|
+|1000 | 30 s|
+
+The different strategies may have additional settings that are described below.
+
+@@@ note
+
+It is important that you use the same configuration on all nodes.
+
+@@@
+
+The side of the split that decides to shut itself down will use the cluster *down* command
+to initiate the removal of a cluster member. When that has been spread among the reachable nodes
+it will be removed from the cluster membership.
+
+It's good to terminate the `ActorSystem` and exit the JVM when the node is removed from the cluster.
+
+That is handled by @ref:[Coordinated Shutdown](coordinated-shutdown.md)
+but to exit the JVM it's recommended that you enable:
+
+```
+akka.coordinated-shutdown.exit-jvm = on
+```
+
+@@@ note
+
+Some legacy containers may block calls to System.exit(..) and you may have to find an alternate
+way to shut the app down. For example, when running Akka on top of a Spring / Tomcat setup, you
+could replace the call to `System.exit(..)` with a call to Spring's ApplicationContext .close() method
+(or with a HTTP call to Tomcat Manager's API to un-deploy the app).
+
+@@@
+
+### Keep Majority
+
+The strategy named `keep-majority` will down the unreachable nodes if the current node is in
+the majority part based on the last known membership information. Otherwise down the reachable nodes,
+i.e. the own part. If the parts are of equal size the part containing the node with the lowest
+address is kept.
+
+This strategy is a good choice when the number of nodes in the cluster change dynamically and you can
+therefore not use `static-quorum`.
+
+This strategy also handles the edge case that may occur when there are membership changes at the same
+time as the network partition occurs. For example, the status of two members are changed to `Up`
+on one side but that information is not disseminated to the other side before the connection is broken.
+Then one side sees two more nodes and both sides might consider themselves having a majority. It will
+detect this situation and make the safe decision to down all nodes on the side that could be in minority
+if the joining nodes were changed to `Up` on the other side. Note that this has the drawback that
+if the joining nodes were not changed to `Up` and becoming a majority on the other side then each part
+will shut down itself, terminating the whole cluster.
+
+Note that if there are more than two partitions and none is in majority each part will shut down
+itself, terminating the whole cluster.
+
+If more than half of the nodes crash at the same time the other running nodes will down themselves
+because they think that they are not in majority, and thereby the whole cluster is terminated.  
+
+The decision can be based on nodes with a configured `role` instead of all nodes in the cluster.
+This can be useful when some types of nodes are more valuable than others. You might for example
+have some nodes responsible for persistent data and some nodes with stateless worker services.
+Then it probably more important to keep as many persistent data nodes as possible even though
+it means shutting down more worker nodes.
+
+Configuration:
+
+```
+akka.cluster.split-brain-resolver.active-strategy=keep-majority
+```
+
+@@snip [reference.conf](/akka-cluster/src/main/resources/reference.conf) { #keep-majority }
+
+### Static Quorum
+
+The strategy named `static-quorum` will down the unreachable nodes if the number of remaining
+nodes are greater than or equal to a configured `quorum-size`. Otherwise, it will down the reachable nodes,
+i.e. it will shut down that side of the partition. In other words, the `quorum-size` defines the minimum
+number of nodes that the cluster must have to be operational.
+
+This strategy is a good choice when you have a fixed number of nodes in the cluster, or when you can
+define a fixed number of nodes with a certain role.
+
+For example, in a 9 node cluster you will configure the `quorum-size` to 5. If there is a network split
+of 4 and 5 nodes the side with 5 nodes will survive and the other 4 nodes will be downed. After that,
+in the 5 node cluster, no more failures can be handled, because the remaining cluster size would be
+less than 5. In the case of another failure in that 5 node cluster all nodes will be downed.
+
+Therefore it is important that you join new nodes when old nodes have been removed.
+
+Another consequence of this is that if there are unreachable nodes when starting up the cluster,
+before reaching this limit, the cluster may shut itself down immediately. This is not an issue
+if you start all nodes at approximately the same time or use the `akka.cluster.min-nr-of-members`
+to define required number of members before the leader changes member status of 'Joining' members to 'Up'
+You can tune the timeout after which downing decisions are made using the `stable-after` setting.
+
+You should not add more members to the cluster than **quorum-size * 2 - 1**. A warning is logged
+if this recommendation is violated. If the exceeded cluster size remains when a SBR decision is
+needed it will down all nodes because otherwise there is a risk that both sides may down each
+other and thereby form two separate clusters.
+
+For rolling updates it's best to leave the cluster gracefully via
+@ref:[Coordinated Shutdown](coordinated-shutdown.md) (SIGTERM).
+For successful leaving SBR will not be used (no downing) but if there is an unreachability problem
+at the same time as the rolling update is in progress there could be an SBR decision. To avoid that
+the total number of members limit is not exceeded during the rolling update it's recommended to
+leave and fully remove one node before adding a new one, when using `static-quorum`.
+
+If the cluster is split into 3 (or more) parts each part that is smaller than then configured `quorum-size`
+will down itself and possibly shutdown the whole cluster.
+
+If more nodes than the configured `quorum-size` crash at the same time the other running nodes
+will down themselves because they think that they are not in the majority, and thereby the whole
+cluster is terminated.
+
+The decision can be based on nodes with a configured `role` instead of all nodes in the cluster.
+This can be useful when some types of nodes are more valuable than others. You might, for example,
+have some nodes responsible for persistent data and some nodes with stateless worker services.
+Then it probably more important to keep as many persistent data nodes as possible even though
+it means shutting down more worker nodes.
+
+There is another use of the `role` as well. By defining a `role` for a few (e.g. 7) stable
+nodes in the cluster and using that in the configuration of `static-quorum` you will be able
+to dynamically add and remove other nodes without this role and still have good decisions of what
+nodes to keep running and what nodes to shut down in the case of network partitions. The advantage
+of this approach compared to `keep-majority` (described below) is that you *do not* risk splitting
+the cluster into two separate clusters, i.e. *a split brain**. You must still obey the rule of not
+starting too many nodes with this `role` as described above. It also suffers the risk of shutting
+down all nodes if there is a failure when there are not enough nodes with this `role` remaining 
+in the cluster, as described above.
+
+Configuration:
+
+```
+akka.cluster.split-brain-resolver.active-strategy=static-quorum
+```
+
+@@snip [reference.conf](/akka-cluster/src/main/resources/reference.conf) { #static-quorum }
+
+### Keep Oldest
+
+The strategy named `keep-oldest` will down the part that does not contain the oldest
+member. The oldest member is interesting because the active Cluster Singleton instance
+is running on the oldest member.
+
+There is one exception to this rule if `down-if-alone` is configured to `on`.
+Then, if the oldest node has partitioned from all other nodes the oldest will down itself
+and keep all other nodes running. The strategy will not down the single oldest node when
+it is the only remaining node in the cluster.
+
+Note that if the oldest node crashes the others will remove it from the cluster
+when `down-if-alone` is `on`, otherwise they will down themselves if the
+oldest node crashes, i.e. shut down the whole cluster together with the oldest node.
+
+This strategy is good to use if you use Cluster Singleton and do not want to shut down the node
+where the singleton instance runs. If the oldest node crashes a new singleton instance will be
+started on the next oldest node. The drawback is that the strategy may keep only a few nodes
+in a large cluster. For example, if one part with the oldest consists of 2 nodes and the
+other part consists of 98 nodes then it will keep 2 nodes and shut down 98 nodes.
+
+This strategy also handles the edge case that may occur when there are membership changes at the same
+time as the network partition occurs. For example, the status of the oldest member is changed to `Exiting`
+on one side but that information is not disseminated to the other side before the connection is broken.
+It will detect this situation and make the safe decision to down all nodes on the side that sees the oldest as
+`Leaving`. Note that this has the drawback that if the oldest was `Leaving` and not changed to `Exiting` then
+each part will shut down itself, terminating the whole cluster.
+
+The decision can be based on nodes with a configured `role` instead of all nodes in the cluster,
+i.e. using the oldest member (singleton) within the nodes with that role.
+
+Configuration:
+
+```
+akka.cluster.split-brain-resolver.active-strategy=keep-oldest
+```
+
+@@snip [reference.conf](/akka-cluster/src/main/resources/reference.conf) { #keep-oldest }
+
+### Down All
+
+The strategy named `down-all` will down all nodes.
+
+This strategy can be a safe alternative if the network environment is highly unstable with unreachability observations
+that can't be fully trusted, and including frequent occurrences of @ref:[indirectly connected nodes](#indirectly-connected-nodes).
+Due to the instability there is an increased risk of different information on different sides of partitions and
+therefore the other strategies may result in conflicting decisions. In such environments it can be better to shutdown
+all nodes and start up a new fresh cluster.
+
+Shutting down all nodes means that the system will be completely unavailable until nodes have been restarted and
+formed a new cluster. This strategy is not recommended for large clusters (> 10 nodes) because any minor problem
+will shutdown all nodes, and that is more likely to happen in larger clusters since there are more nodes that
+may fail.
+
+See also @ref[Down all when unstable](#down-all-when-unstable) and @ref:[indirectly connected nodes](#indirectly-connected-nodes).
+
+### Lease
+
+The strategy named `lease-majority` is using a distributed lease (lock) to decide what nodes that are allowed to
+survive. Only one SBR instance can acquire the lease make the decision to remain up. The other side will
+not be able to aquire the lease and will therefore down itself.
+
+Best effort is to keep the side that has most nodes, i.e. the majority side. This is achieved by adding a delay
+before trying to acquire the lease on the minority side.
+
+There is currently one supported implementation of the lease which is backed by a
+[Custom Resource Definition (CRD)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
+in Kubernetes. It is described in the [Kubernetes Lease](https://doc.akka.io/docs/akka-management/current/kubernetes-lease.html)
+documentation.
+
+This strategy is very safe since coordination is added by an external arbiter. The trade-off compared to other
+strategies is that it requires additional infrastructure for implementing the lease and it reduces the availability
+of a decision to that of the system backing the lease store.
+
+Similar to other strategies it is important that decisions are not deferred for too because the nodes that couldn't
+acquire the lease must decide to down themselves, see @ref[Down all when unstable](#down-all-when-unstable).
+
+In some cases the lease will be unavailable when needed for a decision from all SBR instances, e.g. because it is
+on another side of a network partition, and then all nodes will be downed.
+
+Configuration:
+
+```
+akka {
+  cluster {
+    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
+    split-brain-resolver {
+      active-strategy = "lease-majority"
+      lease-majority {
+        lease-implementation = "akka.lease.kubernetes"
+      }
+    }
+  }
+}
+```
+
+@@snip [reference.conf](/akka-cluster/src/main/resources/reference.conf) { #lease-majority }
+
+See also configuration and additional dependency in [Kubernetes Lease](https://doc.akka.io/docs/akka-management/current/kubernetes-lease.html)
+
+## Indirectly connected nodes
+
+In a malfunctional network there can be situations where nodes are observed as unreachable via some network
+links but they are still indirectly connected via other nodes, i.e. it's not a clean network partition (or node crash).
+
+When this situation is detected the Split Brain Resolvers will keep fully connected nodes and down all the indirectly
+connected nodes.
+
+If there is a combination of indirectly connected nodes and a clean network partition it will combine the
+above decision with the ordinary decision, e.g. keep majority, after excluding suspicious failure detection
+observations.
+
+## Down all when unstable
+
+When reachability observations by the failure detector are changed the SBR decisions
+are deferred until there are no changes within the `stable-after` duration.
+If this continues for too long it might be an indication of an unstable system/network
+and it could result in delayed or conflicting decisions on separate sides of a network
+partition.
+
+As a precaution for that scenario all nodes are downed if no decision is made within
+`stable-after + down-all-when-unstable` from the first unreachability event.
+The measurement is reset if all unreachable have been healed, downed or removed, or
+if there are no changes within `stable-after * 2`.
+
+This is enabled by default for all strategies and by default the duration is derived to
+be 3/4 of `stable-after`.
+
+The below property can be defined as a duration of for how long the changes are acceptable to
+continue after the `stable-after` or it can be set to `off` to disable this feature.
+
+
+```
+akka.cluster.split-brain-resolver {
+  down-all-when-unstable = 15s
+  stable-after = 20s
+}
+```
+
+@@@ warning
+
+It is recommended to keep `down-all-when-unstable` enabled and not set it to a longer duration than `stable-after`
+(`down-removal-margin`) because that can result in delayed decisions on the side that should have been downed, e.g.
+in the case of a clean network partition followed by continued instability on the side that should be downed.
+That could result in that members are removed from one side but are still running on the other side.
+
+@@@
+
+## Multiple data centers
+
+Akka Cluster has @ref:[support for multiple data centers](cluster-dc.md), where the cluster
+membership is managed by each data center separately and independently of network partitions across different
+data centers. The Split Brain Resolver is embracing that strategy and will not count nodes or down nodes in
+another data center.
+
+When there is a network partition across data centers the typical solution is to wait the partition out until it heals, i.e.
+do nothing. Other decisions should be performed by an external monitoring tool or human operator.
+
+## Cluster Singleton and Cluster Sharding
+
+The purpose of Cluster Singleton and Cluster Sharding is to run at most one instance
+of a given actor at any point in time. When such an instance is shut down a new instance
+is supposed to be started elsewhere in the cluster. It is important that the new instance is
+not started before the old instance has been stopped. This is especially important when the
+singleton or the sharded instance is persistent, since there must only be one active
+writer of the journaled events of a persistent actor instance.
+
+Since the strategies on different sides of a network partition cannot communicate with each other
+and they may take the decision at slightly different points in time there must be a time based
+margin that makes sure that the new instance is not started before the old has been stopped.
+
+You would like to configure this to a short duration to have quick failover, but that will increase the
+risk of having multiple singleton/sharded instances running at the same time and it may take a different
+amount of time to act on the decision (dissemination of the down/removal). The duration is by default
+the same as the `stable-after` property (see @ref:[Stable after](#stable-after) above). It is recommended to
+leave this value as is, but it can also be separately overriden with the `akka.cluster.down-removal-margin` property.
+
+Another concern for setting this `stable-after`/`akka.cluster.down-removal-margin` is dealing with JVM pauses e.g.
+garbage collection. When a node is unresponsive it is not known if it is due to a pause, overload, a crash or a 
+network partition. If it is pause that lasts longer than `stable-after` * 2 it gives time for SBR to down the node
+and for singletons and shards to be started on other nodes. When the node un-pauses there will be a short time before
+it sees its self as down where singletons and sharded actors are still running. It is therefore important to understand
+the max pause time your application is likely to incur and make sure it is smaller than `stable-margin`.
+
+If you choose to set a separate value for `down-removal-margin`, the recommended minimum duration for different cluster sizes are:
+
+|cluster size | down-removal-margin|
+|-------------|--------------------|
+|5    | 7 s |
+|10   | 10 s|
+|20   | 13 s|
+|50   | 17 s|
+|100  | 20 s|
+|1000 | 30 s|
+
+### Expected Failover Time
+
+As you have seen, there are several configured timeouts that add to the total failover latency.
+With default configuration those are:
+
+ * failure detection 5 seconds
+ * stable-after 20 seconds
+ * down-removal-margin (by default the same as stable-after) 20 seconds
+
+In total, you can expect the failover time of a singleton or sharded instance to be around 45 seconds
+with default configuration. The default configuration is sized for a cluster of 100 nodes. If you have
+around 10 nodes you can reduce the `stable-after` to around 10 seconds, 
+resulting in an expected failover time of around 25 seconds.   
--- a/akka-docs/src/main/paradox/typed/cluster-membership.md
+++ b/akka-docs/src/main/paradox/typed/cluster-membership.md
@ -108,8 +108,7 @@ performs in such a case must be designed in a way that all concurrent leaders wo
 might be impossible in general and only feasible under additional constraints). The most important case of that kind is a split
 brain scenario where nodes need to be downed, either manually or automatically, to bring the cluster back to convergence.

-See the [Lightbend Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html)
-for an implementation of that.
+The @ref:[Split Brain Resolver](../split-brain-resolver.md) is the built-in implementation of that.

 Another transition that is possible without convergence is marking members as `WeaklyUp` as described in the next section.

@ -140,7 +139,7 @@ startup if a node to join have been specified in the configuration
   
 * **leave** - tell a node to leave the cluster gracefully, normally triggered by ActorSystem or JVM shutdown through @ref[coordinated shutdown](../coordinated-shutdown.md)
   
- * **down** - mark a node as down. This action is required to remove crashed nodes (that did not 'leave') from the cluster. It can be triggered manually, through [Cluster HTTP Management](https://doc.akka.io/docs/akka-management/current/cluster-http-management.html#put-cluster-members-address-responses), or automatically by a @ref[downing provider](cluster.md#downing) like [Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html)
+ * **down** - mark a node as down. This action is required to remove crashed nodes (that did not 'leave') from the cluster. It can be triggered manually, through [Cluster HTTP Management](https://doc.akka.io/docs/akka-management/current/cluster-http-management.html#put-cluster-members-address-responses), or automatically by a @ref[downing provider](cluster.md#downing) like @ref:[Split Brain Resolver](../split-brain-resolver.md)
   
 #### Leader Actions

--- a/akka-docs/src/main/paradox/typed/cluster.md
+++ b/akka-docs/src/main/paradox/typed/cluster.md
@ -275,23 +275,22 @@ new joining members to 'Up'. The node must first become `reachable` again, or th
 status of the unreachable member must be changed to `Down`. Changing status to `Down`
 can be performed automatically or manually.

-By default, downing must be performed manually using @ref:[HTTP](../additional/operations.md#http) or @ref:[JMX](../additional/operations.md#jmx).
+We recommend that you enable the @ref:[Split Brain Resolver](../split-brain-resolver.md) that is part of the
+Akka Cluster module. You enable it with configuration:
+
+```
+akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
+```
+
+You should also consider the different available @ref:[downing strategies](../split-brain-resolver.md#strategies).
+
+If a downing provider is not configured downing must be performed manually using 
+@ref:[HTTP](../additional/operations.md#http) or @ref:[JMX](../additional/operations.md#jmx).

 Note that @ref:[Cluster Singleton](cluster-singleton.md) or @ref:[Cluster Sharding entities](cluster-sharding.md) that
 are running on a crashed (unreachable) node will not be started on another node until the previous node has
 been removed from the Cluster. Removal of crashed (unreachable) nodes is performed after a downing decision.

-A production solution for downing is provided by
-[Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html),
-which is part of the [Akka Platform](https://www.lightbend.com/akka-platform).
-If you don’t have a Lightbend Subscription, you should still carefully read the 
-[documentation](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html)
-of the Split Brain Resolver and make sure that the solution you are using handles the concerns and scenarios
-described there.
-
-A custom downing strategy can be implemented with a @apidoc[akka.cluster.DowningProvider] and enabled with
-configuration `akka.cluster.downing-provider-class`.  
-
 Downing can also be performed programmatically with @scala[`Cluster(system).manager ! Down(address)`]@java[`Cluster.get(system).manager().tell(Down(address))`],
 but that is mostly useful from tests and when implementing a `DowningProvider`.

--- a/akka-docs/src/main/paradox/typed/guide/modules.md
+++ b/akka-docs/src/main/paradox/typed/guide/modules.md
@ -18,7 +18,6 @@ With a [Lightbend Platform Subscription](https://www.lightbend.com/lightbend-sub

 [Akka Resilience Enhancements](https://doc.akka.io/docs/akka-enhancements/current/akka-resilience-enhancements.html):

-* [Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html) &#8212; Detects and recovers from network partitions, eliminating data inconsistencies and possible downtime.
 * [Configuration Checker](https://doc.akka.io/docs/akka-enhancements/current/config-checker.html) &#8212; Checks for potential configuration issues and logs suggestions.
 * [Diagnostics Recorder](https://doc.akka.io/docs/akka-enhancements/current/diagnostics-recorder.html) &#8212; Captures configuration and system information in a format that makes it easy to troubleshoot issues during development and production.
 * [Thread Starvation Detector](https://doc.akka.io/docs/akka-enhancements/current/starvation-detector.html) &#8212; Monitors an Akka system dispatcher and logs warnings if it becomes unresponsive.
--- a/akka-docs/src/main/paradox/typed/index-cluster.md
+++ b/akka-docs/src/main/paradox/typed/index-cluster.md
@ -25,6 +25,7 @@ project.description: Akka Cluster concepts, node membership service, CRDT Distri
 * [multi-node-testing](../multi-node-testing.md)
 * [remoting-artery](../remoting-artery.md)
 * [remoting](../remoting.md)
+* [split-brain-resolver](../split-brain-resolver.md)
 * [coordination](../coordination.md)
 * [choosing-cluster](choosing-cluster.md)

--- a/akka-docs/src/test/java/jdocs/coordination/LeaseDocTest.java
+++ b/akka-docs/src/test/java/jdocs/coordination/LeaseDocTest.java
@ -0,0 +1,105 @@
+/*
+ * Copyright (C) 2019-2020 Lightbend Inc. <https://www.lightbend.com>
+ */
+
+package jdocs.coordination;
+
+import akka.actor.ActorSystem;
+import akka.coordination.lease.LeaseSettings;
+import akka.coordination.lease.javadsl.Lease;
+import akka.coordination.lease.javadsl.LeaseProvider;
+import akka.testkit.javadsl.TestKit;
+import docs.coordination.LeaseDocSpec;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.scalatestplus.junit.JUnitSuite;
+
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.function.Consumer;
+
+@SuppressWarnings("unused")
+public class LeaseDocTest extends JUnitSuite {
+  // #lease-example
+  static class SampleLease extends Lease {
+
+    private LeaseSettings settings;
+
+    public SampleLease(LeaseSettings settings) {
+      this.settings = settings;
+    }
+
+    @Override
+    public LeaseSettings getSettings() {
+      return settings;
+    }
+
+    @Override
+    public CompletionStage<Boolean> acquire() {
+      return CompletableFuture.completedFuture(true);
+    }
+
+    @Override
+    public CompletionStage<Boolean> acquire(Consumer<Optional<Throwable>> leaseLostCallback) {
+      return CompletableFuture.completedFuture(true);
+    }
+
+    @Override
+    public CompletionStage<Boolean> release() {
+      return CompletableFuture.completedFuture(true);
+    }
+
+    @Override
+    public boolean checkLease() {
+      return true;
+    }
+  }
+  // #lease-example
+
+  private static ActorSystem system;
+
+  @BeforeClass
+  public static void setup() {
+    system = ActorSystem.create("LeaseDocTest", LeaseDocSpec.config());
+  }
+
+  @AfterClass
+  public static void teardown() {
+    TestKit.shutdownActorSystem(system);
+    system = null;
+  }
+
+  private void doSomethingImportant(Optional<Throwable> leaseLostReason) {}
+
+  @Test
+  public void javaLeaseBeLoadableFromJava() {
+    // #lease-usage
+    Lease lease =
+        LeaseProvider.get(system).getLease("<name of the lease>", "jdocs-lease", "<owner name>");
+    CompletionStage<Boolean> acquired = lease.acquire();
+    boolean stillAcquired = lease.checkLease();
+    CompletionStage<Boolean> released = lease.release();
+    // #lease-usage
+
+    // #lost-callback
+    lease.acquire(this::doSomethingImportant);
+    // #lost-callback
+
+    // #cluster-owner
+    // String owner = Cluster.get(system).selfAddress().hostPort();
+    // #cluster-owner
+
+  }
+
+  @Test
+  public void scalaLeaseBeLoadableFromJava() {
+    Lease lease =
+        LeaseProvider.get(system).getLease("<name of the lease>", "docs-lease", "<owner name>");
+    CompletionStage<Boolean> acquired = lease.acquire();
+    boolean stillAcquired = lease.checkLease();
+    CompletionStage<Boolean> released = lease.release();
+    lease.acquire(this::doSomethingImportant);
+  }
+}
--- a/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala
+++ b/akka-docs/src/test/scala/docs/coordination/LeaseDocSpec.scala
@ -0,0 +1,102 @@
+/*
+ * Copyright (C) 2019-2020 Lightbend Inc. <https://www.lightbend.com>
+ */
+
+package docs.coordination
+
+import scala.concurrent.Future
+
+import com.typesafe.config.ConfigFactory
+
+import akka.cluster.Cluster
+import akka.coordination.lease.LeaseSettings
+import akka.coordination.lease.scaladsl.Lease
+import akka.coordination.lease.scaladsl.LeaseProvider
+import akka.testkit.AkkaSpec
+
+//#lease-example
+class SampleLease(settings: LeaseSettings) extends Lease(settings) {
+
+  override def acquire(): Future[Boolean] = {
+    Future.successful(true)
+  }
+
+  override def acquire(leaseLostCallback: Option[Throwable] => Unit): Future[Boolean] = {
+    Future.successful(true)
+  }
+
+  override def release(): Future[Boolean] = {
+    Future.successful(true)
+  }
+
+  override def checkLease(): Boolean = {
+    true
+  }
+}
+//#lease-example
+
+object LeaseDocSpec {
+
+  def config() =
+    ConfigFactory.parseString("""
+      jdocs-lease.lease-class = "jdocs.coordination.LeaseDocTest$SampleLease"
+      #lease-config
+      akka.actor.provider = cluster
+      docs-lease {
+        lease-class = "docs.coordination.SampleLease"
+        heartbeat-timeout = 100s
+        heartbeat-interval = 1s
+        lease-operation-timeout = 1s
+        # Any lease specific configuration
+      }
+      #lease-config
+    """.stripMargin)
+
+  def blackhole(stuff: Any*): Unit = {
+    stuff.toString
+    ()
+  }
+  def doSomethingImportant(leaseLostReason: Option[Throwable]): Unit = {
+    leaseLostReason.map(_.toString)
+    ()
+  }
+}
+
+class LeaseDocSpec extends AkkaSpec(LeaseDocSpec.config) {
+  import LeaseDocSpec._
+
+  "A docs lease" should {
+    "scala lease be loadable from scala" in {
+
+      //#lease-usage
+      val lease = LeaseProvider(system).getLease("<name of the lease>", "docs-lease", "owner")
+      val acquired: Future[Boolean] = lease.acquire()
+      val stillAcquired: Boolean = lease.checkLease()
+      val released: Future[Boolean] = lease.release()
+      //#lease-usage
+
+      //#lost-callback
+      lease.acquire(leaseLostReason => doSomethingImportant(leaseLostReason))
+      //#lost-callback
+
+      //#cluster-owner
+      val owner = Cluster(system).selfAddress.hostPort
+      //#cluster-owner
+
+      // remove compiler warnings
+      blackhole(acquired, stillAcquired, released, owner)
+
+    }
+
+    "java lease be loadable from scala" in {
+      val lease = LeaseProvider(system).getLease("<name of the lease>", "jdocs-lease", "owner")
+      val acquired: Future[Boolean] = lease.acquire()
+      val stillAcquired: Boolean = lease.checkLease()
+      val released: Future[Boolean] = lease.release()
+      lease.acquire(leaseLostReason => doSomethingImportant(leaseLostReason))
+
+      blackhole(acquired, stillAcquired, released)
+    }
+  }
+
+}