Adding additional details to the auto-downing section of the documentation to indicate the consequences of using it (#24697)

* Adding additional details to the auto-downing section of the documentation to indicate the consequences of using it
* Slight rewording based on PR feedback.
2018-03-13 08:31:23 -06:00 · 2018-03-13 08:31:23 -06:00 · 95afa04a7a
commit 95afa04a7a
parent ab701e98be
1 changed files with 31 additions and 11 deletions
--- a/akka-docs/src/main/paradox/cluster-usage.md
+++ b/akka-docs/src/main/paradox/cluster-usage.md
@@ -217,20 +217,40 @@ akka.cluster.auto-down-unreachable-after = 120s
 This means that the cluster leader member will change the `unreachable` node
 status to `down` automatically after the configured time of unreachability.

-This is a naïve approach to remove unreachable nodes from the cluster membership. It
-works great for crashes and short transient network partitions, but not for long network
-partitions. Both sides of the network partition will see the other side as unreachable
-and after a while remove it from its cluster membership. Since this happens on both
-sides the result is that two separate disconnected clusters have been created. This
-can also happen because of long GC pauses or system overload.
+This is a naïve approach to remove unreachable nodes from the cluster membership.
+It can be useful during development but in a production environment it will eventually break down the cluster. When a network partition occurs, both sides of the
+partition will see the other side as unreachable and remove it from the cluster.
+This results in the formation of two separate, disconnected clusters
+(known as *Split Brain*).
+
+This behaviour is not limited to network partitions. It can also occur if a node
+in the cluster is overloaded, or experiences a long GC pause.

@@@ warning

-We recommend against using the auto-down feature of Akka Cluster in production.
-This is crucial for correct behavior if you use @ref:[Cluster Singleton](cluster-singleton.md) or
-@ref:[Cluster Sharding](cluster-sharding.md), especially together with Akka @ref:[Persistence](persistence.md).
-For Akka Persistence with Cluster Sharding it can result in corrupt data in case
-of network partitions.
+We recommend against using the auto-down feature of Akka Cluster in production. It
+has multiple undesirable consequences for production systems.
+
+If you are using @ref:[Cluster Singleton](cluster-singleton.md) or
+@ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by 
+those features. Both provide a guarantee that an actor will be unique in a cluster.
+With the auto-down feature enabled, it is possible for multiple independent clusters
+to form (*Split Brain*). When this happens the guaranteed uniqueness will no
+longer hold, resulting in undesirable behaviour in the system.
+
+This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
+conjunction with Cluster Sharding. In this case, the lack of unique actors can 
+cause multiple actors to write to the same journal. Akka Persistence operates on a
+single writer principle. Having multiple writers will corrupt the journal
+and make it unusable.
+
+Finally, even if you don't use features such as Persistence, Sharding, or Singletons, 
+auto-downing can lead the system to form multiple small clusters. These small
+clusters will be independent from each other. They will be unable to communicate
+and as a result you may experience performance degradation. Once this condition
+occurs, it will require manual intervention in order to re-form the cluster.
+
+Because of these issues, auto-downing should **never** be used in a production environment.

@@@
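
The split-brain outcome described in the added text can be illustrated with a toy model. The sketch below is plain Python, not Akka code: the function names are hypothetical, each node is assumed to reach exactly the nodes on its own side of the partition, and the "oldest member" that would host a Cluster Singleton is approximated by the alphabetically smallest node name.

```python
from typing import Dict, FrozenSet, List, Set


def auto_down_views(partition_sides: List[Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Membership view held by each node once the auto-down timeout has expired.

    Toy model (an assumption, not Akka's implementation): a node can reach
    only the nodes on its own side of the partition, and every node it
    cannot reach is marked down and removed from its view.
    """
    views: Dict[str, FrozenSet[str]] = {}
    for side in partition_sides:
        for node in side:
            views[node] = frozenset(side)
    return views


def distinct_clusters(views: Dict[str, FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Nodes that share the same membership view form one cluster."""
    return set(views.values())


def singleton_hosts(views: Dict[str, FrozenSet[str]]) -> Set[str]:
    """One singleton host per cluster.

    Hypothetical stand-in for "oldest member": the smallest node name.
    """
    return {min(cluster) for cluster in distinct_clusters(views)}


if __name__ == "__main__":
    # A five-node cluster split by a network partition into {a, b, c} and {d, e}.
    views = auto_down_views([{"a", "b", "c"}, {"d", "e"}])

    # Each side has removed the other, so two independent clusters remain.
    print(sorted(sorted(c) for c in distinct_clusters(views)))
    # prints [['a', 'b', 'c'], ['d', 'e']]

    # Each cluster now runs its own "singleton", so uniqueness is lost.
    print(sorted(singleton_hosts(views)))
    # prints ['a', 'd']
```

In this model neither side is wrong from its own point of view, which is why the condition does not heal on its own once the partition ends.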