Clarify system name requirement for cluster members

* Clarify system name requirement for cluster members
* Recommend against auto-down, stronger
2016-04-04 12:37:12 +02:00
commit 52de0bcaa4
parent aec81c2ac2
3 changed files with 51 additions and 7 deletions
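
As an illustration of the system name requirement this commit documents, the sketch below checks that a list of seed-node addresses all carry the same actor system name. It assumes addresses of the form akka.tcp://<system>@<host>:<port>. The class SeedNodeNameCheck, its methods, and the sample addresses are hypothetical and not part of Akka.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Akka: checks that every seed-node
// address refers to the same ActorSystem name.
public class SeedNodeNameCheck {

    // Extracts the system name from an address such as
    // "akka.tcp://ClusterSystem@127.0.0.1:2551" (assumed address format).
    public static String systemNameOf(String address) {
        int schemeEnd = address.indexOf("://");
        int at = address.indexOf('@');
        if (schemeEnd < 0 || at <= schemeEnd + 3) {
            throw new IllegalArgumentException("Not an actor system address: " + address);
        }
        return address.substring(schemeEnd + 3, at);
    }

    // Returns true only if every seed node uses the expected system name.
    public static boolean allUseSystemName(String expected, List<String> seedNodes) {
        for (String node : seedNodes) {
            if (!expected.equals(systemNameOf(node))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample addresses, made up for this illustration.
        List<String> seedNodes = Arrays.asList(
            "akka.tcp://ClusterSystem@127.0.0.1:2551",
            "akka.tcp://ClusterSystem@127.0.0.1:2552");
        System.out.println(allUseSystemName("ClusterSystem", seedNodes));
    }
}
```

A check like this could run at startup, before the node tries to join, so that a mismatched name is reported as a configuration error.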
--- a/akka-docs/rst/java/cluster-usage.rst
+++ b/akka-docs/rst/java/cluster-usage.rst
@@ -119,6 +119,10 @@ after the restart, when it come up as new incarnation of existing member in the
 trying to join in, then the existing one will be removed from the cluster and then it will
 be allowed to join.

+.. note::
+
+  The name of the ``ActorSystem`` must be the same for all members of a cluster. The name is given
+  when you start the ``ActorSystem``.

 .. _automatic-vs-manual-downing-java:

@@ -141,9 +145,25 @@ You can enable automatic downing with configuration::
 This means that the cluster leader member will change the ``unreachable`` node
 status to ``down`` automatically after the configured time of unreachability.

-Be aware of that using auto-down implies that two separate clusters will
-automatically be formed in case of network partition. That might be
-desired by some applications but not by others.
+This is a naïve approach to removing unreachable nodes from the cluster membership. It
+works well for crashes and short transient network partitions, but not for long network
+partitions. Both sides of the network partition will see the other side as unreachable
+and after a while remove it from their own cluster membership. Since this happens on both
+sides, the result is that two separate, disconnected clusters have been created. This
+can also happen because of long GC pauses or system overload.
+
+.. warning::
+
+  We recommend against using the auto-down feature of Akka Cluster in production.
+  This is crucial for correct behavior if you use :ref:`cluster-singleton-java` or 
+  :ref:`cluster_sharding_java`, especially together with Akka :ref:`persistence-java`.
+  
+A pre-packaged solution for the downing problem is provided by
+`Split Brain Resolver <http://doc.akka.io/docs/akka/rp-16s01p03/java/split-brain-resolver.html>`_,
+which is part of the Lightbend Reactive Platform. If you don’t use RP, you should still carefully
+read the `documentation <http://doc.akka.io/docs/akka/rp-16s01p03/java/split-brain-resolver.html>`_
+of the Split Brain Resolver and make sure that the solution you are using handles the concerns
+described there.

 .. note:: If you have *auto-down* enabled and the failure detector triggers, you
   can over time end up with a lot of single node clusters if you don't put
@@ -372,7 +392,7 @@ Publish-subscribe messaging between actors in the cluster, and point-to-point me
 using the logical path of the actors, i.e. the sender does not have to know on which
 node the destination actor is running.

-See :ref:`distributed-pub-sub-scala`.
+See :ref:`distributed-pub-sub-java`.

 Cluster Client
 ^^^^^^^^^^^^^^
--- a/akka-docs/rst/scala/cluster-usage.rst
+++ b/akka-docs/rst/scala/cluster-usage.rst
@@ -114,6 +114,11 @@ after the restart, when it come up as new incarnation of existing member in the
 trying to join in, then the existing one will be removed from the cluster and then it will
 be allowed to join.

+.. note::
+
+  The name of the ``ActorSystem`` must be the same for all members of a cluster. The name is given
+  when you start the ``ActorSystem``.
+
 .. _automatic-vs-manual-downing-scala:

 Automatic vs. Manual Downing
@@ -135,9 +140,25 @@ You can enable automatic downing with configuration::
 This means that the cluster leader member will change the ``unreachable`` node
 status to ``down`` automatically after the configured time of unreachability.

-Be aware of that using auto-down implies that two separate clusters will
-automatically be formed in case of network partition. That might be
-desired by some applications but not by others.
+This is a naïve approach to removing unreachable nodes from the cluster membership. It
+works well for crashes and short transient network partitions, but not for long network
+partitions. Both sides of the network partition will see the other side as unreachable
+and after a while remove it from their own cluster membership. Since this happens on both
+sides, the result is that two separate, disconnected clusters have been created. This
+can also happen because of long GC pauses or system overload.
+
+.. warning::
+
+  We recommend against using the auto-down feature of Akka Cluster in production.
+  This is crucial for correct behavior if you use :ref:`cluster-singleton-scala` or 
+  :ref:`cluster_sharding_scala`, especially together with Akka :ref:`persistence-scala`.
+  
+A pre-packaged solution for the downing problem is provided by
+`Split Brain Resolver <http://doc.akka.io/docs/akka/rp-16s01p03/scala/split-brain-resolver.html>`_,
+which is part of the Lightbend Reactive Platform. If you don’t use RP, you should still carefully
+read the `documentation <http://doc.akka.io/docs/akka/rp-16s01p03/scala/split-brain-resolver.html>`_
+of the Split Brain Resolver and make sure that the solution you are using handles the concerns
+described there.

 .. note:: If you have *auto-down* enabled and the failure detector triggers, you
   can over time end up with a lot of single node clusters if you don't put