Merge pull request #20809 from 2m/wip-#20808-restart-node-2m

#20808 clarify docs on the quarantined node restart
Martynas Mickevičius 2016-06-22 16:27:45 +03:00 committed by GitHub
commit e39255cef0
2 changed files with 18 additions and 18 deletions


@@ -147,7 +147,7 @@ status to ``down`` automatically after the configured time of unreachability.
This is a naïve approach to remove unreachable nodes from the cluster membership. It
works great for crashes and short transient network partitions, but not for long network
partitions. Both sides of the network partition will see the other side as unreachable
and after a while remove it from its cluster membership. Since this happens on both
sides the result is that two separate disconnected clusters have been created. This
can also happen because of long GC pauses or system overload.
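
(Not part of the diff, for orientation.) The auto-down behavior described above is switched on by a single setting. A minimal sketch, assuming an Akka 2.4-era application: ``akka.cluster.auto-down-unreachable-after`` is the actual setting name, while the system name and the chosen duration are placeholders::

   import akka.actor.ActorSystem
   import com.typesafe.config.ConfigFactory

   // Enables the naive strategy described above: nodes that stay unreachable
   // for the given duration are automatically marked down. See the warning
   // below before considering this for production.
   val config = ConfigFactory
     .parseString("akka.cluster.auto-down-unreachable-after = 120s")
     .withFallback(ConfigFactory.load())

   val system = ActorSystem("ClusterSystem", config)
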
@@ -155,14 +155,14 @@ can also happen because of long GC pauses or system overload.
.. warning::
We recommend against using the auto-down feature of Akka Cluster in production.
This is crucial for correct behavior if you use :ref:`cluster-singleton-java` or
:ref:`cluster_sharding_java`, especially together with Akka :ref:`persistence-java`.
A pre-packaged solution for the downing problem is provided by
`Split Brain Resolver <http://doc.akka.io/docs/akka/rp-16s01p03/java/split-brain-resolver.html>`_,
which is part of the Lightbend Reactive Platform. If you don't use RP, you should still carefully
read the `documentation <http://doc.akka.io/docs/akka/rp-16s01p03/java/split-brain-resolver.html>`_
of the Split Brain Resolver and make sure that the solution you are using handles the concerns
described there.
.. note:: If you have *auto-down* enabled and the failure detector triggers, you
@@ -427,8 +427,8 @@ If system messages cannot be delivered to a node it will be quarantined and then
cannot come back from ``unreachable``. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by remote parent). Then the node needs to be moved
-to the ``down`` or ``removed`` states and the actor system must be restarted before
-it can join the cluster again.
+to the ``down`` or ``removed`` states and the actor system of the quarantined node
+must be restarted before it can join the cluster again.
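
(Not part of the diff.) The clarified sentence amounts to two steps: move the quarantined member to ``down`` from a node that is still a healthy member, then restart the actor system on the quarantined host so a fresh incarnation can join. A sketch assuming Akka 2.4 APIs; ``Cluster.down``, ``ActorSystem.terminate`` and ``Cluster.join`` exist as shown, while the addresses and system name are made up::

   import akka.actor.{ ActorSystem, Address }
   import akka.cluster.Cluster

   // Hypothetical addresses for illustration.
   val seedNode    = Address("akka.tcp", "ClusterSystem", "10.0.0.1", 2552)
   val quarantined = Address("akka.tcp", "ClusterSystem", "10.0.0.2", 2552)

   // Step 1, on any healthy member: down the quarantined member so the
   // cluster membership can drop it.
   val system = ActorSystem("ClusterSystem")
   Cluster(system).down(quarantined)

   // Step 2, on the quarantined host: the old ActorSystem can never rejoin,
   // so terminate it and start a fresh one that joins the cluster anew.
   // (Both steps are shown in one snippet only for compactness.)
   system.terminate()
   val freshSystem = ActorSystem("ClusterSystem")
   Cluster(freshSystem).join(seedNode)

The restart matters because the quarantine is tied to the actor system's UID; a freshly started system on the same host and port has a new UID and is accepted back.
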
The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
unreachable from the rest of the cluster. The heartbeat arrival times are interpreted


@@ -142,7 +142,7 @@ status to ``down`` automatically after the configured time of unreachability.
This is a naïve approach to remove unreachable nodes from the cluster membership. It
works great for crashes and short transient network partitions, but not for long network
partitions. Both sides of the network partition will see the other side as unreachable
and after a while remove it from its cluster membership. Since this happens on both
sides the result is that two separate disconnected clusters have been created. This
can also happen because of long GC pauses or system overload.
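
(Not part of the diff.) The alternative to auto-downing implied here is to surface unreachability and let an operator or external tool decide which side of a partition survives. A minimal sketch using the ``ClusterEvent`` subscription API; the actor and system names are placeholders::

   import akka.actor.{ Actor, ActorLogging, ActorSystem, Props }
   import akka.cluster.Cluster
   import akka.cluster.ClusterEvent.{ InitialStateAsEvents, MemberRemoved, UnreachableMember }

   // Logs unreachable members instead of downing them automatically,
   // leaving the down/keep decision to a human or an external tool.
   class UnreachableListener extends Actor with ActorLogging {
     val cluster = Cluster(context.system)

     override def preStart(): Unit =
       cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
         classOf[UnreachableMember], classOf[MemberRemoved])
     override def postStop(): Unit =
       cluster.unsubscribe(self)

     def receive = {
       case UnreachableMember(member) =>
         log.warning("Member {} is unreachable, decide manually whether to down it", member.address)
       case MemberRemoved(member, _) =>
         log.info("Member {} was removed from the cluster", member.address)
     }
   }

   object Main extends App {
     val system = ActorSystem("ClusterSystem")
     system.actorOf(Props[UnreachableListener], "unreachable-listener")
   }
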
@@ -150,14 +150,14 @@ can also happen because of long GC pauses or system overload.
.. warning::
We recommend against using the auto-down feature of Akka Cluster in production.
This is crucial for correct behavior if you use :ref:`cluster-singleton-scala` or
:ref:`cluster_sharding_scala`, especially together with Akka :ref:`persistence-scala`.
A pre-packaged solution for the downing problem is provided by
`Split Brain Resolver <http://doc.akka.io/docs/akka/rp-16s01p03/scala/split-brain-resolver.html>`_,
which is part of the Lightbend Reactive Platform. If you don't use RP, you should still carefully
read the `documentation <http://doc.akka.io/docs/akka/rp-16s01p03/scala/split-brain-resolver.html>`_
of the Split Brain Resolver and make sure that the solution you are using handles the concerns
described there.
.. note:: If you have *auto-down* enabled and the failure detector triggers, you
@@ -422,8 +422,8 @@ If system messages cannot be delivered to a node it will be quarantined and then
cannot come back from ``unreachable``. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by remote parent). Then the node needs to be moved
-to the ``down`` or ``removed`` states and the actor system must be restarted before
-it can join the cluster again.
+to the ``down`` or ``removed`` states and the actor system of the quarantined node
+must be restarted before it can join the cluster again.
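
(Not part of the diff.) The heartbeat monitoring picked up in the next paragraph is tuned through the cluster failure detector section of the configuration. A sketch using the actual setting names from ``akka-cluster``'s reference configuration; the values shown are believed to be the defaults of that era and are illustrative, not recommendations::

   import akka.actor.ActorSystem
   import com.typesafe.config.ConfigFactory

   val config = ConfigFactory.parseString("""
     akka.cluster.failure-detector {
       heartbeat-interval = 1 s          # how often heartbeats are sent
       threshold = 8.0                   # phi value above which a node is flagged unreachable
       acceptable-heartbeat-pause = 3 s  # tolerated silence, e.g. a long GC pause
     }
   """).withFallback(ConfigFactory.load())

   val system = ActorSystem("ClusterSystem", config)
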
The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
unreachable from the rest of the cluster. The heartbeat arrival times are interpreted