Artery docs for Quarantine, #21209

This commit is contained in:
Patrik Nordwall 2016-09-29 17:09:21 +02:00
parent 577f43233a
commit ce33fec7e1
2 changed files with 51 additions and 4 deletions

View file

@ -281,11 +281,58 @@ untrusted mode when incoming via the remoting layer:
marking them :class:`PossiblyHarmful` so that a client cannot forge them.
Lifecycle and Failure Recovery Model
------------------------------------
Quarantine
----------
TODO
Akka remoting is using Aeron as underlying message transport. Aeron is using UDP and adds
among other things reliable delivery and session semantics, very similar to TCP. This means that
the order of the messages are preserved, which is needed for the :ref:`Actor message ordering guarantees <message-ordering>`.
Under normal circumstances all messages will be delivered but there are cases when messages
may not be delivered to the destination:
* during a network partition and the Aeron session is broken, this automatically recovered once the partition is over
* when sending too many messages without flow control and thereby filling up the outbound send queue (``outbound-message-queue-size`` config)
* if serialization or deserialization of a message fails (only that message will be dropped)
* if an unexpected exception occurs in the remoting infrastructure
In short, Actor message delivery is “at-most-once” as described in :ref:`message-delivery-reliability`
Some messages in Akka are called system messages and those cannot be dropped because that would result
in an inconsistent state between the systems. Such messages are used for essentially two features; remote death
watch and remote deployment. These messages are delivered by Akka remoting with “exactly-once” guarantee by
confirming each message and resending unconfirmed messages. If a system message anyway cannot be delivered the
association with the destination system is irrecoverable failed, and Terminated is signaled for all watched
actors on the remote system. It is placed in a so called quarantined state. Quarantine usually does not
happen if remote watch or remote deployment is not used.
Each ``ActorSystem`` instance has an unique identifier (UID), which is important for differentiating between
incarnations of a system when it is restarted with the same hostname and port. It is the specific
incarnation (UID) that is quarantined. The only way to recover from this state is to restart one of the
actor systems.
Messages that are sent to and received from a quarantined system will be dropped. However, it is possible to
send messages with ``actorSelection`` to the address of a quarantined system, which is useful to probe if the
system has been restarted.
An association will be quarantined when:
* Cluster node is removed from the cluster membership.
* Remote failure detector triggers, i.e. remote watch is used. This is different when :ref:`Akka Cluster <cluster_usage_scala>`
is used. The unreachable observation by the cluster failure detector can go back to reachable if the network
partition heals. A cluster member is not quarantined when the failure detector triggers.
* Overflow of the system message delivery buffer, e.g. because of too many ``watch`` requests at the same time
(``system-message-buffer-size`` config).
* Unexpected exception occurs in the control subchannel of the remoting infrastructure.
The UID of the ``ActorSystem`` is exchanged in a two-way handshake when the first message is sent to
a destination. The handshake will be retried until the other system replies and no other messages will
pass through until the handshake is completed. If the handshake cannot be established within a timeout
(``handshake-timeout`` config) the association is stopped (freeing up resources). Queued messages will be
dropped if the handshake cannot be established. It will not be quarantined, because the UID is unknown.
New handshake attempt will start when next message is sent to the destination.
Handshake requests are actually also sent periodically to be able to establish a working connection
when the destination system has been restarted.
Watching Remote Actors
^^^^^^^^^^^^^^^^^^^^^^

View file

@ -579,7 +579,7 @@ The ``TestProbe`` class can in fact create actors that will run with the test pr
This will cause any messages the the child actor sends to `context.parent` to
end up in the test probe.
.. includecode:: code/docs/testkit/ParentChildSpec.scala##test-TestProbe-parent
.. includecode:: code/docs/testkit/ParentChildSpec.scala#test-TestProbe-parent
Using a fabricated parent
^^^^^^^^^^^^^^^^^^^^^^^^^