* They can't be stopped immediately because we want to send
  a final message, and we reply to inbound messages with `Quarantined`
* Improve logging
This improves the remote watching mechanism as follows: Watch requests
are intercepted by the RemoteWatcher and not sent on the wire,
except watches from the RemoteWatcher itself.
RemoteWatcher is then in charge of forwarding DeathWatchNotification
messages to the watchers.
This reduces the number of watch messages to one per watchee, even if
there are several watchers on the same watchee (instead of n+1 before).
Reversed watch messages, and watches on refs with undefinedUid, are excluded from
interception by the RemoteWatcher and so are handled as before this commit.
In addition, the following changes are made (see the sketch after this list):
- Keep watchers in a map watchee -> watchers for more efficient retrieval
  (in a Scala MultiMap)
- Keep watchees in a map address -> watchees for more efficient retrieval
  (in a Scala MultiMap)
- Use InternalActorRef more thoroughly to avoid casts
- Rewatch uses a standard Watch message, as the distinction is no longer needed
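A minimal sketch of the two lookup maps, assuming plain Scala mutable
collections and placeholder types (the real code uses InternalActorRef and
akka.actor.Address):

```scala
import scala.collection.mutable

// Placeholder types for illustration only.
final case class Ref(path: String)
final case class Node(address: String)

object WatchRegistry {
  // watchee -> watchers: one wire-level Watch per watchee; the
  // DeathWatchNotification is fanned out locally to all watchers.
  val watching =
    new mutable.HashMap[Ref, mutable.Set[Ref]] with mutable.MultiMap[Ref, Ref]

  // address -> watchees: everything to clean up when a node
  // becomes unreachable.
  val watcheeByNode =
    new mutable.HashMap[Node, mutable.Set[Ref]] with mutable.MultiMap[Node, Ref]

  /** Returns true when this is the first watcher, i.e. the only case
    * that needs a Watch message on the wire. */
  def addWatch(watchee: Ref, watcher: Ref, node: Node): Boolean = {
    val first = !watching.contains(watchee)
    watching.addBinding(watchee, watcher)
    watcheeByNode.addBinding(node, watchee)
    first
  }
}
```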
* Replace stash with an internal buffer, j.u.LinkedList
* Replace FSM with become
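For reference, the become-based shape looks roughly like this: each state is
a `Receive` function and transitions are `context.become` calls (the message
names here are made up):

```scala
import akka.actor.Actor

// Hypothetical messages, for illustration only.
case object Associate
case object Disassociate
final case class Payload(msg: Any)

class Gate extends Actor {
  def receive: Receive = gated

  def gated: Receive = {
    case Associate  => context.become(open)
    case Payload(_) => // buffered or dropped while gated
  }

  def open: Receive = {
    case Disassociate => context.become(gated)
    case Payload(m)   => // delivered while open
  }
}
```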
* Adaptive backoff, as sketched below: it is important to back off,
  but not for too long; how long depends on environment and use case
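One reading of adaptive backoff is a capped exponential delay; a sketch with
placeholder constants, not the values from this commit:

```scala
import scala.concurrent.duration._

// Grow the delay quickly at first, but cap it so that recovery is
// never postponed for too long. min/max would be tuned per
// environment and use case.
def backoff(retry: Int,
            min: FiniteDuration = 50.millis,
            max: FiniteDuration = 5.seconds): FiniteDuration = {
  val exp  = math.min(retry, 12) // avoid Long overflow
  val next = min * (1L << exp)
  if (next < max) next else max
}
```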
* Prioritize heartbeat messages from remote watcher and cluster
failure detector
* Use payload messages as heartbeats for the transport failure detector,
  and change the transport failure detector to be based on an absolute timeout,
  see tickets #13989 and #13742
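An absolute-timeout detector is simpler than an inter-arrival-based one: the
connection counts as available as long as some message arrived recently
enough. A hedged sketch with illustrative names and clock:

```scala
// Absolute-timeout failure detection: available iff any heartbeat
// (payload messages count as heartbeats too) arrived within the
// acceptable window.
final class DeadlineDetector(acceptableLostMillis: Long,
                             clock: () => Long = () => System.nanoTime / 1000000) {
  @volatile private var lastHeartbeatMillis = clock()

  def heartbeat(): Unit = lastHeartbeatMillis = clock()

  def isAvailable: Boolean =
    clock() - lastHeartbeatMillis <= acceptableLostMillis
}
```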
* Log remote disassociation from the transport failure detector,
  see ticket #13985
* Add benchmark sample in akka-sample-remote-scala
* Unsubscribe in eventStream is too slow when many actors
  are watching remote actors, and becomes a problem for example
  when shutting down such a system
* Previous eventStream solution:
Stopping 20000 actors took 50355 ms
* This solution:
Stopping 20000 actors took 764 ms
* because it is not referentially transparent; normally we reserve parens for
  side-effecting code, but given how people thoughtlessly close over it we revised
  that decision for sender
* caller can still omit parens
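The hazard that motivated the parens: `sender()` reads the actor's current
state, so closing over the call (instead of its value) from asynchronous code
replies to the wrong actor. A small illustration:

```scala
import akka.actor.Actor
import scala.concurrent.Future

class Replier extends Actor {
  import context.dispatcher

  def receive: Receive = {
    case "ok" =>
      // Capture the value now: safe.
      val replyTo = sender()
      Future("result").foreach(replyTo ! _)

    case "broken" =>
      // Closes over sender() itself: it is evaluated later, on another
      // thread, against whatever message the actor is handling then.
      Future("result").foreach(sender() ! _)
  }
}
```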
* The problem scenario was a remote watch, followed by a re-watch triggered
  by the first heartbeat, with an unwatch coming in before the extra re-watch
  message. That caused RemoteWatcher to still watch the subject even though
  it was intended to be unwatched.
* I could reproduce it with sleeps at strategic points
* Solved by a separate re-watch message and a check that we are still
  watching, as sketched below
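A simplified sketch of that fix: re-watch becomes its own message and is
acted on only if the pair is still registered, so an Unwatch that slipped in
between is not undone (types and names are illustrative):

```scala
import scala.collection.mutable

// Illustrative types; the real code deals in system messages and
// InternalActorRef.
final case class Ref(path: String)
final case class ReWatch(watchee: Ref, watcher: Ref)

class Watching {
  private val watching = mutable.Map.empty[Ref, Set[Ref]]

  def watch(watchee: Ref, watcher: Ref): Unit =
    watching.update(watchee, watching.getOrElse(watchee, Set.empty) + watcher)

  def unwatch(watchee: Ref, watcher: Ref): Unit =
    watching.get(watchee).foreach { ws =>
      val remaining = ws - watcher
      if (remaining.isEmpty) watching.remove(watchee)
      else watching.update(watchee, remaining)
    }

  // The guard that closes the race: the delayed ReWatch takes effect
  // only if this watcher is still registered for this watchee.
  def shouldReWatch(m: ReWatch): Boolean =
    watching.get(m.watchee).exists(_.contains(m.watcher))
}
```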
* When an actor system was restarted quickly, the new system replied to
  heartbeats and Terminated was never triggered for actors in the old
  system.
* Solved by sending an extra Watch system message when the first heartbeat
  is received for an address and when a change of system uid is detected.
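A sketch of the bookkeeping behind that fix: remember the last seen system
uid per address and re-issue the Watch when an address is seen for the first
time or its uid has changed (names are illustrative):

```scala
import scala.collection.mutable

final case class Address(host: String, port: Int)

final class UidTracker {
  private val lastUid = mutable.Map.empty[Address, Long]

  /** True when an extra Watch system message should be sent: first
    * heartbeat from this address, or the system uid changed because
    * the remote system was restarted. */
  def needsReWatch(from: Address, uid: Long): Boolean =
    lastUid.put(from, uid) match {
      case None         => true
      case Some(oldUid) => oldUid != uid
    }
}
```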
* The problem was a race: a HeartbeatReq was sent out and the watchee
  terminated immediately, which left the RemoteWatcher peers watching
  each other without any other watch registered.
* Instead of one-way heartbeats from the side being watched, I
  changed to ping-pong style. That makes the problem go away
  and simplifies a lot of things in RemoteWatcher.
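Ping-pong style means the watching side drives the protocol and only it
feeds the failure detector; the watched side is stateless. A minimal
actor-level sketch (message and actor names are illustrative, not the real
RemoteWatcher protocol):

```scala
import akka.actor.{Actor, ActorRef}
import scala.concurrent.duration._

case object Heartbeat
final case class HeartbeatRsp(uid: Long)
case object Tick

// The watched side just answers; it keeps no per-watcher state.
class Watchee(uid: Long) extends Actor {
  def receive: Receive = {
    case Heartbeat => sender() ! HeartbeatRsp(uid)
  }
}

// The watching side sends Heartbeat on a schedule and records the
// response in its failure detector.
class Watcher(watchee: ActorRef) extends Actor {
  import context.dispatcher
  private val tick =
    context.system.scheduler.scheduleWithFixedDelay(1.second, 1.second, self, Tick)

  def receive: Receive = {
    case Tick              => watchee ! Heartbeat
    case HeartbeatRsp(uid) => // failureDetector.heartbeat() would go here
  }

  override def postStop(): Unit = tick.cancel()
}
```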
* RemoteWatcher that monitors node failures, with heartbeats
and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine;
  the actual quarantining will be implemented
  in ticket #2594
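One thing the uid plausibly enables is keying quarantine by (address, uid),
so only that incarnation of the remote system is refused, not a later
restart at the same address. A bookkeeping sketch; the enforcement itself is
deferred to that ticket:

```scala
import scala.collection.mutable

final case class Address(host: String, port: Int)

// Illustrative: quarantine keyed by (address, uid), so a restarted
// system (same address, new uid) is unaffected.
final class QuarantineBook {
  private val quarantined = mutable.Set.empty[(Address, Long)]

  def quarantine(addr: Address, uid: Long): Unit =
    quarantined += (addr -> uid)

  def isQuarantined(addr: Address, uid: Long): Boolean =
    quarantined.contains(addr -> uid)
}
```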