* fix NPE in shutdownTransport
* perhaps because shutdown was called before start
* system.dispatcher is used in other places of the shutdown
* improve logging of compression advertisement progress
* adjust RestartFlow.withBackoff parameters
* quarantine after ActorSystemTerminating signal
(will cleanup compressions)
* Quarantine idle associations
* liveness checks by sending an extra HandshakeReq and updating
lastUsed when the reply is received
* conservative default value to survive network partition, in
case no other messages are sent
* Adjust logging and QuarantinedEvent for harmless quarantine
* Harmless if it was via the shutdown signal or cluster leaving
* Notice that the incarnation has changed in SystemMessageDelivery
and then reset the sequence number
* Take the incarnation number into account in the ClearSystemMessageDelivery
message
* Trigger quarantine earlier in ClusterRemoteWatcher if node with
same host:port joined
* Change quarantine-removed-node-after to 5s, shouldn't be necessary
to delay it 30s
* test reproducer
* they can't be stopped immediately because we want to send
some final message and we reply to inbound messages with `Quarantined`
* and improve logging
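
A minimal sketch of the idle-association quarantine described above, assuming a plain timestamp per remote address. The names `IdleAssociationTracker`, `AssociationState`, `markUsed` and `quarantineIdle` are hypothetical and are not taken from the actual remoting code; the real implementation also sends the extra HandshakeReq, which is omitted here.

```scala
import scala.collection.mutable

// Hypothetical state per association; not the real Association class.
final case class AssociationState(lastUsedNanos: Long, quarantined: Boolean)

final class IdleAssociationTracker(idleTimeoutNanos: Long) {
  private val associations = mutable.Map.empty[String, AssociationState]

  /** Record traffic (or a handshake reply) from the given remote address. */
  def markUsed(address: String, nowNanos: Long): Unit =
    associations.get(address) match {
      case Some(state) if state.quarantined => () // a quarantined association stays quarantined
      case _ => associations.update(address, AssociationState(nowNanos, quarantined = false))
    }

  /** Quarantine every association idle for longer than the timeout and return their addresses. */
  def quarantineIdle(nowNanos: Long): Set[String] = {
    val idle = associations.iterator.collect {
      case (address, state)
          if !state.quarantined && nowNanos - state.lastUsedNanos > idleTimeoutNanos =>
        address
    }.toSet
    idle.foreach { address =>
      associations.update(address, associations(address).copy(quarantined = true))
    }
    idle
  }

  def isQuarantined(address: String): Boolean =
    associations.get(address).exists(_.quarantined)
}
```

In this sketch a reply to the liveness check would simply call `markUsed`, which is why the timeout has to be conservative: during a network partition no reply arrives and the association would otherwise be quarantined too eagerly.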
This improves the remote watching mechanism as follows: Watch requests
are intercepted by the RemoteWatcher and not sent on the wire,
except for watches from the RemoteWatcher itself.
RemoteWatcher is then in charge of forwarding DeathWatchNotification
messages to the watchers.
This reduces the number of watch messages to one per watchee, even if
there are several watchers on the same watchee (instead of n+1 before).
Reversed watch messages and watches on refs with undefinedUid are excluded from
interception by the RemoteWatcher and so are handled as before this commit.
In addition, the following changes are made:
- Keep watchers in a map watchee -> watchers for more efficient retrieval
(in a Scala MultiMap)
- Keep watchees in a map address -> watchee for more efficient retrieval
(in a Scala MultiMap)
- Use of InternalActorRef more thoroughly to avoid casts
- Rewatch uses a standard watch message, as the distinction is no longer needed
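
The interception and the watchee -> watchers map can be sketched roughly as follows. This is an illustration under stated assumptions, not the RemoteWatcher code: actor refs are replaced by plain strings, `WatchRegistry` and its methods are hypothetical names, and a map of sets stands in for the multimap.

```scala
import scala.collection.mutable

final class WatchRegistry {
  // watchee -> watchers (a map of sets standing in for the multimap)
  private val watching = mutable.Map.empty[String, mutable.Set[String]]

  /** Registers the watch; returns true only for the first watcher,
    * i.e. when a single watch has to be sent on the wire. */
  def addWatch(watchee: String, watcher: String): Boolean = {
    val watchers = watching.getOrElseUpdate(watchee, mutable.Set.empty[String])
    val first = watchers.isEmpty
    watchers += watcher
    first
  }

  /** Removes the watch; returns true when no watchers remain,
    * so the wire-level watch can be released. */
  def removeWatch(watchee: String, watcher: String): Boolean =
    watching.get(watchee) match {
      case Some(watchers) =>
        watchers -= watcher
        if (watchers.isEmpty) { watching -= watchee; true } else false
      case None => false
    }

  /** Watchers to notify locally when the watchee terminates; clears the entry. */
  def terminated(watchee: String): Set[String] =
    watching.remove(watchee).map(_.toSet).getOrElse(Set.empty[String])
}
```

With this shape, a second watcher on the same watchee only updates local state, and the single incoming termination notification is fanned out to all registered watchers.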
* The problem was a race caused by a HeartbeatReq being sent out and
the watchee terminating immediately afterwards. That left the
RemoteWatcher peers watching each other without any other watch
registered.
* Instead of one-way heartbeats from the side being watched I
changed to ping-pong style. That makes the problem go away
and simplifies a lot of things in RemoteWatcher.
* For graceful leaving and removal it should still be possible to
communicate with the node after cluster removal.
* Otherwise the hand over in cluster singleton would break, for
example.
* Also, skip selfAddress to avoid generation of AddressTerminated
for the own node when removed from cluster.
* RemoteWatcher that monitors node failures, with heartbeats
and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine,
the actual quarantining will be implemented
in ticket 2594
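
A rough sketch of ping-pong heartbeats that carry a UID, as described in the items above. The message shapes and the names `Watchee`, `HeartbeatMonitor`, `receiveRsp` and `unreachable` are assumptions made for illustration, and a fixed silence threshold is swapped in for the real failure detector.

```scala
// Hypothetical message shapes for illustration only.
case object HeartbeatReq
final case class HeartbeatRsp(addressUid: Int)

object Watchee {
  /** The watched side answers each request with its own UID (the "pong"). */
  def reply(req: HeartbeatReq.type, selfUid: Int): HeartbeatRsp = HeartbeatRsp(selfUid)
}

final class HeartbeatMonitor(acceptableSilenceMillis: Long) {
  // address -> (uid from the last reply, time of the last reply)
  private var lastReply = Map.empty[String, (Int, Long)]

  def receiveRsp(from: String, rsp: HeartbeatRsp, nowMillis: Long): Unit =
    lastReply = lastReply.updated(from, (rsp.addressUid, nowMillis))

  /** Addresses (with their UIDs) that have been silent for too long.
    * The UID identifies which incarnation would be quarantined. */
  def unreachable(nowMillis: Long): Map[String, Int] =
    lastReply.collect {
      case (address, (uid, at)) if nowMillis - at > acceptableSilenceMillis => address -> uid
    }
}
```

Because the watching side initiates every exchange, it stops receiving replies as soon as it stops asking, which avoids the state where two peers keep heartbeating each other with no watch registered.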