As documented in the code:
// Leader is moving itself from Leaving to Exiting. Let others know (best effort)
// before shutdown. Otherwise they will not see the Exiting state change
// and there will not be convergence until they have detected this node as
// unreachable and the required downing has finished. They will still need to detect
// unreachable, but Exiting unreachable will be removed without downing, i.e.
// normally the leaving of a leader will be graceful without the need
// for downing. However, if those final gossip messages never arrive it is
// alright to require the downing, because that is probably caused by a
// network failure anyway.
That is fine, but this change improves the selection of the nodes to
send the final gossip messages to.
I could reproduce the failure in ClusterSingletonManagerLeaveSpec, and with
additional logging I verified that in the failure case it picked the "first"
node 3 times (the selection is random) and that node had already been shut down
(it left earlier in the test) but had not been removed yet.
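A minimal sketch of the improved selection described above, with illustrative names
(selectFinalGossipTargets is not the internal method); the idea is to pick the random
targets for the final gossip only among members that can still act on it:

    import scala.util.Random
    import akka.cluster.{ Member, MemberStatus }

    // Sketch only: prefer nodes that can still act on the final Exiting gossip,
    // i.e. exclude unreachable members and members that have already left.
    def selectFinalGossipTargets(members: Set[Member], unreachable: Set[Member], n: Int): Seq[Member] = {
      val candidates = (members -- unreachable).filter { m =>
        m.status == MemberStatus.Up || m.status == MemberStatus.Leaving
      }
      Random.shuffle(candidates.toVector).take(n)
    }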
* track nodes by UniqueAddress in Cluster Singleton, #20942
* reply with HandOverDone from new incarnation, #20942
* confirm as terminated immediately when new incarnation joins, #20942;
instead of waiting for the failure detector to mark it as unreachable, this
speeds up removal when restarting a cluster node with the same hostname:port
* Automatic downing of the old node incarnation when a new one tries to rejoin
the cluster is performed even if the old incarnation was left in Leaving or
Exiting state.
* Added information to the clustering docs about automatic downing of old
incarnations when a new one tries to rejoin the cluster.
* the reported issue is fixed by the immediate leaderActions
(moving to Up) when joining the first node to itself (see the sketch
after this list)
* the other changes are precautions just in case
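For reference, a minimal sketch of joining the first node to itself, assuming the
ActorSystem is configured with the cluster actor-ref provider; with the immediate
leaderActions the node is moved to Up right away:

    import akka.actor.ActorSystem
    import akka.cluster.Cluster

    val system = ActorSystem("ClusterSystem")
    val cluster = Cluster(system)

    // Joining the node to its own address forms a single-node cluster; the fix
    // moves it from Joining to Up immediately instead of on the next leader tick.
    cluster.join(cluster.selfAddress)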
When using a dispatcher (default or separate cluster dispatcher)
with fewer than 5 threads the Cluster extension initialization
could deadlock.
It was reproducible by adding a sleep before the Await of GetClusterCoreRef
in the Cluster extension constructor. The reason was that other cluster actors were
started too early and they also tried to get the Cluster extension, thereby blocking
dispatcher threads.
Note that the Cluster extension is started via ClusterActorRefProvider before
ActorSystem.apply returns.
The improvement is to start the cluster child actors lazily when
GetClusterCoreRef is received, as sketched below.
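A minimal sketch of the lazy-start pattern, with illustrative names
(ClusterCoreSupervisor, ClusterCoreDaemon are stand-ins, not the actual internal
actors); the child is created on the first GetClusterCoreRef message instead of in
the constructor, so no dispatcher thread is blocked while the extension is still
initializing:

    import akka.actor.{ Actor, ActorRef, Props }

    case object GetClusterCoreRef // sketch of the request message

    class ClusterCoreDaemon extends Actor {
      def receive = Actor.emptyBehavior
    }

    class ClusterCoreSupervisor extends Actor {
      // Not created in the constructor/preStart, so nothing here can block on
      // the Cluster extension while ActorSystem.apply is still running.
      private var coreDaemon: Option[ActorRef] = None

      def receive = {
        case GetClusterCoreRef =>
          val daemon = coreDaemon.getOrElse {
            val d = context.actorOf(Props(new ClusterCoreDaemon), "daemon")
            coreDaemon = Some(d)
            d
          }
          sender() ! daemon
      }
    }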
* prevent Down and Exiting members from being used as join targets
* delay shutdown of a Down member until the information has spread
to all reachable members, e.g. when downing several nodes via one node
* akka.cluster.down-removal-margin setting (configuration sketch below)
Margin until shards or singletons that belonged to a
downed/removed partition are created in the surviving partition.
Used by singleton and sharding.
* remove the retry count parameters/settings for singleton in
favor of deriving those from the removal-margin
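A configuration sketch for the setting; the value shown is illustrative only,
not the default:

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    // Illustrative value; choose a margin that matches the failure detection
    // and downing settings of the cluster.
    val config = ConfigFactory
      .parseString("akka.cluster.down-removal-margin = 20s")
      .withFallback(ConfigFactory.load())

    val system = ActorSystem("ClusterSystem", config)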
cluster.shutdown
* must also be done when the listener actor stops before the
MemberRemoved event has been received
* add test for this
* clarify docs with an example that shuts down the actor system and
exits the JVM, as sketched below
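A sketch of such an example: a listener that shuts down the actor system and exits
the JVM when the self member is removed (names are illustrative; older Akka versions
use system.shutdown() instead of terminate()):

    import akka.actor.{ Actor, Props }
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent.MemberRemoved

    class ShutdownOnRemoved extends Actor {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
          // exit the JVM once the actor system has finished terminating
          context.system.registerOnTermination(System.exit(0))
          context.system.terminate()
      }
    }

    // system.actorOf(Props(new ShutdownOnRemoved), "shutdown-on-removed")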
* The leader is selected by picking the first reachable member, but in
#13875 we had to let the self member be unreachable in the Reachability
table and that was not considered in the logic of the leader selection
(see the sketch after this list).
* That is an unwanted behavior change; especially when there
is only one node left, the leader could be evaluated to None instead
of Some(selfUniqueAddress).
* Note that #13875 has not been released yet.
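A hedged sketch of the adjusted leader selection, treating the self member as
reachable even though it is recorded as unreachable in the Reachability table
(names and signature are illustrative, not the internal Gossip code):

    import akka.cluster.UniqueAddress

    // The first node in member order that is either ourselves or reachable
    // according to the Reachability table becomes the leader, so a single
    // remaining node always selects itself.
    def leader(nodesInMemberOrder: Seq[UniqueAddress],
               selfUniqueAddress: UniqueAddress,
               isReachable: UniqueAddress => Boolean): Option[UniqueAddress] =
      nodesInMemberOrder.find(n => n == selfUniqueAddress || isReachable(n))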
* When a new uid is seen in a join attempt we can down the existing
member, and thereby the restarted node will be able to join
in a later retried join attempt without relying on auto-down.
* Skip observations from downed nodes (a quarantined node is marked down
immediately) in the convergence check (sketched after this list)
* Skip observations from downed nodes when picking "reachable" targets for gossip.
* This also means that we must accept gossip with our own node marked as unreachable,
but that should not be propagated to the external membership events.
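A rough sketch of the convergence adjustment, using illustrative types rather than
the real gossip and Reachability data structures:

    // Illustrative types only.
    final case class Observation(observer: String, subject: String, unreachable: Boolean)

    // Observations made by nodes that are themselves Down (or quarantined, which
    // is marked Down immediately) are ignored when checking for convergence.
    def convergence(observations: Seq[Observation], downedNodes: Set[String]): Boolean =
      observations
        .filterNot(o => downedNodes.contains(o.observer))
        .forall(o => !o.unreachable)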
* because it is not referentially transparent; normally we reserve parens for
side-effecting code, but given how people thoughtlessly close over it we revised
that decision for sender (see the example after this list)
* caller can still omit parens
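The pitfall is easiest to see when sender is closed over in asynchronous code; the
first variant below re-evaluates sender() when the Future completes (by which time
the actor may be processing a different message), the second captures the ref eagerly:

    import akka.actor.Actor
    import scala.concurrent.Future

    class Echo extends Actor {
      import context.dispatcher

      def receive = {
        case "broken" =>
          // wrong: sender() is re-evaluated inside the callback
          Future("reply").foreach(r => sender() ! r)
        case "ok" =>
          val replyTo = sender() // capture the ref before going asynchronous
          Future("reply").foreach(r => replyTo ! r)
      }
    }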
* Getter for CurrentClusterState in Cluster extension, updated via
ClusterReadView
* Remove lazy init of readView. Otherwise cluster.state will be
empty on first access, which is probably surprising
* Subscribe to several cluster event types at once, to ensure *one*
CurrentClusterState followed by change events
* Deprecate publishCurrentClusterState, was a bad idea, use sendCurrentClusterState
instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
to CurrentClusterState (see the example after this list)
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
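Subscribing with the new initial-state mode looks like this; with InitialStateAsEvents
the subscriber receives events corresponding to the current state instead of a
CurrentClusterState snapshot first:

    import akka.actor.{ Actor, ActorLogging }
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    class MemberListener extends Actor with ActorLogging {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit =
        // several event types in one subscribe call, events instead of a snapshot
        cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
          classOf[MemberEvent], classOf[UnreachableMember])

      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case MemberUp(member)          => log.info("Member is Up: {}", member.address)
        case UnreachableMember(member) => log.info("Member unreachable: {}", member)
        case _: MemberEvent            => // ignore other member events
      }
    }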
* The previous one-way heartbeat was elegant, but complicated to
understand and without giving much extra value compared to this approach.
* The previous one-way heartbeat had some kind of bug when joining
several (10-20) nodes at approximately the same time (but not exactly
the same time), with a false failure detection triggered by the extra heartbeat,
which would not heal.
* This ping-pong approach will increase network traffic slightly, but heartbeat
messages are small and each node is limited to monitoring (by default) 5 peers;
a sketch of the idea follows.
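A hedged sketch of the ping-pong idea, with illustrative messages and actors rather
than the internal ClusterHeartbeat protocol; each monitored peer answers every
heartbeat, so reachability is judged from the replies:

    import akka.actor.{ Actor, ActorRef }

    // Illustrative messages, not Akka's internal heartbeat protocol.
    final case class Heartbeat(from: ActorRef)
    final case class HeartbeatRsp(from: ActorRef)

    class HeartbeatReceiver extends Actor {
      def receive = {
        case Heartbeat(from) => from ! HeartbeatRsp(self) // answer every heartbeat
      }
    }

    class HeartbeatSender(peers: Seq[ActorRef]) extends Actor {
      case object Tick

      def receive = {
        case Tick =>
          peers.foreach(_ ! Heartbeat(self)) // sent on a periodic schedule
        case HeartbeatRsp(from) =>
          // record the reply in the failure detector for `from`
      }
    }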