* Adjust cross DC gossip probability for small nr of nodes in a DC
When a Dc is being bootstrapped the initial node has no local peers and
can not gossip if it selects a local gossip round. Start at a
probability of 1.0 for a single node cluster and move down 0.25 per node
until a 5 node DC is reached then use the cross-data-center-gossip-probability
* Fix cross DC gossip selecting of oldest members
This used to select the members based on the sort order members in
Gossip (by address) rather than by upNumber
* MemberRemoved must be published before MemberUp, e.g. when restarted
in other DC
* remove from failureDetector when receiving gossip with new member,
not only new joining member
* increase timeout in MultiDcSingletonManagerSpec
* the crossDcFailureDetector was not connected to the reachability table
* additional test by listen for {Reachable/Unreachable}DataCenter events in split spec
* missing Java API for getUnreachableDataCenters in CurrentClusterState
* move methods that depends on selfUniqueAddress and selfDc
to a separate MembershipState class, which also holds the
latest gossip
* this removes the need to pass in the parameters from everywhere and
makes it easier to cache some results
* makes it clear that those parameters are always selfUniqueAddress
and selfDc, instead of some arbitary node/dc
* Guarantee no sneaky type puts more teams in the role list
* Leader per team and initial tests
* MiMa filters
* Second iteration (not working though)
* Verbose gossip logging etc.
* Gossip to team-nodes even if there is inter-team unreachability
* More work ...
* Marking removed nodes with tombstones in Gossip
* More test coverage for Gossip.remove
* Bug failing other multi-node tests squashed
* Multi-node test for team-split
* Review fixes - only prune tombstones on leader ticks
* Clean code is happy code.
* All I want is for MiMa to be my friend
* These constants are internal
* Making the formatting gods happy
* I used the wrong reachability for ignoring gossip :/
* Still hadn't quite gotten how reachability was supposed to work
* Review feedback applied
* Cross-team downing should still work
* Actually prune tombstones in the prune tombstones method ...
* Another round against reachability. Reachability leading with 15 - 2 so far.
* properly shutdown ArteryTransport using CoordinatedShutdown, #22671
* The shutdownHook changed hasBeenShutdown flag to true, and then when
the transport.shutdown was invoked the shutdown sequence was ignored
until it was too late, ActorSystem already terminated.
* Also improved the cluster shutdown tasks when the cluster node had not
joined
* CoordinatedShutdownLeave explicit events
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
* send ExitingConfirmed to leader before shutdown of Exiting
to not have to wait for failure detector to mark it as
unreachable before removing
* the unreachable signal is still kept as a safe guard if
message is lost or leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate ActorSystem when cluster shutdown (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3,
then sys3 could not perform it's duties and move Leving sys1 to
Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from convergence condidtion
As documented in the code:
// Leader is moving itself from Leaving to Exiting. Let others know (best effort)
// before shutdown. Otherwise they will not see the Exiting state change
// and there will not be convergence until they have detected this node as
// unreachable and the required downing has finished. They will still need to detect
// unreachable, but Exiting unreachable will be removed without downing, i.e.
// normally the leaving of a leader will be graceful without the need
// for downing. However, if those final gossip messages never arrive it is
// alright to require the downing, because that is probably caused by a
// network failure anyway.
That is fine, but this change improves the selection of the nodes to
send the final gossip messages to.
I could reproduce the failure in ClusterSingletonManagerLeaveSpec and with
additional logging I verified that in the failure case it picked the "first"
node 3 times (it's random) and that node had already been shutdown (left earlier
in the test) but was not removed yet.
* track nodes by UniqueAddress in Cluster Singleton, #20942
* reply with HandOverDone from new incarnation, #20942
* confirm as terminated immediately when new incarnation joins, #20942 instead of waiting for failure detector to mark it as unreachable this will speed-up removal when restarting cluster node with same hostname:port
* Automatic downing of old node incarnation when new tries to rejoin the cluster is performed even if old incarnation was left in Leaving or Exiting state.
* Added information to clustering docs about automatic downing of old incarnations when new tries to rejoin the cluster.
* the reported issue is fixed by the immediate leaderActions
(moving to Up) when joining the first node to itself
* the other changes are precautions just in case
When using a dispatcher (default or separate cluster dispatcher)
with less than 5 threads the Cluster extension initialization
could deadlock.
It was reproducable by adding a sleep before the Await of GetClusterCoreRef
in the Cluster extension constructor. The reason was that other cluster actors were
started too early and they also tried to get the Cluster extension and thereby blocking
dispatcher threads.
Note that the Cluster extension is started via ClusterActorRefProvider before
ActorSystem.apply returns.
The improvement is to start the cluster child actors lazily when the
GetClusterCoreRef is received.
* avoid using Down and Exiting member from being used for joining
* delay shut down of Down member until the information is spread
to all reachable members, e.g. downing several nodes via one node
* akka.cluster.down-removal-margin setting
Margin until shards or singletons that belonged to a
downed/removed partition are created in surviving partition.
Used by singleton and sharding.
* remove the retry count parameters/settings for singleton in
favor of deriving those from the removal-margin
cluster.shutdown
* must also be done when the listener actor stops before the
MemberRemoved event has been received
* add test for this
* clarify docs with example that shuts down actor system and
exit jvm