* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config object with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add per-phase timeout and recover settings (a config sketch follows this item)
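A minimal sketch of wiring up a custom phase and a task (the phase name `my-drain-phase` and task name `drain-connections` are made up for illustration; `before-service-unbind` is assumed to be among the predefined phases):

```scala
import akka.Done
import akka.actor.{ ActorSystem, CoordinatedShutdown }
import com.typesafe.config.ConfigFactory
import scala.concurrent.Future

// Custom phase wired into the DAG via depends-on, with a per-phase
// timeout and a recover flag (recover = on means a failed task is
// logged and the shutdown continues with the next phase).
val config = ConfigFactory.parseString("""
  akka.coordinated-shutdown.phases {
    my-drain-phase {
      depends-on = [before-service-unbind]
      timeout = 10 s
      recover = on
    }
  }
  """).withFallback(ConfigFactory.load())

val system = ActorSystem("example", config)

// Tasks return Future[Done]; a phase completes when all of its tasks
// have completed or the phase timeout is reached.
CoordinatedShutdown(system).addTask("my-drain-phase", "drain-connections") { () =>
  Future.successful(Done)
}
```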
* add some missing Artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via JVM shutdown hook (see the sketch below)
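And a sketch of triggering the full sequence explicitly (`run()` with no arguments matches the API as introduced here; `exit-jvm` and `run-by-jvm-shutdown-hook` are the opt-in settings referred to above):

```scala
import akka.actor.{ ActorSystem, CoordinatedShutdown }
import com.typesafe.config.ConfigFactory

val system = ActorSystem("example", ConfigFactory.parseString("""
  # System.exit is performed in the last phase, and the whole
  # sequence also runs from a JVM shutdown hook (e.g. on SIGTERM).
  akka.coordinated-shutdown.exit-jvm = on
  akka.coordinated-shutdown.run-by-jvm-shutdown-hook = on
  """).withFallback(ConfigFactory.load()))

// Runs all phases in DAG order; the returned Future[Done]
// completes when the last phase is done.
val done = CoordinatedShutdown(system).run()
```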
* send ExitingConfirmed to the leader before shutdown of an Exiting
  member, so removal does not have to wait for the failure detector
  to mark the node as unreachable
* the unreachable signal is still kept as a safeguard in case the
  message is lost or the leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate the ActorSystem when the cluster is shut down (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3:
  sys3 could not perform its duties and move the Leaving sys1 to
  Exiting, because it was observing sys1 as unreachable
* exclude Leaving members with exitingConfirmed from the convergence
  condition
* send terminationMessage to singleton when leaving last, #21592
  (a usage sketch of terminationMessage follows this group)
* When leaving the last node, i.e. when there is no newOldestOption,
  the manager was just stopped. The change is to send the
  terminationMessage also in this case and wait until the singleton
  actor is terminated before stopping the manager.
* Also changed so that the singleton is stopped immediately when the
  cluster has been terminated while the last node is leaving, i.e.
  no newOldestOption. Previously it retried until maxTakeOverRetries
  before stopping.
* More comprehensive test of this scenario in ClusterSingletonManagerLeaveSpec
* increase test timeout
* When the test fails, the node is removed from the membership
  twice, which triggers two OldestChanged cycles, but in
  the 2.4.9 change https://github.com/akka/akka/pull/21152/files#diff-f0ae95c926a050aecf45dba3e08d1c77L669
  the singleton manager always went to End (stop) once it had been Oldest
* This fix restores the previous behavior for this scenario
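For reference, the terminationMessage discussed above is the one passed to `ClusterSingletonManager.props`; a minimal sketch (the `MySingleton` actor and the system name are placeholders):

```scala
import akka.actor.{ Actor, ActorSystem, PoisonPill, Props }
import akka.cluster.singleton.{ ClusterSingletonManager, ClusterSingletonManagerSettings }
import com.typesafe.config.ConfigFactory

class MySingleton extends Actor {
  def receive = { case msg => sender() ! msg }
}

// Assumes the cluster actor-ref provider is configured.
val system = ActorSystem("ClusterSystem", ConfigFactory.parseString(
  "akka.actor.provider = akka.cluster.ClusterActorRefProvider"))

// terminationMessage is sent to the singleton on hand-over and, with
// this change, also when the last node is leaving; the manager waits
// for the singleton to terminate before stopping itself.
system.actorOf(
  ClusterSingletonManager.props(
    singletonProps = Props[MySingleton],
    terminationMessage = PoisonPill,
    settings = ClusterSingletonManagerSettings(system)),
  name = "singletonManager")
```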
* track nodes by UniqueAddress in Cluster Singleton, #20942
* reply with HandOverDone from new incarnation, #20942
* confirm the old incarnation as terminated immediately when the new
  incarnation joins, #20942; instead of waiting for the failure
  detector to mark it as unreachable, this speeds up removal when
  restarting a cluster node with the same hostname:port
* the reported issue is fixed by the immediate leaderActions
(moving to Up) when joining the first node to itself
* the other changes are precautions just in case
* In 2.4 we derive the number of hand-over/take-over retries from
  the removal margin, but we decided to set that to 0 by default, since
  it is intended for network partition scenarios. maxTakeOverRetries
  thus became 1, so there must also be a minimum-number-of-retries
  property (a sketch of the derivation follows below).
* The test failed for the leaving scenario because the singleton
  instance was stopped hard, without sending the terminationMessage,
  when maxTakeOverRetries was exceeded.
For manual downing the setting is not needed. For auto-down it doesn't
add any extra safety, since auto-down does not handle network partitions
anyway. The setting is still useful if you implement downing strategies
that do handle network partitions, e.g. by keeping the larger side of
the partition and shutting down the smaller side.
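A minimal sketch of the derivation described above (variable names follow the prose, not necessarily the exact source):

```scala
import scala.concurrent.duration._

// Derive hand-over/take-over retries from the removal margin, with a
// configurable floor so that a zero margin still yields sensible values.
def deriveRetries(
    removalMargin: FiniteDuration,
    retryInterval: FiniteDuration,
    minNumberOfHandOverRetries: Int): (Int, Int) = {
  val n = (removalMargin.toMillis / retryInterval.toMillis).toInt
  val maxHandOverRetries = math.max(minNumberOfHandOverRetries, n + 3)
  val maxTakeOverRetries = math.max(1, maxHandOverRetries - 3)
  (maxHandOverRetries, maxTakeOverRetries)
}

// e.g. deriveRetries(Duration.Zero, 1.second, 10) == (10, 7), instead
// of the problematic maxTakeOverRetries = 1 described above.
```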
* prevent Down and Exiting members from being used for joining
* delay shutdown of a Down member until the information has been spread
  to all reachable members, e.g. when downing several nodes via one node
* akka.cluster.down-removal-margin setting:
  margin until shards or singletons that belonged to a
  downed/removed partition are re-created in the surviving partition.
  Used by Cluster Singleton and Cluster Sharding (a config sketch
  follows this item).
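A hedged config sketch for the setting (the 20 s value is only illustrative; pick a margin that matches your failure-detection and downing settings):

```scala
import com.typesafe.config.ConfigFactory

// Surviving nodes wait this long before re-creating singletons or
// shards that belonged to a downed/removed partition.
val config = ConfigFactory.parseString(
  "akka.cluster.down-removal-margin = 20 s")
```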
* remove the retry count parameters/settings for singleton in
favor of deriving those from the removal-margin