* When a new uid is seen in a join attempt we can down the existing
member, so that the restarted node is able to join in a later retried
join attempt without relying on auto-down.
* Skip observations from downed nodes (a quarantined node is marked down
immediately) in the convergence check (see the sketch below)
* Skip observations from downed nodes when picking "reachable" targets for gossip.
* This also means that we must accept gossip with our own node marked as
unreachable, but that must not be spread to the external membership events.
* The problem was that the system message buffer filled up during
the deploy phase and triggered quarantine too early, and therefore
the "hello" reply was lost. The "hello" ping-pong was not good
enough for deploying one-by-one.
(cherry picked from commit f729afe1fa5401e562655e5a0aaab3f9789e4df6)
Conflicts:
akka-cluster/src/multi-jvm/scala/akka/cluster/SurviveNetworkInstabilitySpec.scala
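A minimal sketch of the filtering idea behind the convergence and gossip-target
changes above, using simplified hypothetical types (Member, Observation) rather
than Akka's internal Gossip/Reachability classes:

    object DownedObserverFiltering {
      sealed trait MemberStatus
      case object Up   extends MemberStatus
      case object Down extends MemberStatus

      final case class Member(address: String, status: MemberStatus)
      // observer has marked subject as unreachable
      final case class Observation(observer: Member, subject: Member)

      // Observations made by downed (or quarantined, i.e. immediately downed)
      // observers are ignored.
      def relevantObservations(obs: Set[Observation]): Set[Observation] =
        obs.filterNot(_.observer.status == Down)

      // Convergence: no live observer still sees a non-downed member as unreachable.
      def convergence(obs: Set[Observation]): Boolean =
        relevantObservations(obs).forall(_.subject.status == Down)

      // Gossip targets: Up members not marked unreachable by any live observer.
      // Note that incoming gossip may still mark our own node unreachable; it is
      // accepted but not turned into external membership events.
      def reachableTargets(members: Set[Member], obs: Set[Observation]): Set[Member] =
        members.filter(_.status == Up) diff relevantObservations(obs).map(_.subject)
    }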
* Otherwise the leader might stall (unable to remove downed nodes)
if many nodes are shut down at the same time and nobody in the
remaining cluster is monitoring some of the shut-down nodes.
(cherry picked from commit 1354524c4fde6f40499833bdd4c0edd479e6f906)
Conflicts:
akka-cluster/src/main/scala/akka/cluster/ClusterHeartbeat.scala
project/AkkaBuild.scala
This is an API-breaking change for anyone who implemented their own Routers.
The change is required because the router must know whether the local routees
should be started, so it has to check the roles of the local cluster
member. We could delay the decision to start local routees, but that
would allow messages to end up in dead letters (bad). The role check is
sketched below.
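A hedged sketch of that decision; useRole and the simplified Member type are
illustrative, not the actual router API:

    object LocalRouteeDecision {
      final case class Member(address: String, roles: Set[String])

      // Routees are only started locally if the local member carries the
      // configured role (or if no role restriction is configured).
      def startLocalRoutees(selfMember: Member, useRole: Option[String]): Boolean =
        useRole.forall(selfMember.roles.contains)
    }

    object LocalRouteeDecisionExample extends App {
      import LocalRouteeDecision._
      val self = Member("akka://sys@host1:2552", Set("backend"))
      println(startLocalRoutees(self, Some("backend")))  // true
      println(startLocalRoutees(self, Some("frontend"))) // false
      println(startLocalRoutees(self, None))             // true, no restriction
    }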
* deprecates awaitTermination, shutdown and isTerminated
* introduces a terminate method that returns a Future[Unit]
* introduces a whenTerminated method that returns a Future[Unit] (usage sketched below)
* simplifies the implementation by removing blocking constructs
* adds tests for terminate() and whenTerminated
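A short usage sketch of the new shutdown API (the exact result type of the
futures may differ between versions, but the pattern is the same):

    import akka.actor.ActorSystem
    import scala.concurrent.Await
    import scala.concurrent.duration._

    object TerminateExample extends App {
      val system = ActorSystem("example")

      // Initiate shutdown; the returned future completes when termination is done.
      system.terminate()

      // Parties that did not initiate the shutdown can hook into whenTerminated,
      // e.g. to block the main thread until everything has stopped.
      Await.result(system.whenTerminated, 10.seconds)
    }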
* Replace stash with an internal buffer, a j.u.LinkedList
* Replace FSM with become (see the sketch below)
* Adaptive backoff: it is important to back off, but not for too long;
how long depends on environment and use case
* Prioritize heartbeat messages from remote watcher and cluster
failure detector
* Use payload messages as heartbeats for the transport failure detector,
change the transport failure detector to be based on an absolute timeout,
see tickets #13989 and #13742
* Log remote disassociate from transport failure detector,
see ticket #13985
* Add benchmark sample in akka-sample-remote-scala
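A hedged sketch of the first two points, buffering in a plain j.u.LinkedList and
switching behavior with context.become instead of Stash/FSM; the BufferingWriter
actor and its messages are simplified illustrations, not the actual remoting code:

    import java.util.LinkedList
    import akka.actor.{ Actor, ActorRef }

    final case class Send(msg: Any)
    case object Connected

    class BufferingWriter(transport: ActorRef) extends Actor {
      // internal buffer instead of Stash
      private val buffer = new LinkedList[Any]

      // initial behavior: not yet associated, buffer everything
      def receive: Receive = buffering

      def buffering: Receive = {
        case Send(msg) => buffer.add(msg)
        case Connected =>
          // flush the buffer, then switch behavior instead of an FSM state transition
          while (!buffer.isEmpty) transport ! buffer.poll()
          context.become(writing)
      }

      def writing: Receive = {
        case Send(msg) => transport ! msg
      }
    }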
* The problem was that the unreachability observed by the second node
was leaking from the previous test step; when the blackhole was added
it could not heal, and that prevented the leader from removing
the downed second node because some other nodes were still marked as
unreachable.
* The first node was not included in the awaitAllReachable check
in the previous step, and the order of awaitAllReachable and
awaitMembersUp was wrong.
* Included the awaitAllReachable check in assertCanTalk.
* Changed to a two-way blackhole and used a barrier instead of a scheduled
event to trigger the exceptions while the blackhole was in place
* We should investigate whether unreachable observations from downed nodes
can be excluded in the convergence check. Created separate ticket #3875 for
that.
* It did not use the toString (including the full address of the destination) of
the node entries; instead it used the hashCode, which always included the self
address (illustrated below)
* This was a regression in 2.3; it is correct in 2.2.3
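A hypothetical illustration (not the actual node-entry class) of why that matters:
a hashCode that mixes in the self address differs on every node, while the toString
of the destination address is the same everywhere:

    // Illustrative only: an entry whose case-class hashCode includes the self address.
    final case class NodeEntry(selfAddress: String, destination: String) {
      override def toString: String = destination
    }

    object NodeEntryKeys extends App {
      val onNodeA = NodeEntry("akka://sys@a:2552", "akka://sys@c:2552")
      val onNodeB = NodeEntry("akka://sys@b:2552", "akka://sys@c:2552")

      // Different per node, because the self address is part of the hash (the regression).
      println(onNodeA.hashCode == onNodeB.hashCode) // false (almost certainly)

      // Identical on every node, so all nodes agree on the key (the correct behavior).
      println(onNodeA.toString == onNodeB.toString) // true
    }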
* The Identify message didn't get through to the master, which
was stopping at the same time, and it didn't get redirected to
dead letters, i.e. the "termination race"
* because it is not referentially transparent; normally we reserve parens for
side-effecting code, but given how often people thoughtlessly close over it we
revised that decision for sender (see the sketch below)
* callers can still omit the parens
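The classic pitfall, as a sketch: calling sender() inside a Future re-evaluates it
when the future runs, possibly for a different message, whereas capturing it in a
val first is safe:

    import akka.actor.Actor
    import scala.concurrent.Future

    class ReplyingActor extends Actor {
      import context.dispatcher

      def receive: Receive = {
        case "unsafe" =>
          Future {
            // BUG: sender() is evaluated when the future runs, by which time the
            // actor may already be processing another message from another sender.
            sender() ! "done"
          }

        case "safe" =>
          val replyTo = sender() // evaluate once, while handling this message
          Future { replyTo ! "done" }
      }
    }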
- removed retry-window and related settings
- removed gate-invalid-addresses-for
- gate is now mandatory
- remoting has a dedicated dispatcher by default
- updated tests to work with changed timings
- added doc section for association lifecycle
* Getter for CurrentClusterState in Cluster extension, updated via
ClusterReadView
* Remove lazy init of readView. Otherwise cluster.state would be
empty on first access, which is probably surprising
* Subscribe to several cluster event types at once, to ensure *one*
CurrentClusterState followed by change events
* Deprecate publishCurrentClusterState; it was a bad idea, use sendCurrentClusterState
instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
to CurrentClusterState (see the sketch below)
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
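A usage sketch of the subscription options described above: InitialStateAsEvents
delivers change events corresponding to the current state instead of one
CurrentClusterState snapshot, and the state is also readable directly from the
Cluster extension:

    import akka.actor.Actor
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    class ClusterListener extends Actor {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit = {
        // Subscribe to several event types at once; with InitialStateAsEvents the
        // subscriber gets events corresponding to the current state instead of a
        // CurrentClusterState snapshot.
        cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
          classOf[MemberEvent], classOf[UnreachableMember])
        // The current state is also available directly (no longer lazily initialized).
        println(s"members at subscription time: ${cluster.state.members}")
      }

      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive: Receive = {
        case MemberUp(member)          => println(s"member up: ${member.address}")
        case UnreachableMember(member) => println(s"unreachable: ${member.address}")
        case _: MemberEvent            => // ignore other member events
      }
    }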