* looks like the ActorSystem is shutdown when leaving
* Included in MultiNodeSpec, i.e. all multi-node tests:
akka.coordinated-shutdown.terminate-actor-system = off
akka.oordinated-shutdown.run-by-jvm-shutdown-hook = off
* Setting to configure where the flight recorder puts its file
* Run ArteryMultiNodeSpecs with flight recorder enabled
* More cleanup in exit hook, wait for task runner to stop
* Enable flight recorder for the cluster multi node tests
* Enable flight recorder for multi node remoting tests
* Toggle always-dump flight recorder output when akka.remote.artery.always-dump-flight-recorder is set
* need to use a shared media driver to get the cpu usage
at a reasonable level
* also changed to SleepingIdleStrategy(1 ms) when cpu-level=1
not needed for the test to pass, but can be good to make level 1
more extreme
* Provide shorter aliases for the ActorRefProviders #20649
* Use the new actorefprovider aliases throughout code and docs
* Cleaner alias replacement logic
* because it is not referentially transparent; normally we reserved parens for
side-effecting code but given how people thoughtlessly close over it we revised
that that decision for sender
* caller can still omit parens
* The previous one-way hearbeat was elegant, but comlicated to
understand and without giving much extra value compared to this approach.
* The previous one-way heartbeat have some kind of bug when joining
several (10-20) nodes at approximately the same time (but not exactly
the same time) with a false failure detection triggered by the extra heartbeat,
which would not heal.
* This ping-pong approach will increase network traffic slightly, but heartbeat
messages are small and each node is limited to monitor (default) 5 peers.
* Replace (deprecate) akka.cluster.auto-down config setting with
akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
and performs down from the leader node when they have been
unreachable for the specified duration
* Migration guide
* It was a regression introduced in dc9fe4f
* Two problems:
1) Gossip merge could pop back removed member (was previously
covered by the filter of unreachable)
2) Reachability merge didn't handle all cases for removed member,
i.e. when node not in allowed set
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member was moved it to the unreachable Set,
and then removed from there by the leader. That will not
work when flipping back to reachable, so a Down member must
be detected as unreachable before beeing removed. Similar
to Exiting. Member shuts down itself if it sees itself as
Down.
* Flip back to reachable when failure detector monitors it as
available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
* When seen same the gossip chat is initated with GossipStatus
message containing the vclock only
* Remove conversation flag in GossipEnvelope
* Ordinary tell instead of actorSelection when replying
* UnreachableNodeJoinsAgain failed because of gated connection
* Removed default test value of retry-gate-closed-for, instead
default from reference.conf is used, i.e. 0s
* deadLetters logging love
* Rename config akka.event-handlers to akka.loggers
* Rename config akka.event-handler-startup-timeout to
akka.logger-startup-timeout
* Rename JulEventHandler to JavaLogger
* Rename Slf4jEventHandler to Slf4jLogger
* Change all places in tests and docs
* Deprecation, old still works, but with warnings
* Migration guide
* Test for the deprecated event-handler config
* Failure detector was previously copied with refactoring to
akka-remote and this refactoring makes use of that and removes
the failure detector in akka-cluster
* Adjustments to reference.conf
* Refactoring of FailureDetectorPuppet
* When a def starts with if and is not a oneliner the if
should be on a new line.
* The reason is that it might be easy to miss the if when
reading the code.
* Subscribe to InstantMemberEvent and start heartbeating when
InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
heartbeat sender and receiver. Not using ConsistentHash anymore.
Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose eventual problems
* Explicit config of nr-of-end-heartbeats
* akka.cluster.StressSpec
* Configurable number of nodes and duration for each step
* Report metrics and phi periodically to see progress
* Configurable payload size
* Test of various join and remove scenarios
* Test of watch
* Exercise supervision
* Report cluster stats
* Test with many actors in tree structure
Apart from the test this commit also solves some issues:
* Avoid adding back members when downed in ClusterHeartbeatSender
* Avoid duplicate close of ClusterReadView
* Add back the publish of AddressTerminated when MemberDowned/Removed
it was lost in merge of "publish on convergence", see #2779