* Replace (deprecate) akka.cluster.auto-down config setting with
akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
and performs down from the leader node when they have been
unreachable for the specified duration
* Migration guide
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member previously moved it to the unreachable Set,
and it was then removed from there by the leader. That does not
work when a member can flip back to reachable, so a Down member must
be detected as unreachable before being removed, similar
to Exiting. A member shuts itself down if it sees itself as
Down.
* Flip back to reachable when failure detector monitors it as
available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
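
The auto-down bookkeeping described in this change (track unreachable members, down them once they have been unreachable for akka.cluster.auto-down-unreachable-after, and forget them if they flip back to reachable) can be sketched roughly as below. This is an illustration only, not the AutoDown actor itself: the class name, the method names and the use of plain strings for addresses are assumptions, and the check that only the leader performs the down is left out.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the bookkeeping only; not Akka's AutoDown actor.
final class AutoDownSketch {
    // Corresponds to the akka.cluster.auto-down-unreachable-after duration.
    private final Duration autoDownUnreachableAfter;
    // Address -> time (millis) at which the member was first seen unreachable.
    private final Map<String, Long> unreachableSinceMillis = new HashMap<>();

    AutoDownSketch(Duration autoDownUnreachableAfter) {
        this.autoDownUnreachableAfter = autoDownUnreachableAfter;
    }

    // Record the first time a member is observed as unreachable.
    void unreachable(String address, long nowMillis) {
        unreachableSinceMillis.putIfAbsent(address, nowMillis);
    }

    // A member that flips back to reachable is no longer a down candidate.
    void reachable(String address) {
        unreachableSinceMillis.remove(address);
    }

    // Members that have been unreachable for at least the configured duration.
    List<String> dueForDown(long nowMillis) {
        List<String> due = new ArrayList<>();
        long limit = autoDownUnreachableAfter.toMillis();
        for (Map.Entry<String, Long> e : unreachableSinceMillis.entrySet()) {
            if (nowMillis - e.getValue() >= limit) {
                due.add(e.getKey());
            }
        }
        return due;
    }
}
```

In a real cluster the result of dueForDown would only be acted on by the leader node, as stated above.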
* Subscribe to InstantMemberEvent and start heartbeating when
InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
heartbeat sender and receiver. Not using ConsistentHash anymore.
Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose potential problems
* Explicit config of nr-of-end-heartbeats
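
The idea behind the HeartbeatNodeRing (order nodes by a hash of their address so that neighbors are spread out, then map each sender to the next few nodes on the ring and each receiver to the previous few) might look roughly like this. It is a minimal sketch under assumptions, not the actual data structure: the class name, the `count` parameter, the use of String.hashCode and string addresses are all hypothetical, and addresses are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not Akka's HeartbeatNodeRing implementation.
final class HeartbeatRingSketch {
    private final List<String> ring; // addresses ordered by hash, ties broken by address
    private final int count;         // number of neighbors each node heartbeats to

    HeartbeatRingSketch(List<String> addresses, int count) {
        List<String> sorted = new ArrayList<>(addresses);
        // Ordering by hash rather than by address spreads out nodes
        // whose addresses are close to each other.
        sorted.sort(Comparator.comparingInt((String a) -> a.hashCode())
                .thenComparing(Comparator.<String>naturalOrder()));
        this.ring = sorted;
        this.count = count;
    }

    // Nodes that `sender` sends heartbeats to: the next `count` nodes on the ring.
    List<String> receivers(String sender) {
        return neighbors(sender, 1);
    }

    // Nodes that send heartbeats to `receiver`: the previous `count` nodes on the ring.
    List<String> senders(String receiver) {
        return neighbors(receiver, -1);
    }

    private List<String> neighbors(String node, int step) {
        List<String> result = new ArrayList<>();
        int start = ring.indexOf(node);
        if (start < 0) {
            return result; // unknown node has no neighbors
        }
        int n = ring.size();
        int limit = Math.min(count, n - 1); // never include the node itself
        for (int i = 1; i <= limit; i++) {
            result.add(ring.get(Math.floorMod(start + step * i, n)));
        }
        return result;
    }
}
```

Because both directions walk the same ring, a node b is in receivers(a) exactly when a is in senders(b), which is the bidirectional mapping mentioned above.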
* akka.cluster.StressSpec
* Configurable number of nodes and duration for each step
* Report metrics and phi periodically to see progress
* Configurable payload size
* Test of various join and remove scenarios
* Test of watch
* Exercise supervision
* Report cluster stats
* Test with many actors in tree structure
Apart from the test this commit also solves some issues:
* Avoid adding back members when downed in ClusterHeartbeatSender
* Avoid duplicate close of ClusterReadView
* Add back the publishing of AddressTerminated when MemberDowned/Removed,
which was lost in the merge of "publish on convergence", see #2779
These tests use the throttling in the experimental test conductor which relies
on the fact that the same connection is used for both inbound and outbound
traffic. This is not always the case when starting multiple cluster nodes
at the same time.
* Due to the shutdown issues the TestConductorTransport is not
active by default, but it's easy to activate, and an exception
will be thrown if trying to use the features that require it, i.e.
blackhole, passThrough and throttle
* Documented
* Major refactoring to remove the need to use special
Cluster instance for testing. Use default Cluster
extension instead. Most of it is trivial changes.
* Used failure-detector.implementation-class from config
to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seedNodes when Addresses
are unknown before startup time.
* Removed ClusterEnvironment that was passed around among the actors,
instead they use the ordinary Cluster extension.
* Overall much cleaner design
* Gossip is not exposed in user api
* Better and more events
* Snapshot event sent to new subscriber
* Updated tests
* Periodic publish only for internal stats