* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member was moved it to the unreachable Set,
and then removed from there by the leader. That will not
work when flipping back to reachable, so a Down member must
be detected as unreachable before beeing removed. Similar
to Exiting. Member shuts down itself if it sees itself as
Down.
* Flip back to reachable when failure detector monitors it as
available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
* The problem was that we didn't wait for the testconductor.shutdown Future
to complete and therefore barriers could be triggered in unexpected order.
The reason why we didn't await, was that during shutdown the Future was
completed with client disconnected failure. I have fixed that and added
await to all shutdowns.
* Major refactoring to remove the need to use special
Cluster instance for testing. Use default Cluster
extension instead. Most of it is trivial changes.
* Used failure-detector.implementation-class from config
to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seedNodes when Addresses
are unknown before startup time.
* Removed ClusterEnvironment that was passed around among the actors,
instead they use the ordinary Cluster extension.
* Overall much cleaner design
* Gossip is not exposed in user api
* Better and more events
* Snapshot event sent to new subscriber
* Updated tests
* Periodic publish only for internal stats
The problem:
* Node that is Up joins a cluster and becomes Joining in that cluster
* The joining node receives gossip, which results in conflict,
merge results in Up
* It became Up in the new cluster without passing the ordinary leader
action to move it to Up
The solution:
* Change priority order of Up and Joining so that Joining is used when
merging
- Created FailureDetectorStrategy base trait.
- Created FailureDetectorPuppetStrategy.
- Created AccrualFailureDetectorStrategy.
- Created two versions of LeaderDowningNodeThatIsUnreachableMultiJvmSpec
- LeaderDowningNodeThatIsUnreachableWithFailureDetectorPuppet
- LeaderDowningNodeThatIsUnreachableWithAccrualFailureDetector
- Added AccrualFailureDetectorStrategy to all the remaining tests - will be split up into two versions shortly.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>