There exists a race where a cluster node that is being downed sees
itself as the oldest node (as the other nodes have been removed from
its view) and takes over the singleton manager, causing the real oldest
node to go into the End state, meaning that cluster singletons never
work again.
This fix simply prevents Member events from being given to the cluster
singleton manager FSM during a shutdown, relying instead on SelfExiting.
This also hardens the test by not downing the node that the current
sharding coordinator is running on, and fixes a bug in the probes.
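A minimal sketch of the idea, assuming the public classic cluster API; the actor and its handlers are invented for illustration, not the actual ClusterSingletonManager internals:

```scala
import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Hypothetical tracker: while shutting down, stop acting on MemberEvents
// (which would make this node look like the oldest once the others are
// removed from its view) and react only to this node's own exit,
// i.e. the SelfExiting idea.
class OldestTracker extends Actor {
  val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents, classOf[MemberEvent])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive: Receive = {
    case MemberExited(m) if m.address == cluster.selfAddress =>
      context.become(exiting) // start hand-over, don't recompute the oldest
    case _: MemberEvent => // track the oldest node as usual
  }

  def exiting: Receive = {
    case _: MemberEvent => // ignored during shutdown
  }
}
```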
* Cluster management (join, leave, etc)
* Cluster membership subscriptions (MemberUp, MemberRemoved, etc)
* New SelfUp and SelfRemoved events
* Change signature of awaitAssert to return the value (not binary compatible)
* Cluster singleton API
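As a rough illustration of the membership subscriptions above, a minimal listener using the classic API (the listener name is invented; SelfUp/SelfRemoved belong to the new API surface and are not shown here):

```scala
import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Hypothetical listener for MemberUp/MemberRemoved.
class MemberListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, classOf[MemberUp], classOf[MemberRemoved])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive: Receive = {
    case _: CurrentClusterState   => // initial snapshot
    case MemberUp(member)         => log.info("Member up: {}", member.address)
    case MemberRemoved(member, _) => log.info("Member removed: {}", member.address)
  }
}
```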
* Guarantee no sneaky typo puts more teams in the role list
* Leader per team and initial tests
* MiMa filters
* Second iteration (not working though)
* Verbose gossip logging etc.
* Gossip to team-nodes even if there is inter-team unreachability
* More work ...
* Marking removed nodes with tombstones in Gossip
* More test coverage for Gossip.remove
* Bug failing other multi-node tests squashed
* Multi-node test for team-split
* Review fixes - only prune tombstones on leader ticks
* Clean code is happy code.
* All I want is for MiMa to be my friend
* These constants are internal
* Making the formatting gods happy
* I used the wrong reachability for ignoring gossip :/
* Still hadn't quite gotten how reachability was supposed to work
* Review feedback applied
* Cross-team downing should still work
* Actually prune tombstones in the prune tombstones method ...
* Another round against reachability. Reachability leading with 15 - 2 so far.
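A hypothetical sketch of the tombstone bookkeeping described above; the names are invented (the real representation lives in the private Gossip state), and pruning happens only from the leader's periodic tick, per the review fix:

```scala
import scala.concurrent.duration._
import akka.actor.Address

// Invented representation: removed members remembered with removal time.
final case class Tombstones(entries: Map[Address, Long], ttl: FiniteDuration) {
  def add(node: Address, now: Long): Tombstones =
    copy(entries = entries.updated(node, now))

  // Gossip about a tombstoned node is ignored so it cannot sneak back in.
  def contains(node: Address): Boolean = entries.contains(node)

  // Called from the leader's tick only.
  def prune(now: Long): Tombstones =
    copy(entries = entries.filter { case (_, t) => now - t <= ttl.toMillis })
}
```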
* Getter for CurrentClusterState in Cluster extension, updated via
ClusterReadView
* Remove lazy init of readView. Otherwise the cluster.state will be
empty on first access, which is probably surprising
* Subscribe to several cluster event types at once, to ensure *one*
CurrentClusterState followed by change events
* Deprecate publishCurrentClusterState, which was a bad idea; use
sendCurrentClusterState instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
to CurrentClusterState
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
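A sketch of how these pieces fit together (classic API; the listener is invented, but the subscription modes, cluster.state, and sendCurrentClusterState are the public surface described above):

```scala
import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Hypothetical listener showing the two initial-state modes.
class StateListener extends Actor {
  val cluster = Cluster(context.system)

  override def preStart(): Unit = {
    // One CurrentClusterState snapshot, then change events:
    cluster.subscribe(self, initialStateMode = InitialStateAsSnapshot,
      classOf[MemberEvent], classOf[UnreachableMember])
    // InitialStateAsEvents would instead synthesize events corresponding
    // to the current state. A snapshot can also be re-requested with
    // cluster.sendCurrentClusterState(self), or read directly (kept
    // up to date via ClusterReadView):
    val snapshot: CurrentClusterState = cluster.state
  }

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive: Receive = {
    case _: CurrentClusterState => // initial snapshot
    case _: MemberEvent         => // change events follow
    case _: UnreachableMember   => // ...
  }
}
```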
* Replace (deprecate) akka.cluster.auto-down config setting with
akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
and performs down from the leader node when they have been
unreachable for the specified duration
* Migration guide
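A configuration sketch for the replacement setting, assuming the classic cluster provider of that era; system name and duration are arbitrary:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// The leader downs members automatically after they have been
// unreachable for the configured duration; "off" disables it.
val config = ConfigFactory.parseString("""
  akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
  akka.cluster.auto-down-unreachable-after = 10s
""")
val system = ActorSystem("MyCluster", config)
```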
* Disallow join requests when already part of a cluster
* Remove wiping of state when joining, since join can only be
performed from an empty state
* When trying to join, only accept gossip from that member
* Ignore gossip from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
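A minimal programmatic join sketch with the public API; system name, host, and port are invented:

```scala
import akka.actor.{ ActorSystem, Address }
import akka.cluster.Cluster

// The joining node sends Join to the contact node, accepts gossip only
// from it until welcomed, and retries if the request is unsuccessful.
val system = ActorSystem("MyCluster")
Cluster(system).join(Address("akka.tcp", "MyCluster", "host1", 2552))
```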
* AddressUidExtension
* Uid in cluster Member identifier
To be able to distinguish nodes with same host:port
after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
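A small sketch of reading this node's uid, assuming the classic AddressUidExtension API:

```scala
import akka.actor.ActorSystem
import akka.remote.AddressUidExtension

// Combined with host:port in the Member identifier and the vclock,
// the uid distinguishes a restarted node from its previous incarnation.
val system = ActorSystem("MyCluster")
val uid: Int = AddressUidExtension(system).addressUid
```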
* Update sample, Member apply is private
* Disabled config duration syntax and cleaned up io settings
* Update documentation
* Stop ClusterDaemon if init of core actor fails.
* Activate jmx-enabled setting
* Adjust the error message of InvalidActorNameException to match conventions
* Config of node roles, cluster.roles
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node counts are
reached, as configured with role.<role-name>.min-nr-of-members
* Update documentation and make use of the roles in the examples
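A configuration sketch for the role features above; the role name and member count are invented:

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

// This node carries the "backend" role; leader actions wait until at
// least two members with that role have joined.
val config = ConfigFactory.parseString("""
  akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
  akka.cluster.roles = ["backend"]
  akka.cluster.role.backend.min-nr-of-members = 2
""")
val system = ActorSystem("MyCluster", config)
println(Cluster(system).selfRoles) // Set(backend)
```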