* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossip from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
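For reference, programmatic joining goes through the Cluster extension; a minimal sketch with placeholder addresses (this is the user-facing call, not the internal Join/Welcome protocol itself):

```scala
import akka.actor.{ ActorSystem, Address }
import akka.cluster.Cluster

object JoinSketch extends App {
  val system = ActorSystem("ClusterSystem")

  // Placeholder address of a node in the existing cluster; the protocol
  // ("akka.tcp") depends on the remoting transport in use.
  val target = Address("akka.tcp", "ClusterSystem", "host1", 2552)

  // The join request is retried until a Welcome reply is received, and it is
  // not allowed when this node is already part of a cluster.
  Cluster(system).join(target)
}
```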
* AddressUidExtension
* Uid in cluster Member identifier
To be able to distinguish nodes with the same host:port
after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
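A small sketch of how the uid surfaces (the vector clock name format below is hypothetical, only to illustrate why old and new incarnations of the same host:port can be told apart):

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster
import akka.remote.AddressUidExtension

object UidSketch extends App {
  val system = ActorSystem("ClusterSystem")

  // The uid is generated per ActorSystem instance, so a node restarted on the
  // same host:port gets a new uid.
  val uid: Int = AddressUidExtension(system).addressUid

  // Hypothetical illustration of a vector clock node identifier that includes
  // the uid, so gossip from an old incarnation is not mixed up with the new one.
  val vclockName = s"${Cluster(system).selfAddress}-$uid"
  println(vclockName)
}
```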
* Disabled config duration syntax and cleanup of io settings
* Update documentation
* Deprecate all actorFor methods
* resolveActorRef in provider
* Identify auto receive message
* Support ActorPath in actorSelection
* Support remote actor selections
* Additional tests of actor selection
* Update tests (keep most actorFor tests)
* Update samples to use actorSelection
* Updates to documentation
* Migration guide, including motivation
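A minimal sketch of the new pattern (the path /user/serviceA and the correlation id are placeholders): look up by actorSelection and use the Identify auto receive message to resolve an ActorRef when one is needed.

```scala
import akka.actor.{ Actor, ActorIdentity, Identify }

class ServiceAClient extends Actor {
  // Selection by path instead of the deprecated actorFor; a full remote path
  // such as "akka.tcp://Sys@host:2552/user/serviceA" works as well.
  val selection = context.actorSelection("/user/serviceA")

  // Identify is an auto receive message, answered by the target with an
  // ActorIdentity that carries the ActorRef, if the actor exists.
  override def preStart(): Unit = selection ! Identify("serviceA")

  def receive = {
    case ActorIdentity("serviceA", Some(ref)) => ref ! "hello"
    case ActorIdentity("serviceA", None)      => context.stop(self) // nothing at that path
  }
}
```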
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
counts are reached
(role.<role-name>.min-nr-of-members config, sketched below)
* Update documentation and make use of the roles in the examples
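A hedged configuration sketch of what the bullets above describe (role list, per-role minimum member count, and a cluster router restricted to a role); the exact keys and defaults may differ from the released reference.conf:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object RolesConfigSketch extends App {
  val config = ConfigFactory.parseString("""
    akka.cluster.roles = ["backend"]

    # only consider the cluster started once 2 nodes with the "backend" role have joined
    akka.cluster.role.backend.min-nr-of-members = 2

    # cluster-aware router that only uses routees on "backend" nodes
    akka.actor.deployment {
      /workerRouter {
        router = round-robin
        nr-of-instances = 10
        cluster {
          enabled = on
          use-role = backend
          allow-local-routees = on
        }
      }
    }
  """)

  val system = ActorSystem("ClusterSystem", config.withFallback(ConfigFactory.load()))
}
```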
* The scenario was that the previous leader left.
* The problem was that the new leader got MemberRemoved
before it got the HandOverDone and therefore missed the
hand over data.
* Solved by not changing the singleton to leader when receiving
MemberRemoved and instead do that on normal HandOverDone or
in failure cases after retry timeout.
* The reason for this bug was the new transition from Down to
Removed and that there is now no MemberDowned event. Previously
this was only triggered by MemberDowned (not MemberRemoved) and
that was safe because it was "always" preceded by unreachable.
* The new solution means that it will take longer for the new singleton
to start up in case of an unreachable previous leader, but I don't
want to trigger it on MemberUnreachable because it might in the
future be possible to switch it back to reachable.
* Problem may occur when joining a member with the same hostname:port again,
after downing.
* Reproduced with StressSpec exerciseJoinRemove with a fixed port that
joins and shuts down several times.
* The real solution will be covered by ticket #2788 by adding a
uid to the member identifier, but as a first step we need to support
this scenario with the current design.
* Use a unique node identifier for the vector clock to avoid mix-up of
old and new member instances.
* Support transition from Down to Joining in Gossip merge
* Don't gossip to unknown or unreachable members.
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
ClusterDomainEventPublisher, which will shut down the member
on failure in the children.
* Publish the final removed state on postStop in
ClusterDomainEventPublisher. This also simplifies the removing
process.
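A rough sketch of the supervision idea with hypothetical names (coreDaemonProps, publisherProps and shutdownMember stand in for the real pieces): the children are never restarted, and any failure shuts the member down.

```scala
import akka.actor.{ Actor, OneForOneStrategy, Props }
import akka.actor.SupervisorStrategy.Stop

class ClusterCoreSupervisorSketch(
    coreDaemonProps: Props,
    publisherProps: Props,
    shutdownMember: () => Unit) extends Actor {

  context.actorOf(publisherProps, "publisher")
  context.actorOf(coreDaemonProps, "daemon")

  // Restarting the children would leave them with obsolete state, so instead
  // stop them and shut down the whole member.
  override val supervisorStrategy = OneForOneStrategy() {
    case _: Exception =>
      shutdownMember()
      Stop
  }

  def receive = Actor.emptyBehavior
}
```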
* The previous work-around was introduced because Netty blocks when sending
to broken connections. This is supposed to be solved by the new
non-blocking remoting.
* Removed HeartbeatSender and CoreSender in cluster
* Added tests to verify that broken connections don't disturb live connections
* When a def starts with if and is not a one-liner, the if
should be on a new line.
* The reason is that it is easy to miss the if when
reading the code (see the example below).
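For illustration (hypothetical snippets):

```scala
// One-liner: the if may stay on the same line.
def sign(x: Int): Int = if (x < 0) -1 else 1

// Not a one-liner: the if goes on a new line so it is not missed when reading.
def describe(x: Int): String =
  if (x < 0)
    s"$x is negative"
  else if (x == 0)
    "zero"
  else
    s"$x is positive"
```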
* Subscribe to InstantMemberEvent and start heartbeating when
InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
heartbeat sender and receiver. Not using ConsistentHash anymore.
Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose possible problems
* Explicit config of nr-of-end-heartbeats
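A much-simplified, hypothetical sketch of the HeartbeatNodeRing idea (the real data structure is richer and bidirectional): nodes are ordered by a hash of their address so that ring neighbors are spread out, and each node sends heartbeats to the next few nodes clockwise from itself.

```scala
import akka.actor.Address

final case class HeartbeatRingSketch(
    selfAddress: Address,
    nodes: Set[Address],
    monitoredByNrOfMembers: Int = 5) {

  // ring ordered by address hash, with the address itself as tie-breaker,
  // so that neighbors in the ring are spread out across the member set
  private val ring: Vector[Address] =
    (nodes + selfAddress).toVector.sortBy(a => (a.##, a.toString))

  /** The nodes this node is expected to send Heartbeat messages to. */
  def myReceivers: Set[Address] = {
    val i = ring.indexOf(selfAddress)
    Iterator.from(1)
      .map(k => ring((i + k) % ring.size))
      .take(math.min(monitoredByNrOfMembers, ring.size - 1))
      .toSet
  }
}
```

The inverse mapping (who is expected to heartbeat me) is what lets a receiver notice missing heartbeats and send a HeartbeatRequest.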
* The failure in JoinTwoClustersSpec was due to cluster events not being
published when clearing the current state when joining
* This fix is in the right direction, but joining clusters like this
will need some design thought, creating ticket 2873 for that
* Renamed isRunning to isTerminated (with negation of course)
* Removed Running from JMX API, since the mbean is deregistered anyway
* Cleanup isAvailable, isUnavailable
* Misc minor
* Previously heartbeat messages were sent to all other members, i.e.
each member was monitored by all other members in the cluster.
* This was the number one known scalability bottleneck, due to the
number of interconnections.
* Limit sending of heartbeats to a few (5) members. Select and
re-balance with consistent hashing algorithm when new members
are added or removed.
* Send a few EndHeartbeat messages when ending the sending of Heartbeat messages.
* To avoid ordering surprises, metrics should be published via
the same actor that handles the subscriptions and publishes
the other cluster domain events.
* Added missing publish in case of removal of member
(had a test failure for that)
* Added publishCurrentClusterState and sendCurrentClusterState
* Removed Ping/Pong that was used for some tests, since awaitCond is
now needed anyway, because publishing to the eventStream is done afterwards
* Major refactoring to remove the need to use special
Cluster instance for testing. Use default Cluster
extension instead. Most of it is trivial changes.
* Used failure-detector.implementation-class from config
to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seedNodes when Addresses
are unknown before startup time.
* Removed ClusterEnvironment that was passed around among the actors,
instead they use the ordinary Cluster extension.
* Overall much cleaner design
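A minimal sketch of the new entry point, with placeholder host names and ports:

```scala
import akka.actor.{ ActorSystem, Address }
import akka.cluster.Cluster

object JoinSeedNodesSketch extends App {
  val system = ActorSystem("ClusterSystem")

  // Seed node addresses resolved at runtime instead of in configuration;
  // the protocol ("akka.tcp") depends on the remoting transport in use.
  val seedNodes = List(
    Address("akka.tcp", "ClusterSystem", "host1", 2552),
    Address("akka.tcp", "ClusterSystem", "host2", 2552))

  Cluster(system).joinSeedNodes(seedNodes)
}
```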
* Defined the domain events in ClusterEvent.scala file
* Produce events from diff and publish them to the event bus
from a separate actor, ClusterDomainEventPublisher
* Adjustments of tests
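A minimal subscriber sketch against the events defined in ClusterEvent (the actor name and reactions are placeholders); after subscribing, the current state is sent first, followed by the diff-based events.

```scala
import akka.actor.Actor
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

class MemberListener extends Actor {
  val cluster = Cluster(context.system)

  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberEvent])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case state: CurrentClusterState =>
      // snapshot of the full membership, sent first after subscribing
      println(s"current members: ${state.members}")
    case MemberUp(member) =>
      println(s"member up: ${member.address}")
    case _: MemberEvent => // ignore the other member events in this sketch
  }
}
```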