* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsucessful join request
* AddressUidExtension
* Uid in cluster Member identifier
To be able to distinguish nodes with same host:port
after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
counts are reached,
role.<role-name>.min-nr-of-members config
* Update documentation and make use of the roles in the examples
* Failure detector was previously copied with refactoring to
akka-remote and this refactoring makes use of that and removes
the failure detector in akka-cluster
* Adjustments to reference.conf
* Refactoring of FailureDetectorPuppet
* Previous work-around was introduced because Netty blocks when sending
to broken connections. This is supposed to be solved by the non-blocking
new remoting.
* Removed HeartbeatSender and CoreSender in cluster
* Added tests to verify that broken connections don't disturb live connection
* Subscribe to InstantMemberEvent and start heartbeating when
InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
heartbeat sender and receiver. Not using ConsistentHash anymore.
Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose eventual problems
* Explicit config of nr-of-end-heartbeats
* case object and case class for MixMetricsSelector
* Rename decay-half-life-duration to moving-average-half-life
* Clarification of decay-half-life-duration and collect-interval
* Removed Fields, Java compatibility issue
* Adapt for-yield variables
* Comment metrics collector constructor that takes system param
* Don't copy EWMA if not needed
* LogOf2 constant 0.69315
* Don't use mapValues
* Remove RichInt conversion
* sigar version replace tag in docs
* createDeployer factory method to make it possible to override
deployer in subclass
* Improve readability of MetricsListener (in sample)
* Better startup of factorial sample (no sleep)
* Many minor enhancements and cleanups
* Refactoring of standard metrics extractors and data structures
* Removed optional value in Metric, simplified a lot
* Configuration of EWMA by using half-life duration
* Renamed DataStream to EWMA
* Incorporate review feedback
* Use binarySearch for selecting weighted routees
* More metrics selectors for the router
* Removed network metrics, since not supported on linux
* Configuration of router
* Rename to AdaptiveLoadBalancingRouter
* Remove total cores metrics, since it's the same as jmx getAvailableProcessors,
tested on intel 24 core server and amd 48 core server, and MBP
* API cleanup
* Java API additions
* Documentation of metrics and AdaptiveLoadBalancingRouter
* New cluster sample to illustrate metrics in the documentation,
and play around with (factorial)
* MetricsSelector, calculate capacity, weights and allocate weighted
routee refs
* ClusterLoadBalancingRouterSpec
* Optional heap max
* Constants for the metric fields
* Refactoring of Metric and decay
* Rewrite of DataStreamSpec
* Correction of EWMA and removal of BigInt, BigDecimal
* Separation of MetricsCollector into trait and two classes,
SigarMetricsCollector and JmxMetricsCollector
* This will reduce cost when sigar is not installed, such as
avoiding throwing and catching exc for every call
* Improved error handling for loading sigar
* Made MetricsCollector implementation configurable
* Tested with sigar
* Previously heartbeat messages was sent to all other members, i.e.
each member was monitored by all other members in the cluster.
* This was the number one know scalability bottleneck, due to the
number of interconnections.
* Limit sending of heartbeats to a few (5) members. Select and
re-balance with consistent hashing algorithm when new members
are added or removed.
* Send a few EndHeartbeat when ending send of Heartbeat messages.
* Gossip is not exposed in user api
* Better and more events
* Snapshot event sent to new subscriber
* Updated tests
* Periodic publish only for internal stats
* Trying to simultaneously resolving conflicts at several nodes creates new conflicts.
Therefore the leader resolves conflicts to limit divergence. To avoid overload there
is also a configurable rate limit of how many conflicts that are handled by second.
* Netty blocks when sending to broken connections. ClusterHeartbeatSender actor
isolates sending to different nodes by using child workers for each target
address and thereby reduce the risk of irregular heartbeats to healty
nodes due to broken connections to other nodes.
* Implement the join to seed nodes process
When a new node is started started it sends a message to all
seed nodes and then sends join command to the one that answers
first.
* Configuration of seed-nodes and auto-join
* New JoinSeedNodeSpec that verifies the auto join to seed nodes
* In tests seed nodes are configured by overriding seedNodes
function, since addresses are not known before start
* Deputy nodes are the live members of the seed nodes (not sure if
that will be the final solution, see ticket 2252
* Updated cluster.rst with latest info about deputy and seed nodes
* Implementation of phi according to the paper
* Config properties and documentation, min-std-deviation,
* acceptable-lost-heartbeats
* Restructure code, HeartbeatHistory is responsible for
stats from historical heartbeats
* Correct and efficient calculation of mean and standard
deviation
* More tests
- Abstracted a FailureDetector trait.
- Added a FailureDetectorPuppet mock that can be user controllable
- Added option to define a custom failure detector
- Misc minor fixes
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
- Renamed all 'frequency' to 'interval'
- Split up NodeJoinAndUpSpec and into NodeJoinSpec and NodeUpSpec.
- Split up MembershipChangeListenerJoinAndUpSpec and into MembershipChangeListenerJoinSpec and MembershipChangeListenerUpSpec.
- Added utility method 'startClusterNode()'
- Fixed race in register listener and telling node to leave
- Removed 'after' blocks
- Cleaned up unused code
- Improved comments
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
* Added possibility for user to 'down' a node
* Added possibility for the leader to 'auto-down' a node.
* Added leader role actions
- Moving nodes from JOINING -> UP
- Moving nodes from EXITING -> REMOVED
- AUTO-DOWNING
* Added tests for user and leader downing
* Added 'auto-down' option to turn auto-downing on and off
* Fixed bug in semantic Member Ordering
* Removed FSM stuff from ClusterCommandDaemon (including the test) since the node status should only be in the converged gossip state
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>