* Trying to resolve conflicts simultaneously at several nodes creates new conflicts.
Therefore the leader resolves conflicts to limit divergence. To avoid overload there
is also a configurable rate limit on how many conflicts are handled per second.
* Netty blocks when sending to broken connections. The ClusterHeartbeatSender actor
isolates sending to different nodes by using a child worker for each target
address and thereby reduces the risk of irregular heartbeats to healthy
nodes due to broken connections to other nodes.
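
A minimal sketch of a per-second limit on conflict handling, as mentioned above. It is written in Python as a language-neutral illustration and is not the code from this commit; the class name `ConflictRateLimiter`, the fixed one-second window, and the `try_acquire` method are assumptions made for the example. What happens to conflicts that exceed the limit is left to the caller.

```python
class ConflictRateLimiter:
    """Allows at most `max_per_second` conflicts per one-second window.

    Hypothetical helper for illustration only; not part of the commit.
    """

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.window = None  # the whole second currently being counted
        self.count = 0      # conflicts accepted within that second

    def try_acquire(self, now_seconds):
        """Return True if a conflict may be handled at time `now_seconds`."""
        window = int(now_seconds)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False
```

With a limit of 2, the third conflict arriving within the same second is rejected, and the counter starts over in the next second.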
* Essentially as already described in the cluster specification,
but now fully implemented and tested with LargeClusterSpec
* Gossip to nodes with different view (using seen table)
with certain probability
* Gossip chat, gossip back to sender
* Immediate gossip to joining node
* Updated some tests to reflect current implementation
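
The "gossip to nodes with different view with certain probability" rule above could look roughly like the following. This is a sketch under stated assumptions, not the implementation from the commit: the function name `select_gossip_target`, the parameter `different_view_probability`, and modelling the seen table as a map from address to the last version that node has seen are all hypothetical.

```python
import random


def select_gossip_target(self_address, members, seen,
                         different_view_probability, rng=random):
    """Pick a peer to gossip to, or None if there are no peers.

    `seen` maps address -> version last seen by that node (assumed shape).
    """
    peers = [m for m in members if m != self_address]
    if not peers:
        return None
    # Peers whose seen version differs from ours have a different view.
    different = [m for m in peers if seen.get(m) != seen.get(self_address)]
    if different and rng.random() < different_view_probability:
        return rng.choice(different)
    # Otherwise fall back to any peer, chosen uniformly.
    return rng.choice(peers)
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible in tests.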
* Add new operators :+ and :++ by implicit conversion
* Unfortunately this means that we must remember to use
these until SI-5986 is fixed. Is there a better way?
* self is initially not a member (in gossip state)
* if the join to seed nodes times out, the node joins itself and becomes
a singleton cluster
* remove the special-case handling of singleton cluster in gossip
merge, since singleton cluster is no longer the normal state when
joining
* Implement the join to seed nodes process
When a new node is started it sends a message to all
seed nodes and then sends a join command to the one that answers
first.
* Configuration of seed-nodes and auto-join
* New JoinSeedNodeSpec that verifies the auto join to seed nodes
* In tests seed nodes are configured by overriding seedNodes
function, since addresses are not known before start
* Deputy nodes are the live members of the seed nodes (not sure if
that will be the final solution, see ticket 2252)
* Updated cluster.rst with latest info about deputy and seed nodes
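
The seed-node join process above, together with the join-self-on-timeout behaviour noted in the neighbouring entry, can be sketched as follows. This is an illustrative Python model, not the actor code from the commit: `choose_join_target` and the `reply_delays` map (seed address to seconds until its answer arrives, with missing entries meaning no answer) are assumptions made for the example.

```python
def choose_join_target(self_address, seed_nodes, reply_delays, timeout):
    """Return the address the new node should send its join command to.

    The node contacts every seed node and joins the one that answers
    first. If no seed answers within `timeout`, it joins itself.
    """
    answered = {
        seed: reply_delays[seed]
        for seed in seed_nodes
        if seed in reply_delays and reply_delays[seed] <= timeout
    }
    if not answered:
        # No seed node answered in time: become a singleton cluster.
        return self_address
    # The seed node with the smallest delay answered first.
    return min(answered, key=answered.get)
```

For example, with seeds `s1` and `s2` answering after 0.8 and 0.2 seconds, the node joins `s2`; with no answers it joins itself.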
To fix LEADER leaving and allow handoff to new leader before moving old leader from EXITING -> REMOVED.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
- Removed REMOVED as explicit valid member state
- Implemented leader moving either itself or other member from EXITING -> REMOVED
- Added sending of a Remove message to the removed node so that it shuts itself down
- Fixed a few bugs
- Removed 'remove' from Cluster and JMX interface
- Added a bunch of ScalaDoc
- Added isRunning method
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
The problem:
* Node that is Up joins a cluster and becomes Joining in that cluster
* The joining node receives gossip, which results in a conflict;
the merge results in Up
* It becomes Up in the new cluster without passing the ordinary leader
action to move it to Up
The solution:
* Change priority order of Up and Joining so that Joining is used when
merging
* Merge unreachable using highestPriorityOf
* Avoid a merge result in which a node exists in both members and unreachable
* Fix so that joining is only allowed when !alreadyMember && !isUnreachable (non Down)
* Fix filter bug of unreachable in downing and leaderActions
* Minor cleanups
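
To illustrate the priority change described above, here is a small Python sketch of a status merge in which Joining wins over Up. It is not the code from this commit. Only "Joining before Up" comes from the message; the rest of the `PRIORITY` order, the snake_case function names, and the `{address: status}` representation are assumptions for the example.

```python
# Hypothetical priority order, highest first. Only "Joining before Up"
# is taken from the commit message; the other positions are assumed.
PRIORITY = ["Down", "Exiting", "Leaving", "Joining", "Up"]


def highest_priority_of(status_a, status_b):
    """Return the status that wins when two views of a member conflict."""
    return min(status_a, status_b, key=PRIORITY.index)


def merge_members(view_a, view_b):
    """Merge two {address: status} maps, resolving conflicts by priority."""
    merged = dict(view_a)
    for address, status in view_b.items():
        if address in merged:
            merged[address] = highest_priority_of(merged[address], status)
        else:
            merged[address] = status
    return merged
```

With this order, a node that is Up in one view and Joining in the other stays Joining after the merge, so it still has to pass the ordinary leader action to become Up.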
* All members must be in seen table for convergence
* Added extra debug logging due to convergence issues
* Enabled test of convergence for node joining singleton
cluster
- Moved FailureDetectorPuppet to its own file in src/test.
- Removed 'phi' method from FailureDetector public API.
- Throwing exception instead of falling back to default if we can't load the custom FD.
- Removed add-connection method in FailureDetectorPuppet.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>