Commit graph

203 commits

Author SHA1 Message Date
Patrik Nordwall
1ccb9fe7ec Note about URLEncode instead of MD5, see #2290 2012-07-04 11:58:51 +02:00
Patrik Nordwall
e5979bc31c Gossip merge in large cluster, #2290
* Trying to resolve conflicts simultaneously at several nodes creates new conflicts.
  Therefore the leader resolves conflicts to limit divergence. To avoid overload there
  is also a configurable rate limit on how many conflicts are handled per second.
* Netty blocks when sending to broken connections. ClusterHeartbeatSender actor
  isolates sending to different nodes by using child workers for each target
  address and thereby reduces the risk of irregular heartbeats to healthy
  nodes due to broken connections to other nodes.
2012-07-02 23:00:41 +02:00
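The rate limit mentioned in e5979bc31c could look roughly like the fixed-window counter below. This is only a sketch: `ConflictRateLimiter`, `max_per_second` and the injectable `clock` are hypothetical names, and the commit message does not say which limiting strategy the actual implementation uses.

```python
import time


class ConflictRateLimiter:
    """Hypothetical limiter: at most max_per_second conflict merges per one-second window."""

    def __init__(self, max_per_second, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock              # injectable so the limiter can be tested
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        """Return True if one more conflict may be handled in the current window."""
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new window starts; forget the old count.
            self.window_start = now
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False
```

A leader using such a limiter would call `try_acquire()` before each merge and skip (or postpone) the merge when it returns `False`.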
Patrik Nordwall
c09caebe8a Small refactoring of cluster actors
* Separate actor for heartbeats, so they are more isolated from gossip
  messages
* Configuration property for dispatcher to use for the cluster actors
2012-07-02 23:00:41 +02:00
Björn Antonsson
675dfd9182 Keep the cluster node membership change listeners when joining. 2012-06-29 13:24:46 +02:00
Björn Antonsson
6ad96c2579 Review changes 2012-06-29 13:24:46 +02:00
Björn Antonsson
574ff26bb4 Support for re-JOINING a node that has been DOWN. See #1908 2012-06-29 13:24:46 +02:00
Patrik Nordwall
d47ff04c03 Moved GossipDifferentViewProbability to config, see #2253 2012-06-29 08:56:58 +02:00
Patrik Nordwall
2da1a912fe Improve efficiency of gossip, see #2193 and #2253
* Essentially as already described in cluster specification,
  but now fully implemented and tested with LargeClusterSpec
* Gossip to nodes with different view (using seen table)
  with certain probability
* Gossip chat, gossip back to sender
* Immediate gossip to joining node
* Updated some tests to reflect current implementation
2012-06-28 11:41:48 +02:00
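The "gossip to nodes with different view with certain probability" rule in 2da1a912fe can be sketched as follows. The function name, the shape of the seen table (a dict from address to the last version that node has seen) and the parameter `different_view_probability` are assumptions made for illustration; the last one is meant to mirror the GossipDifferentViewProbability setting named in d47ff04c03.

```python
import random


def select_gossip_target(self_addr, members, seen, current_version,
                         different_view_probability, rng=random):
    """Pick a peer to gossip to, preferring peers whose view differs from ours.

    members: list of node addresses.
    seen:    dict mapping address -> gossip version that node has seen (assumed shape).
    """
    peers = [m for m in members if m != self_addr]
    if not peers:
        return None
    # Peers that have not yet seen the current version have a different view.
    different = [m for m in peers if seen.get(m) != current_version]
    if different and rng.random() < different_view_probability:
        return rng.choice(different)
    # Otherwise fall back to any peer, chosen uniformly.
    return rng.choice(peers)
```

With a probability of 1.0 the sketch always targets a node with a different view when one exists; lower values mix in plain random gossip.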
Patrik Nordwall
aca66de732 Test gossip in large cluster, see #2239 2012-06-28 11:41:28 +02:00
Patrik Nordwall
aed78f702b Workaround for SI-5986, see #2275
* Add new operators :+ and :++ by implicit conversion
* Unfortunately this means that we must remember to use
  these until SI-5986 is fixed. Is there a better way?
2012-06-26 18:19:33 +02:00
Patrik Nordwall
25996bf284 Join seed nodes before becoming singleton cluster, see #2267
* self is initially not member (in gossip state)
* if the join to seed nodes times out it joins itself, and becomes
  singleton cluster
* remove the special case handling of singleton cluster in gossip
  merge, since singleton cluster is not the normal state when joining
  any more
2012-06-25 21:34:14 +02:00
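The join decision described in 25996bf284, combined with the "join the seed node that answers first" process from 42078e7083, might be reduced to a function like the one below. It is a sketch only: `decide_join`, `acks` and `timed_out` are hypothetical names, and the real implementation is actor-based rather than a pure function.

```python
def decide_join(self_addr, acks, timed_out):
    """Decide which address to send the join command to.

    acks:      seed node addresses, in the order their replies arrived.
    timed_out: True once the join-to-seed-nodes timeout has expired.
    """
    if acks:
        return acks[0]       # join the seed node that answered first
    if timed_out:
        return self_addr     # no seed answered: join itself, singleton cluster
    return None              # still waiting for a reply or for the timeout
```

Returning `None` models the initial state in which the node is not yet a member.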
Patrik Nordwall
2cd38e2004 Merge branch 'master' into wip-2263-gossip-unreachable-patriknw
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/Cluster.scala
2012-06-25 20:59:50 +02:00
Patrik Nordwall
97bf8c4bb5 Cleanup of comments, see #2263 2012-06-25 20:46:48 +02:00
Patrik Nordwall
20fc0c42a2 Merge branch 'master' into wip-2219-seed-nodes-patriknw
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala
2012-06-25 20:40:06 +02:00
Patrik Nordwall
738565883b Add join-seed-node-timeout config, see #2219 2012-06-25 20:20:11 +02:00
Patrik Nordwall
cba64403a7 Don't gossip to unreachable, see #2263
* Also, ignore gossip from unreachable, see #2264
* Update gossip protocol in cluster doc
2012-06-25 15:23:15 +02:00
patriknw
555998b2c1 Merge pull request #553 from akka/wip-2249-heartbeats-after-join-patriknw
Start sending heartbeats immediately when joining, see #2249
2012-06-25 01:26:18 -07:00
patriknw
24e49b1024 Merge pull request #551 from akka/wip-2250-singleton-cluster-merge-patriknw
Avoid gossip merge when singleton cluster, see #2250
2012-06-25 01:25:14 -07:00
Patrik Nordwall
42078e7083 Reintroduce 'seed' nodes, see #2219
* Implement the join to seed nodes process
  When a new node is started it sends a message to all
  seed nodes and then sends join command to the one that answers
  first.
* Configuration of seed-nodes and auto-join
* New JoinSeedNodeSpec that verifies the auto join to seed nodes
* In tests seed nodes are configured by overriding seedNodes
  function, since addresses are not known before start
* Deputy nodes are the live members of the seed nodes (not sure if
  that will be the final solution, see ticket 2252)
* Updated cluster.rst with latest info about deputy and seed nodes
2012-06-21 11:05:02 +02:00
Viktor Klang
9b73d75c1b Removing the naught default in code of the failure detector and changed so that the AccrualFailureDetector's constructor matches what the instantiator expects 2012-06-20 14:14:10 +02:00
Patrik Nordwall
529c25f3dc Start sending heartbeats immediately when joining, see #2249
* Keep track of joins that are in progress in State.joinInProgress,
  with Deadline
* Add test that fails without this feature
2012-06-20 13:20:28 +02:00
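One way to read the `State.joinInProgress` bookkeeping in 529c25f3dc is sketched below: heartbeats go to current members and to nodes whose join is still within its deadline. The function name and the dict-of-deadlines representation are assumptions for illustration, not the actual data structure.

```python
def heartbeat_targets(members, join_in_progress, now):
    """Return the set of addresses that should receive heartbeats.

    members:          iterable of member addresses.
    join_in_progress: dict mapping address -> deadline (same time unit as now).
    """
    # Joins whose deadline has passed are no longer tracked.
    live_joins = {addr for addr, deadline in join_in_progress.items()
                  if deadline > now}
    return set(members) | live_joins
```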
Patrik Nordwall
dccb0ca2d7 Avoid gossip merge when singleton cluster, see #2250 2012-06-20 11:37:13 +02:00
Jonas Bonér
d38aa2ed9c Added ScalaDoc about the Leaving, Exiting and Removed states 2012-06-19 20:11:54 +02:00
Jonas Bonér
9011c310e1 Minor cleanup.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-19 14:27:12 +02:00
Jonas Bonér
fd54a93135 Added ScalaDoc on 'def status: MemberStatus' describing the MemberStatus.Removed semantics.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-19 14:21:56 +02:00
Jonas Bonér
49586bd01d Change Member ordering so it sorts members by host and port with the exception that it puts all members that are in MemberStatus.EXITING last.
To fix LEADER leaving and allow handoff to new leader before moving old leader from EXITING -> REMOVED.

Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-18 15:25:17 +02:00
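The ordering described in 49586bd01d (by host and port, with EXITING members last) can be expressed as a sort key. The sketch below uses a hypothetical `Member` tuple and plain status strings, and it assumes the leader is taken as the first member in this ordering.

```python
from collections import namedtuple

# Hypothetical stand-in for the cluster's Member type.
Member = namedtuple("Member", "host port status")


def member_sort_key(member):
    """Sort by host and port, but push members in status 'Exiting' to the end."""
    # False sorts before True, so non-exiting members come first.
    return (member.status == "Exiting", member.host, member.port)


def sort_members(members):
    return sorted(members, key=member_sort_key)
```

Under that assumption, a leader that starts exiting drops to the end of the ordering, so the next member can take over before the old leader is removed.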
Jonas Bonér
8b6652a794 Fixed all issues from review. In particular fully separated state transformation and preparation for side-effecting processing.
Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-18 13:53:49 +02:00
Jonas Bonér
6d96d04234 Merge branch 'master' into wip-2162-redesign-of-management-of-the-exiting-to-removed-life-cycle-jboner 2012-06-16 00:18:26 +02:00
Jonas Bonér
469fcd8305 Redesign of life-cycle management of EXITING -> REMOVED. Fixes #2177.
- Removed REMOVED as explicit valid member state
- Implemented leader moving either itself or other member from EXITING -> REMOVED
- Added sending Remove message for removed node to shut down itself
- Fixed a few bugs
- Removed 'remove' from Cluster and JMX interface
- Added bunch of ScalaDoc
- Added isRunning method

Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-16 00:00:19 +02:00
Patrik Nordwall
08c47591c0 Use max of periodic-tasks-initial-delay and the interval 2012-06-15 13:35:52 +02:00
Patrik Nordwall
f7a01505ba Correction of gossip merge when joining, see #2204
The problem:
* Node that is Up joins a cluster and becomes Joining in that cluster
* The joining node receives gossip, which results in conflict,
  merge results in Up
* It became Up in the new cluster without passing the ordinary leader
  action to move it to Up

The solution:
* Change priority order of Up and Joining so that Joining is used when
  merging
2012-06-15 13:35:52 +02:00
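The fix in f7a01505ba amounts to a status priority in which Joining wins over Up during a merge. A minimal sketch follows; only the Joining-over-Up rule comes from the commit message, while the position of the other statuses in `MERGE_PRIORITY` is an assumption.

```python
# Earlier entries win when two views of the same member conflict.
# Only "Joining" before "Up" is stated in the log; the rest is assumed.
MERGE_PRIORITY = ["Down", "Exiting", "Leaving", "Joining", "Up"]


def highest_priority_of(status_a, status_b):
    """Return whichever of the two statuses takes precedence in a merge."""
    return min(status_a, status_b, key=MERGE_PRIORITY.index)
```

With this order, a node that was Up elsewhere stays Joining after the merge and still has to be moved to Up by the ordinary leader action.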
Jonas Bonér
f74c96b424 Merged with master 2012-06-14 16:21:03 +02:00
Jonas Bonér
cb0cfac6c7 Merged with master 2012-06-14 16:13:53 +02:00
Patrik Nordwall
c5164085b2 Merge branch 'master' into wip-2077-gossip-merge-patriknw
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/Cluster.scala
2012-06-13 17:04:09 +02:00
Patrik Nordwall
391e633329 Improve docs based on feedback, see #2077 2012-06-13 16:54:21 +02:00
Patrik Nordwall
bd7bdff269 Improve debug log message of no convergence, see #2222 2012-06-13 16:15:16 +02:00
Patrik Nordwall
afbeb3e5f9 import MemberStatus._ 2012-06-13 15:33:38 +02:00
Patrik Nordwall
5b89d25c37 Add invariant assertions to Gossip, see #2077
* Add doc about how members are "moved"
2012-06-13 15:23:45 +02:00
Patrik Nordwall
f3d9f9c4e8 Merge seen table by starting with empty seen after merge, see #2077 2012-06-13 11:19:06 +02:00
Patrik Nordwall
ff5c99a80d Minor cleanup, based on review comments, see #2077 2012-06-13 11:04:27 +02:00
Patrik Nordwall
42c5281d5a Correct? implementation of merge and other actions, see #2077
* Merge unreachable using highestPriorityOf
* Avoid merge result in node existing in both members and unreachable
* Fix joining only allowed when !alreadyMember && !isUnreachable (non Down)
* Fix filter bug of unreachable in downing and leaderActions
* Minor cleanups
2012-06-13 09:37:47 +02:00
Patrik Nordwall
92cab53b1e Rename + operator of VectorClock and Versioned to :+
* + is kind of reserved for string concatenation
2012-06-12 16:16:44 +02:00
Patrik Nordwall
de1ad30217 Fix false convergence when singleton cluster, see #2222
* All members must be in seen table for convergence
* Added extra debug logging due to convergence issues
* Enabled test of convergence for node joining singleton
  cluster
2012-06-12 16:16:44 +02:00
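The rule "all members must be in seen table for convergence" from de1ad30217 can be illustrated with the check below. The function name and the dict-based seen table are assumptions, and the real check may involve further conditions that the commit message does not mention.

```python
def has_converged(members, seen, version):
    """True only if every member appears in the seen table with this version.

    members: iterable of member addresses.
    seen:    dict mapping address -> gossip version that node has seen (assumed shape).
    """
    # A member missing from the table yields None and therefore blocks convergence.
    return all(seen.get(m) == version for m in members)
```

For a singleton cluster that a second node is joining, the new node is a member but is not yet in the seen table, so the check reports no convergence instead of a false positive.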
Patrik Nordwall
40d9b27e73 Info log about dedicated scheduler, and refactoring, see #2214
* Refactoring with wrapping of Scheduler according to @viktorklang's wish
2012-06-12 14:16:30 +02:00
Patrik Nordwall
b27bae6554 Use dedicated cluster scheduler only when default scheduler resolution isn't good enough, see #2214
* Config properties for scheduler
* Commented shutdown considerations
2012-06-12 13:34:59 +02:00
Patrik Nordwall
a7d2be10eb Merge branch 'master' into wip-2214-heartbeats-patriknw
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala
	akka-cluster/src/main/scala/akka/cluster/Cluster.scala
2012-06-11 22:27:08 +02:00
Patrik Nordwall
34c9e49ee0 Schedule cluster tasks more accurately, see #2114
* Use scheduler with more accurate settings
* New FixedRateTask that compensates for inaccuracy
2012-06-11 22:20:44 +02:00
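A task that "compensates for inaccuracy", as FixedRateTask in 34c9e49ee0 is described, typically computes each delay from the ideal schedule instead of from the moment the previous run finished. The sketch below shows that idea only; `next_delay` and its parameters are hypothetical and not taken from the actual class.

```python
def next_delay(start, interval, ticks_done, now):
    """Delay until the next tick, measured against the ideal fixed-rate schedule.

    start:      time of the first scheduling decision.
    interval:   desired time between ticks.
    ticks_done: number of ticks already executed.
    now:        current time, in the same unit as start and interval.
    """
    ideal_next = start + (ticks_done + 1) * interval
    # If we are already late, run immediately rather than waiting a full interval.
    return max(0.0, ideal_next - now)
```

Because the delay shrinks when a tick ran late, the error does not accumulate from one tick to the next.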
Patrik Nordwall
d957c68639 Incorporate feedback from review, see #2214 2012-06-11 21:12:57 +02:00
Patrik Nordwall
e2551494c4 Use separate heartbeats for FailureDetector, see #2214
* Send Heartbeat message to all members at regular interval
* Removed the need to gossip to myself
2012-06-11 15:00:44 +02:00
Jonas Bonér
ec7177be74 Misc fixes after FailureDetectorPuppet and abstraction review
- Moved FailureDetectorPuppet to its own file in src/test.
- Removed 'phi' method from FailureDetector public API.
- Throwing exception instead of falling back to default if we can't load the custom FD.
- Removed add-connection method in FailureDetectorPuppet.

Signed-off-by: Jonas Bonér <jonas@jonasboner.com>
2012-06-11 10:06:53 +02:00