Commit graph

75 commits

Author SHA1 Message Date
Patrik Nordwall
d5b25cbbc6 !act #3583 Timer based auto-down
* Replace (deprecate) akka.cluster.auto-down config setting with
  akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
  and performs down from the leader node when they have been
  unreachable for the specified duration
* Migration guide
2013-09-27 14:32:03 +02:00
Patrik Nordwall
dc9fe4f19c !clu #2307 Allow transition from unreachable to reachable
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member was moved it to the unreachable Set,
  and then removed from there by the leader. That will not
  work when flipping back to reachable, so a Down member must
  be detected as unreachable before beeing removed. Similar
  to Exiting. Member shuts down itself if it sees itself as
  Down.
* Flip back to reachable when failure detector monitors it as
  available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
2013-09-11 13:10:29 +02:00
Patrik Nordwall
8c2859ad03 Make akka.cluster.MetricsCollector public, see #3452 2013-06-18 15:07:26 +02:00
Patrik Nordwall
95366cb585 Wrap long lines, for pdf 2013-05-30 14:45:15 +02:00
Patrik Nordwall
a323936299 Disable cluster stats by default, see #3348
* Add VectorClockStats
2013-05-28 16:15:57 +02:00
Patrik Nordwall
18a3b3facf Config of cluster info logging, see #3225 2013-05-23 13:36:35 +02:00
drewhk
336fb1b180 Merge pull request #1450 from drewhk/wip-eventstream-for-fd-drewhk
EventStream is now passed to failure detectors
2013-05-17 09:40:12 -07:00
Endre Sándor Varga
fd4bc09035 EventStream is now passed to failure detectors
- Also, dynamic loading is now centralized (DRY)
2013-05-17 16:35:27 +02:00
Patrik Nordwall
ad1eaa6d4a Remove auto-join config, derive from seed-nodes, see #3359 2013-05-17 13:54:51 +02:00
Patrik Nordwall
9e56ab6fe5 Disallow re-joining, see #2873
* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
  performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsucessful join request
* AddressUidExtension
* Uid in cluster Member identifier
  To be able to distinguish nodes with same host:port
  after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
2013-04-17 16:48:18 +02:00
Björn Antonsson
73f0f44ddb Protobuf serialization of cluster messages. See #1910 2013-04-11 10:09:05 +02:00
Patrik Nordwall
9f45dd90b7 Merge pull request #1304 from akka/wip-2839-dispatcher-deploy-config-patriknw
Add dispatcherId to deployment config, see #2839
2013-04-08 12:03:43 -07:00
Patrik Nordwall
ae0cb4a756 Add dispatcher to deployment config, see #2839 2013-04-08 09:52:56 +02:00
Viktor Klang
dddc3a6630 #3203 - deprecating HashedWheelTimer 2013-04-07 20:07:26 +02:00
Patrik Nordwall
7eac88f372 Cluster node roles, see #3049
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
  counts are reached,
  role.<role-name>.min-nr-of-members config
*  Update documentation and make use of the roles in the examples
2013-03-18 11:56:11 +01:00
Björn Antonsson
7ed6b3d4ee Fixes according review. See #3076 2013-03-11 12:27:29 +01:00
Patrik Nordwall
323e5c80b5 Polish the API/SPI of remoting, see #2827
* Changed TransportAdapterProvider to support java impl
* Verified java impl of AbstractTransportAdapter and
  ActorTransportAdapter
* Privatized things that should not be public api
* Consistent usage of INTERNAL API marker in scaladoc
* Added some missing doc in conf
* Added missing SerialVersionUID
2013-02-10 17:47:43 +01:00
Patrik Nordwall
157a25bcde Failure detector refactoring, see #2690
* Failure detector was previously copied with refactoring to
  akka-remote and this refactoring makes use of that and removes
  the failure detector in akka-cluster
* Adjustments to reference.conf
* Refactoring of FailureDetectorPuppet
2013-02-01 10:08:39 +01:00
Patrik Nordwall
9dc124dacd Remove work-around for sending to broken connections, see #2909
* Previous work-around was introduced because Netty blocks when sending
to broken connections. This is supposed to be solved by the non-blocking
new remoting.
* Removed HeartbeatSender and CoreSender in cluster
* Added tests to verify that broken connections don't disturb live connection
2013-01-31 13:41:02 +01:00
Patrik Nordwall
8b4e903e7d Detect failure when no heartbeats sent, see #2907
* Subscribe to InstantMemberEvent and start heartbeating when
  InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
  heartbeat sender and receiver. Not using ConsistentHash anymore.
  Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
  expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose eventual problems
* Explicit config of nr-of-end-heartbeats
2013-01-18 12:54:09 +01:00
Patrik Nordwall
44ab9f116f min-nr-of-members and registerOnMemberUp, see #2306
* Leader moves joining members to up when min-nr-of-members reached
* Tested by MinMembersBeforeUpSpec
* Used in factorial sample
* Docs
2012-12-12 14:00:06 +01:00
Patrik Nordwall
5eec693fd0 Incorparate review feedback, see #2547
* case object and case class for MixMetricsSelector
* Rename decay-half-life-duration to moving-average-half-life
* Clarification of decay-half-life-duration and collect-interval
* Removed Fields, Java compatibility issue
* Adapt for-yield variables
* Comment metrics collector constructor that takes system param
* Don't copy EWMA if not needed
* LogOf2 constant 0.69315
* Don't use mapValues
* Remove RichInt conversion
* sigar version replace tag in docs
* createDeployer factory method to make it possible to override
  deployer in subclass
* Improve readability of MetricsListener (in sample)
* Better startup of factorial sample (no sleep)
* Many minor enhancements and cleanups
2012-11-16 11:03:20 +01:00
Patrik Nordwall
dcde7d3594 AdaptiveLoadBalancingRouter and more refactoring of metrics, see #2547
* Refactoring of standard metrics extractors and data structures
* Removed optional value in Metric, simplified a lot
* Configuration of EWMA by using half-life duration
* Renamed DataStream to EWMA
* Incorporate review feedback
* Use binarySearch for selecting weighted routees
* More metrics selectors for the router
* Removed network metrics, since not supported on linux
* Configuration of router
* Rename to AdaptiveLoadBalancingRouter
* Remove total cores metrics, since it's the same as jmx getAvailableProcessors,
  tested on intel 24 core server and amd 48 core server, and MBP
* API cleanup
* Java API additions
* Documentation of metrics and AdaptiveLoadBalancingRouter
* New cluster sample to illustrate metrics in the documentation,
  and play around with (factorial)
2012-11-14 15:08:30 +01:00
Patrik Nordwall
c9d206764a ClusterLoadBalancingRouter and refactoring of metrics, see #2547
* MetricsSelector, calculate capacity, weights and allocate weighted
  routee refs
* ClusterLoadBalancingRouterSpec
* Optional heap max
* Constants for the metric fields
* Refactoring of Metric and decay
* Rewrite of DataStreamSpec
* Correction of EWMA and removal of BigInt, BigDecimal
* Separation of MetricsCollector into trait and two classes,
  SigarMetricsCollector and JmxMetricsCollector
* This will reduce cost when sigar is not installed, such as
  avoiding throwing and catching exc for every call
* Improved error handling for loading sigar
* Made MetricsCollector implementation configurable
* Tested with sigar
2012-11-07 20:36:24 +01:00
Helena Edelson
f306964fca Initial work of adaptive metrics aware routers, see #2547 2012-11-07 18:18:38 +01:00
Patrik Nordwall
3f73705abc Use consistent hash to heartbeat to a few nodes instead of all, see #2284
* Previously heartbeat messages was sent to all other members, i.e.
  each member was monitored by all other members in the cluster.
* This was the number one know scalability bottleneck, due to the
  number of interconnections.
* Limit sending of heartbeats to a few (5) members. Select and
  re-balance with consistent hashing algorithm when new members
  are added or removed.
* Send a few EndHeartbeat when ending send of Heartbeat messages.
2012-10-08 08:41:28 +02:00
Björn Antonsson
0988101881 Fixes according to review. #2413 2012-10-02 10:21:46 +02:00
Björn Antonsson
08ef942242 Reformating configuration and examples for PDF (Scala, and leftovers). See #2413 2012-10-01 20:35:46 +02:00
Helena Edelson
dbce1c8b85 Cluster metrics internal API and cluster-wide transport of metrics data.
* Create Cluster Metrics API
* Create transport of relevant metrics data
Does not include load-balancing routers.
2012-09-24 13:07:11 -06:00
Patrik Nordwall
88a9123724 Move heartbeat-interval into failure-detector section 2012-09-20 15:23:18 +02:00
Patrik Nordwall
068335789c Cluster config setting to disable jmx, see #2531 2012-09-20 08:09:01 +02:00
Patrik Nordwall
0524ba0a65 Cluster documentation
* 2 Sample applications with main programs to play with,
  and multi-node tests to illustrate testing, and
  code snippets for inclusion in rst docs
* TransformationSample illustratates subscription to
  cluster events
* StatsSample illustrates usage of cluster aware routers, both
  lookup and deploy
2012-09-13 11:35:04 +02:00
Patrik Nordwall
ac16d567ea Improvements based on review comments from √, see #2103 2012-09-11 19:11:20 +02:00
Patrik Nordwall
018a949678 Merge branch 'master' into wip-2103-cluster-routers-patriknw
Conflicts:
	akka-cluster/src/multi-jvm/scala/akka/cluster/ClientDowningNodeThatIsUnreachableSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/ClientDowningNodeThatIsUpSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/ConvergenceSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/LeaderDowningNodeThatIsUnreachableSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/LeaderElectionSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/MultiNodeClusterSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/SingletonClusterSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/SplitBrainSpec.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/UnreachableNodeRejoinsClusterSpec.scala
	akka-cluster/src/test/scala/akka/cluster/ClusterSpec.scala
	akka-remote/src/main/scala/akka/routing/RemoteRouterConfig.scala
2012-09-11 10:55:47 +02:00
Patrik Nordwall
f746115c4d Improve ClusterRouterSettings, see #2103
* It doesn't make sense to combine lookup with routeesPath with
  maxInstancesPerNode != 1
* Rename deploy-on-own-node as it can be used for lookup also
2012-09-08 17:30:42 +02:00
Patrik Nordwall
bf1781c173 Lookup of cluster routees, see #2103
* Lookup of routee on cluster member node using configured routees-path
2012-09-07 14:54:53 +02:00
Patrik Nordwall
d552e06a07 Add deploy-on-own-node setting for cluster router, see #2103
* Useful for master-worker scenario where all routees are remote.
2012-09-07 12:07:41 +02:00
Patrik Nordwall
bd6c39178c Fix leaking this in constructor of Cluster, see #2473
* Major refactoring to remove the need to use special
  Cluster instance for testing. Use default Cluster
  extension instead. Most of it is trivial changes.
* Used failure-detector.implementation-class from config
  to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seedNodes when Addresses
  are unknown before startup time.
* Removed ClusterEnvironment that was passed around among the actors,
  instead they use the ordinary Cluster extension.
* Overall much cleaner design
2012-09-06 21:48:40 +02:00
Patrik Nordwall
695ce49727 Deploy to new members in cluster, see #2103
* Config max-nr-of-instances-per-node
* selectDeploymentTarget that takes max-nr-of-instances-per-node
  and nr-of-instances into account
* Deploy when new member added or removed
* Moved routeeProps to RouteeProvider constructor, needed for
  this feature, but also simplifies createRoute, createRoutee,
  and resize, since routeeProps doesn't have to be passed around.
2012-08-29 19:33:19 +02:00
Patrik Nordwall
417bdc2dfb Prototype of cluster aware routers, see #2103
* Several FIXME that needs to be discussed
* ClusterRouterConfig created via ClusterActorRefProvider
2012-08-28 11:54:52 +02:00
Patrik Nordwall
06f81f4373 Improve publish of domain events, see #2202
* Gossip is not exposed in user api
* Better and more events
* Snapshot event sent to new subscriber
* Updated tests
* Periodic publish only for internal stats
2012-08-15 16:47:34 +02:00
Patrik Nordwall
fbeb6017cc Remove gossip to deputy nodes, see #2310 2012-07-04 14:39:27 +02:00
Patrik Nordwall
c708d2ad8a First step in refactoring of cluster internals to actors, see #2311
* Move clustering code to ClusterCore actor
* More will be done, comitting this for early review
2012-07-04 13:52:14 +02:00
Patrik Nordwall
9bf5a74f92 Merge branch 'master' into wip-2290-leader-merge-patriknw
Conflicts:
	akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala
2012-07-04 11:46:28 +02:00
Patrik Nordwall
63d6ac2a7e Change auto-down default to off, see #2304 2012-07-03 16:36:11 +02:00
Patrik Nordwall
e5979bc31c Gossip merge in large cluster, #2290
* Trying to simultaneously resolving conflicts at several nodes creates new conflicts.
  Therefore the leader resolves conflicts to limit divergence. To avoid overload there
  is also a configurable rate limit of how many conflicts that are handled by second.
* Netty blocks when sending to broken connections. ClusterHeartbeatSender actor
  isolates sending to different nodes by using child workers for each target
  address and thereby reduce the risk of irregular heartbeats to healty
  nodes due to broken connections to other nodes.
2012-07-02 23:00:41 +02:00
Patrik Nordwall
c09caebe8a Small refactoring of cluster actors
* Separate actor for heartbeats, so they are more isolated from gossip
  messages
* Configuration property for dispatcher to use for the cluster actors
2012-07-02 23:00:41 +02:00
Patrik Nordwall
d47ff04c03 Moved GossipDifferentViewProbability to config, see #2253 2012-06-29 08:56:58 +02:00
Patrik Nordwall
133e1561c1 Adjusted comment based on feedback, see #2265 2012-06-27 10:58:48 +02:00
Patrik Nordwall
932ea6f98a Test split brain scenario, see #2265 2012-06-26 18:25:42 +02:00