Commit graph

45 commits

Author SHA1 Message Date
Andrei Pozolotin
7b9f77a073 + akka-cluster-metrics: new akka module
* new akka module split from akka-cluster
* provide sigar provisioning
* fix ewma usage
* resolve #16121
* see #16354
2015-01-19 10:23:54 -06:00
Patrik Nordwall
503c4ced8f !clu #3920 Remove deprecated Cluster.publishCurrentClusterState 2014-03-14 14:11:28 +01:00
dario.rexin
2cbad298d6 =all #3858 Make case classes final 2014-03-07 13:20:01 +01:00
Adam Voss
cce29dfa51 Changes all occurances of Typesafe copyright to extend to 2014. 2014-02-04 21:20:09 -06:00
Patrik Nordwall
2e5193347e !clu #3617 API improvements related to CurrentClusterState
* Getter for CurrentClusterState in Cluster extension, updated via
  ClusterReadView
* Remove lazy init of readView. Otherwise the cluster.state will be
  empty on first access, wich is probably surprising
* Subscribe to several cluster event types at once, to ensure *one*
  CurrentClusterEvent followed by change events
* Deprecate publishCurrentClusterState, was a bad idea, use sendCurrentClusterState
  instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
  to CurrentClusterState
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
2014-01-16 16:17:44 +01:00
Patrik Nordwall
dc9fe4f19c !clu #2307 Allow transition from unreachable to reachable
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member was moved it to the unreachable Set,
  and then removed from there by the leader. That will not
  work when flipping back to reachable, so a Down member must
  be detected as unreachable before beeing removed. Similar
  to Exiting. Member shuts down itself if it sees itself as
  Down.
* Flip back to reachable when failure detector monitors it as
  available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
2013-09-11 13:10:29 +02:00
Patrik Nordwall
a323936299 Disable cluster stats by default, see #3348
* Add VectorClockStats
2013-05-28 16:15:57 +02:00
Patrik Nordwall
ee6e80d31a Add previousStatus in MemberRemoved, see #3252 2013-05-23 11:09:32 +02:00
Patrik Nordwall
a0a0f39613 Hardening of cluster member leaving path, see #3309
* Removed leader commands for Shutdown and Exit
* Member shutdown itself  when it sees itself as Exiting
* Singleton cluster with status Exiting will shutdown itself,
  in case the Exiting gossip never arrives
* Exiting member not part convergence check
* Exiting member is removed by leader (on convergence) when the
  exiting member is in the unreachable set, i.e. sucessfully shutdown
* Reverted the change made for #3266, i.e. Exiting is
  detected as unreachable again.
* Adjust ClusterSingletonManager to new Exiting behaviour
* Fix bug in HeartbeatSender, which caused it to continue to
  send heartbeats to removed nodes, instead of rebalancing
* Refactoring of leaderActions method
* Leaving section in docs
2013-05-17 11:39:49 +02:00
Björn Antonsson
539df2e98a Enforce mailbox types on System actors. See #3273 2013-05-03 11:05:32 +02:00
Patrik Nordwall
4606612bd1 Reliable remote supervision and death watch, see #2993
* RemoteWatcher that monitors node failures, with heartbeats
  and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine,
  actual implementation of quarantining will be implemented
  in ticket 2594
2013-04-17 19:42:51 +02:00
Patrik Nordwall
9e56ab6fe5 Disallow re-joining, see #2873
* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
  performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsucessful join request
* AddressUidExtension
* Uid in cluster Member identifier
  To be able to distinguish nodes with same host:port
  after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
2013-04-17 16:48:18 +02:00
Björn Antonsson
73f0f44ddb Protobuf serialization of cluster messages. See #1910 2013-04-11 10:09:05 +02:00
Patrik Nordwall
7eac88f372 Cluster node roles, see #3049
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
  counts are reached,
  role.<role-name>.min-nr-of-members config
*  Update documentation and make use of the roles in the examples
2013-03-18 11:56:11 +01:00
Patrik Nordwall
1e4b2585c7 Publish LeaderChanged when first seen, see #3131
* The problem in ClusterSingletonManagerChaosSpec was that node 4 doesn't publish
  LeaderChanged, because there is never convergence on node 4 of the new Up
  state for the three new nodes before they are shutdown. When it becomes
  convergence on node 4 prevConvergedGossip and newGossip have same leader
  (i.e. no change).
* LeaderChanged is now published when the new leader is first seen, i.e. same
  as member events. This makes sense now when leader can't be in Joining state.
2013-03-11 12:41:15 +01:00
Patrik Nordwall
5b844ec1e6 Publish member events when state change first seen, see #3075
* Remove InstantMemberEvent
2013-03-07 14:07:17 +01:00
Patrik Nordwall
5c7747e7fa Transition from Down to Removed, see #3075 2013-03-07 14:02:42 +01:00
Roland
bcfbea42c1 fix formatting of Java API in doc comments + genjavadoc 0.3 2013-03-07 09:05:55 +01:00
Patrik Nordwall
cab78e5174 Make cluster fault handling more robust, see #3030
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
  because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
  ClusterDomainEventPublisher, which will shutdown the member
  on failure in children.
* Publish the final removed state on postStop in
  ClusterDomainEventPublisher. This also simplifies the removing
  process.
2013-02-12 21:55:08 +01:00
Patrik Nordwall
d32a2edc51 Buffer LeaderChanged events and publish all on convergence, see #3017
* Otherwise some changes might never be published, since it doesn't have
  to be convergence on all nodes inbetween all transitions.
* Detected by a failure ClusterSingletonManagerSpec.
* Added a test to simulate the failure scenario.
2013-02-08 12:29:11 +01:00
Patrik Nordwall
79303a1785 Incorparate review comments, see #2803 2013-01-14 19:32:52 +01:00
Patrik Nordwall
d07f331e78 Publish InstantMemberEvent immediately, see #2803 2013-01-14 19:13:48 +01:00
Viktor Klang (√)
6b638db65e Merge pull request #1006 from akka/wip-2879-copyright2013-√
#2879 - updating copyright info
2013-01-14 04:59:29 -08:00
Viktor Klang
adfeb2c1f0 #2879 - updating copyright info 2013-01-09 11:38:00 +01:00
Patrik Nordwall
943c438d5e Publish clean state when joining (PublishStart), see #2871
* The failure in JoinTwoClustersSpec was due to missing publishing
  of cluster events when clearing current state when joining
* This fix is in the right direction, but joining clusters like this
  will need some design thought, creating ticket 2873 for that
2013-01-08 19:32:36 +01:00
Patrik Nordwall
f147f4d3d2 Stress / long running test of cluster, see #2786
* akka.cluster.StressSpec
* Configurable number of nodes and duration for each step
* Report metrics and phi periodically to see progress
* Configurable payload size
* Test of various join and remove scenarios
* Test of watch
* Exercise supervision
* Report cluster stats
* Test with many actors in tree structure

Apart from the test this commit also solves some issues:

* Avoid adding back members when downed in ClusterHeartbeatSender
* Avoid duplicate close of ClusterReadView
* Add back the publish of AddressTerminated when MemberDowned/Removed
  it was lost in merge of "publish on convergence", see #2779
2013-01-07 14:44:36 +01:00
Björn Antonsson
a03460329d Change cluster MemberEvents to only be published on convergence. See #2692
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/ClusterEvent.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterJmx.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterReadView.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/MultiNodeClusterSpec.scala
	akka-docs/rst/cluster/cluster-usage-java.rst
	akka-docs/rst/cluster/cluster-usage-scala.rst
	akka-kernel/src/main/dist/bin/akka-cluster
2012-12-14 12:46:13 +01:00
Patrik Nordwall
1cd3a05f41 Publish AddressTerminated after a member is Downed/Removed, see #2779
* Instead of when unreachable

* Note that ClusterRouterConfig is not changed, i.e. routees will be removed
  when unreachable
* Routers that are not wrapped by ClusterRouterConfig will watch as usual, i.e.
  remove routees when Terminated, i.e. node down
2012-12-12 12:55:22 +01:00
Patrik Nordwall
1914be7069 Merge branch 'master' into wip-2547-metrics-router-patriknw
Conflicts:
	akka-actor/src/main/scala/akka/actor/Deployer.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala
	akka-cluster/src/test/scala/akka/cluster/MetricsCollectorSpec.scala
2012-11-15 12:33:11 +01:00
Patrik Nordwall
dcde7d3594 AdaptiveLoadBalancingRouter and more refactoring of metrics, see #2547
* Refactoring of standard metrics extractors and data structures
* Removed optional value in Metric, simplified a lot
* Configuration of EWMA by using half-life duration
* Renamed DataStream to EWMA
* Incorporate review feedback
* Use binarySearch for selecting weighted routees
* More metrics selectors for the router
* Removed network metrics, since not supported on linux
* Configuration of router
* Rename to AdaptiveLoadBalancingRouter
* Remove total cores metrics, since it's the same as jmx getAvailableProcessors,
  tested on intel 24 core server and amd 48 core server, and MBP
* API cleanup
* Java API additions
* Documentation of metrics and AdaptiveLoadBalancingRouter
* New cluster sample to illustrate metrics in the documentation,
  and play around with (factorial)
2012-11-14 15:08:30 +01:00
Patrik Nordwall
c959d4a973 Incorporate feedback, see #2502 2012-10-05 08:17:54 +02:00
Patrik Nordwall
acdafa0cd3 Additions for Java API of cluster, see #2502 2012-10-04 14:16:11 +02:00
Patrik Nordwall
49b9ec6c2c Publish cluster metrics through the publisher actor.
* To avoid ordering surprises metrics should be published via
  the same actor that handles the subscriptions and publishes
  other cluster domain events.
* Added missing publish in case of removal of member
  (had a test failure for that)
2012-10-02 17:08:38 +02:00
Patrik Nordwall
51ff9ce6d1 Cluster.unsubscribe with class parameter, see #2567 2012-09-28 13:09:36 +02:00
Helena Edelson
dbce1c8b85 Cluster metrics internal API and cluster-wide transport of metrics data.
* Create Cluster Metrics API
* Create transport of relevant metrics data
Does not include load-balancing routers.
2012-09-24 13:07:11 -06:00
Patrik Nordwall
9423d37da9 Merge branch 'master' into wip-cluster-docs-patriknw
Conflicts:
	project/AkkaBuild.scala
2012-09-20 10:40:08 +02:00
Patrik Nordwall
ab8a690c65 Use Either for LeaderChanged state, see #2518 2012-09-20 08:44:44 +02:00
Patrik Nordwall
718686e2f2 Add another test case for publish of LeaderChanged, see #2518
* It didn't handle convergence changes with same leader correctly
2012-09-19 10:18:55 +02:00
Patrik Nordwall
c0c6cc3931 Publish cluster LeaderChanged only when convergence, see #2518 2012-09-18 14:19:38 +02:00
Patrik Nordwall
50d0efe7d4 Request send/publish of CurrentClusterState, see #2438
* Added publishCurrentClusterState and sendCurrentClusterState
* Removed Ping/Pong that was used for some tests, since awaitCond is
  now needed anyway, since publish to eventStream is done afterwards
2012-09-12 09:23:02 +02:00
Patrik Nordwall
911ef6b97e Merge pull request #668 from akka/wip-1588-cluster-death-watch-patriknw
Death watch hooked up with cluster failure detector, see #1588
2012-09-11 06:13:44 -07:00
Patrik Nordwall
bd6c39178c Fix leaking this in constructor of Cluster, see #2473
* Major refactoring to remove the need to use special
  Cluster instance for testing. Use default Cluster
  extension instead. Most of it is trivial changes.
* Used failure-detector.implementation-class from config
  to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seedNodes when Addresses
  are unknown before startup time.
* Removed ClusterEnvironment that was passed around among the actors,
  instead they use the ordinary Cluster extension.
* Overall much cleaner design
2012-09-06 21:48:40 +02:00
Patrik Nordwall
6b40ddc755 Maintain AddressTerminated subscription in DeathWatch, see #1588 2012-09-03 20:37:33 +02:00
Patrik Nordwall
b1e251e0bc Prototype of death watch hooked up with failure detector, see #1588
* Probably a lot of things missing, but wanted to try the first idea
* The test is green :)
2012-08-31 16:37:35 +02:00
Patrik Nordwall
20a038fdfd Fine grained events, see #2202
* Defined the domain events in ClusterEvent.scala file
* Produce events from diff  and publish publish to event bus
  from separate actor, ClusterDomainEventPublisher
* Adjustments of tests
2012-08-21 15:35:38 +02:00