Two issues:
1) The ShardRegion actor must stop itself when the node is shutting down,
i.e. when receiving MemberRemoved(selfAddress); a sketch of this self-stop
pattern follows below.
2) The ShardCoordinator must not persist anything when the node is shutting
down. MemberRemoved of other shard regions will trigger Terminated,
which must not be persisted, because then the next coordinator would
replay those events and end up in the wrong state. This problem
announced itself when using leaving, as illustrated in the new test.
To solve the second issue I have added a new ClusterShuttingDown event
that is published before the MemberRemoved events. Note that Terminated
is triggered by MemberRemoved.
(cherry picked from commit 1b272c72597beece9d93f0054f4b58e3d25f9ae2)
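A minimal sketch of the self-stop pattern from issue 1, using the standard cluster
subscription API; SelfStoppingActor is a hypothetical name, not the actual ShardRegion code:

    import akka.actor.Actor
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent.{ CurrentClusterState, MemberEvent, MemberRemoved }

    // Hypothetical actor illustrating the pattern described in issue 1
    class SelfStoppingActor extends Actor {
      val cluster = Cluster(context.system)

      override def preStart(): Unit = cluster.subscribe(self, classOf[MemberEvent])
      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
          // this node is being removed from the cluster, stop ourselves
          context.stop(self)
        case _: MemberEvent         => // other member events are not interesting here
        case _: CurrentClusterState => // initial snapshot, ignore
      }
    }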
* The leader is selected by picking the first reachable member, but in
#13875 we had to let the self member be unreachable in the Reachability
table, and that was not considered in the leader selection logic.
* That resulted in an unwanted behavior change; in particular, when there
is only one node left the leader could be evaluated to None instead
of Some(selfUniqueAddress).
* Note that #13875 has not been released yet.
* Skip observations from downed node (quarantined is marked down immediately)
in convergence check
* Skip observations from downed node when picking "reachable" targets for gossip.
* This also means that we must accept gossip with our own node marked as
unreachable, but that should not be propagated to the external membership events.
* Getter for CurrentClusterState in Cluster extension, updated via
ClusterReadView
* Remove lazy init of readView. Otherwise the cluster.state will be
empty on first access, which would probably be surprising
* Subscribe to several cluster event types at once, to ensure *one*
CurrentClusterState followed by change events
* Deprecate publishCurrentClusterState; it was a bad idea, use
sendCurrentClusterState instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
to CurrentClusterState (see the sketch after this list)
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
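A minimal sketch of the subscription and state-getter APIs described above;
ClusterListener is a hypothetical actor name and the event handling is placeholder:

    import akka.actor.Actor
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    // With InitialStateAsEvents no CurrentClusterState snapshot is delivered;
    // instead events corresponding to the current state arrive first,
    // followed by the subsequent change events.
    class ClusterListener extends Actor {
      val cluster = Cluster(context.system)

      override def preStart(): Unit =
        cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
          classOf[MemberEvent], classOf[UnreachableMember])

      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case MemberUp(member)          => // member moved to Up
        case UnreachableMember(member) => // member detected as unreachable
        case _: MemberEvent            => // other member events
      }
    }

    // The current snapshot is also available directly via the getter:
    // val snapshot: CurrentClusterState = Cluster(system).state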
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member used to move it to the unreachable Set,
from which the leader then removed it. That will not
work when flipping back to reachable, so a Down member must
be detected as unreachable before being removed, similar
to Exiting. A member shuts itself down if it sees itself as
Down.
* Flip back to reachable when the failure detector monitors it as
available again
* ReachableMember event (see the subscription sketch after this list)
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
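A minimal subscription sketch for the reachability events; ReachabilityListener is
a hypothetical name and the reactions are placeholders:

    import akka.actor.Actor
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent.{ CurrentClusterState, ReachableMember, UnreachableMember }

    // An unreachable member stays in the member Set and may flip back to
    // reachable when the failure detector sees it as available again.
    class ReachabilityListener extends Actor {
      val cluster = Cluster(context.system)

      override def preStart(): Unit =
        cluster.subscribe(self, classOf[UnreachableMember], classOf[ReachableMember])
      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case UnreachableMember(m)   => // suspend work destined for m
        case ReachableMember(m)     => // resume work destined for m
        case _: CurrentClusterState => // initial snapshot, ignore here
      }
    }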
* Removed leader commands for Shutdown and Exit
* A member shuts itself down when it sees itself as Exiting
* A singleton cluster with status Exiting will shut itself down,
in case the Exiting gossip never arrives
* An Exiting member is not part of the convergence check
* Exiting member is removed by the leader (on convergence) when the
exiting member is in the unreachable set, i.e. successfully shut down
* Reverted the change made for #3266, i.e. Exiting is
detected as unreachable again.
* Adjust ClusterSingletonManager to new Exiting behaviour
* Fix bug in HeartbeatSender, which caused it to continue to
send heartbeats to removed nodes, instead of rebalancing
* Refactoring of leaderActions method
* Leaving section in docs (a minimal leave sketch follows below)
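A minimal sketch of initiating the leaving flow described above, assuming an
ActorSystem with clustering enabled in its configuration; the system name is illustrative:

    import akka.actor.ActorSystem
    import akka.cluster.Cluster

    object LeaveExample extends App {
      val system = ActorSystem("ClusterSystem")
      val cluster = Cluster(system)

      // Graceful removal: the leader moves the member from Leaving to Exiting,
      // the Exiting member shuts itself down, and the leader finally removes it
      // once it is seen as unreachable (i.e. successfully shut down).
      cluster.leave(cluster.selfAddress)
    }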
* RemoteWatcher that monitors node failures, with heartbeats
and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine; the actual
quarantining will be implemented in ticket 2594
* Disallow join requests when already part of a cluster
* Remove wiping of state when joining, since join can only be
performed from an empty state
* When trying to join, only accept gossip from that member
* Ignore gossip from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
* AddressUidExtension (see the sketch after this list)
* Uid in cluster Member identifier
To be able to distinguish nodes with the same host:port
after a restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
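A minimal sketch of reading the uid, assuming AddressUidExtension ends up in the
akka.remote package and exposes the uid as an Int:

    import akka.actor.ActorSystem
    import akka.remote.AddressUidExtension

    object UidExample extends App {
      val system = ActorSystem("ClusterSystem")

      // Uid identifying this incarnation of the actor system; together with
      // host:port it distinguishes a restarted node from its previous incarnation.
      val uid: Int = AddressUidExtension(system).addressUid
      println(s"system uid: $uid")
    }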
* Config of node roles, cluster.roles (see the config sketch after this list)
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
counts are reached, via the
role.<role-name>.min-nr-of-members config
* Update documentation and make use of the roles in the examples
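A hedged config sketch of the role features listed above, embedded in a Scala snippet;
the role name "compute", the deployment path and the member count are illustrative:

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    object RolesConfigExample extends App {
      val config = ConfigFactory.parseString("""
        akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
        # roles of this node
        akka.cluster.roles = ["compute"]
        # wait until 2 "compute" nodes have joined before members are moved to Up
        akka.cluster.role.compute.min-nr-of-members = 2
        # cluster-aware router that only uses routees on nodes with the "compute" role
        akka.actor.deployment {
          /service/workerRouter {
            router = round-robin
            nr-of-instances = 10
            cluster {
              enabled = on
              use-role = compute
              allow-local-routees = on
            }
          }
        }
      """)

      val system = ActorSystem("ClusterSystem", config.withFallback(ConfigFactory.load()))
    }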
* The problem in ClusterSingletonManagerChaosSpec was that node 4 doesn't publish
LeaderChanged, because there is never convergence on node 4 of the new Up
state for the three new nodes before they are shut down. When convergence is
reached on node 4, prevConvergedGossip and newGossip have the same leader
(i.e. no change).
* LeaderChanged is now published when the new leader is first seen, i.e. the same
as member events. This makes sense now that the leader can't be in the Joining state.
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
ClusterDomainEventPublisher, which will shut down the member
on failure in children.
* Publish the final removed state on postStop in
ClusterDomainEventPublisher. This also simplifies the removing
process.
* Otherwise some changes might never be published, since there does not have
to be convergence on all nodes in between all transitions.
* Detected by a failure in ClusterSingletonManagerSpec.
* Added a test to simulate the failure scenario.
* The failure in JoinTwoClustersSpec was due to missing publishing
of cluster events when clearing the current state when joining
* This fix is in the right direction, but joining clusters like this
will need some design thought; creating ticket 2873 for that
* akka.cluster.StressSpec
* Configurable number of nodes and duration for each step
* Report metrics and phi periodically to see progress
* Configurable payload size
* Test of various join and remove scenarios
* Test of watch
* Exercise supervision
* Report cluster stats
* Test with many actors in tree structure
Apart from the test this commit also solves some issues:
* Avoid adding back members when downed in ClusterHeartbeatSender
* Avoid duplicate close of ClusterReadView
* Add back the publishing of AddressTerminated when a member is Downed/Removed;
it was lost in the merge of "publish on convergence", see #2779
* Instead of when unreachable
* Note that ClusterRouterConfig is not changed, i.e. routees will be removed
when unreachable
* Routers that are not wrapped by ClusterRouterConfig will watch as usual, i.e.
remove routees when Terminated (node down)
* Refactoring of standard metrics extractors and data structures
* Removed optional value in Metric, simplified a lot
* Configuration of EWMA by using half-life duration
* Renamed DataStream to EWMA
* Incorporate review feedback
* Use binarySearch for selecting weighted routees
* More metrics selectors for the router
* Removed network metrics, since they are not supported on Linux
* Configuration of the router (see the sketch after this list)
* Rename to AdaptiveLoadBalancingRouter
* Remove the total cores metric, since it's the same as JMX getAvailableProcessors;
tested on an Intel 24-core server, an AMD 48-core server, and a MBP
* API cleanup
* Java API additions
* Documentation of metrics and AdaptiveLoadBalancingRouter
* New cluster sample to illustrate metrics in the documentation,
and to play around with (factorial)
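A hedged sketch of the EWMA half-life setting and the renamed router, assuming the
Akka 2.2-era API; Worker, the half-life value and the instance count are illustrative:

    import akka.actor.{ Actor, ActorSystem, Props }
    import akka.cluster.routing.{ AdaptiveLoadBalancingRouter, HeapMetricsSelector }
    import com.typesafe.config.ConfigFactory

    // Placeholder application actor
    class Worker extends Actor {
      def receive = { case _ => /* process the job */ }
    }

    object AdaptiveRouterExample extends App {
      // EWMA smoothing of the collected metrics is configured via a half-life duration
      val config = ConfigFactory.parseString("""
        akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
        akka.cluster.metrics.moving-average-half-life = 12s
      """)
      val system = ActorSystem("ClusterSystem", config.withFallback(ConfigFactory.load()))

      // Router that weights routees by remaining heap capacity; other selectors
      // (CPU, system load average, mixed) can be plugged in the same way.
      val workerRouter = system.actorOf(
        Props[Worker].withRouter(AdaptiveLoadBalancingRouter(HeapMetricsSelector, nrOfInstances = 10)),
        name = "workerRouter")
    }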
* To avoid ordering surprises, metrics should be published via
the same actor that handles the subscriptions and publishes
other cluster domain events.
* Added missing publish in case of removal of member
(had a test failure for that)
* Added publishCurrentClusterState and sendCurrentClusterState
* Removed the Ping/Pong that was used for some tests; awaitCond is
now needed anyway, since publishing to the eventStream is done afterwards
* Major refactoring to remove the need to use a special
Cluster instance for testing. Use the default Cluster
extension instead. Most of the changes are trivial.
* Used failure-detector.implementation-class from config
to swap to Puppet
* Removed FailureDetectorStrategy, since it doesn't add any value
* Added Cluster.joinSeedNodes to be able to test seed nodes when Addresses
are unknown before startup time (see the sketch below).
* Removed ClusterEnvironment that was passed around among the actors,
instead they use the ordinary Cluster extension.
* Overall much cleaner design
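A minimal sketch of programmatic seed-node joining; the addresses and system name
are illustrative:

    import akka.actor.{ ActorSystem, Address }
    import akka.cluster.Cluster

    object JoinSeedNodesExample extends App {
      val system = ActorSystem("ClusterSystem")

      // Useful when the seed node addresses are only known at runtime,
      // e.g. resolved by the test framework after the nodes have started.
      val seedNodes = List(
        Address("akka.tcp", "ClusterSystem", "host1", 2552),
        Address("akka.tcp", "ClusterSystem", "host2", 2552))

      Cluster(system).joinSeedNodes(seedNodes)
    }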