* move methods that depend on selfUniqueAddress and selfDc
to a separate MembershipState class, which also holds the
latest gossip
* this removes the need to pass in the parameters from everywhere and
makes it easier to cache some results
* makes it clear that those parameters are always selfUniqueAddress
and selfDc, instead of some arbitrary node/dc
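A minimal, self-contained sketch of the MembershipState idea above, using stand-in types (the real Gossip/Member/UniqueAddress are internal to akka-cluster and much richer than this):

    // Stand-in types for illustration only; not the real internal API.
    final case class UniqueAddress(hostPort: String, uid: Long)
    final case class Member(node: UniqueAddress, dataCenter: String)
    final case class Gossip(members: Set[Member])

    // The refactoring idea: bundle the latest gossip with the "self" parameters
    // so that methods depending on them live in one place and results can be cached.
    final case class MembershipState(
        latestGossip: Gossip,
        selfUniqueAddress: UniqueAddress,
        selfDc: String) {

      def selfMember: Option[Member] =
        latestGossip.members.find(_.node == selfUniqueAddress)

      // Cached, because selfDc and latestGossip are fixed for this instance.
      lazy val dcMembers: Set[Member] =
        latestGossip.members.filter(_.dataCenter == selfDc)
    }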
* Guarantee no sneaky typo puts more teams in the role list
* Leader per team and initial tests
* MiMa filters
* Second iteration (not working though)
* Verbose gossip logging etc.
* Gossip to team-nodes even if there is inter-team unreachability
* More work ...
* Marking removed nodes with tombstones in Gossip
* More test coverage for Gossip.remove
* Squashed a bug that was failing other multi-node tests
* Multi-node test for team-split
* Review fixes - only prune tombstones on leader ticks
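An illustrative, self-contained sketch of the tombstone mechanism described above (simplified names, not the real internal Gossip API):

    // Removed nodes are remembered with a timestamp so that late gossip about them
    // can be ignored; old tombstones are pruned (here: on leader ticks).
    final case class GossipSketch(
        members: Set[String],
        tombstones: Map[String, Long]) {

      def remove(node: String, timestamp: Long): GossipSketch =
        copy(members = members - node, tombstones = tombstones.updated(node, timestamp))

      def hasTombstone(node: String): Boolean = tombstones.contains(node)

      // Called periodically from the leader to keep the tombstone map bounded.
      def pruneTombstones(removeEarlierThan: Long): GossipSketch =
        copy(tombstones = tombstones.filter { case (_, ts) => ts >= removeEarlierThan })
    }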
* Clean code is happy code.
* All I want is for MiMa to be my friend
* These constants are internal
* Making the formatting gods happy
* I used the wrong reachability for ignoring gossip :/
* Still hadn't quite gotten how reachability was supposed to work
* Review feedback applied
* Cross-team downing should still work
* Actually prune tombstones in the prune tombstones method ...
* Another round against reachability. Reachability leading with 15 - 2 so far.
* properly shutdown ArteryTransport using CoordinatedShutdown, #22671
* The shutdownHook changed the hasBeenShutdown flag to true, so when
transport.shutdown was later invoked the shutdown sequence was ignored
until it was too late and the ActorSystem had already terminated.
* Also improved the cluster shutdown tasks when the cluster node had not
joined
* CoordinatedShutdownLeave explicit events
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
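A hedged sketch of how the pieces above fit together; phase and task names here are made up, while the setting and API names follow the CoordinatedShutdown introduced around Akka 2.5 (treat the details as illustrative, not authoritative):

    import akka.Done
    import akka.actor.{ ActorSystem, CoordinatedShutdown }
    import com.typesafe.config.ConfigFactory
    import scala.concurrent.Future

    object CoordinatedShutdownExample extends App {
      // A custom phase wired into the DAG via depends-on, plus the exit-jvm and
      // JVM shutdown hook settings mentioned above.
      val config = ConfigFactory.parseString("""
        akka.coordinated-shutdown.exit-jvm = off
        akka.coordinated-shutdown.run-by-jvm-shutdown-hook = on
        akka.coordinated-shutdown.phases {
          my-phase {
            timeout = 10 s
            recover = on
            depends-on = [before-service-unbind]
          }
        }
        """)

      val system = ActorSystem("example", config.withFallback(ConfigFactory.load()))

      // Tasks return Future[Done]; a phase completes when all of its tasks have
      // completed (or the phase timeout/recover kicks in).
      CoordinatedShutdown(system).addTask("my-phase", "my-task") { () =>
        Future.successful(Done)
      }

      // Running the shutdown explicitly executes all phases in dependency order,
      // including leaving the cluster and terminating the ActorSystem.
      CoordinatedShutdown(system).run()
    }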
* send ExitingConfirmed to leader before shutdown of Exiting
to not have to wait for failure detector to mark it as
unreachable before removing
* the unreachable signal is still kept as a safeguard in case the
message is lost or the leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate ActorSystem when cluster shutdown (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3:
then sys3 could not perform its duties and move Leaving sys1 to
Exiting, because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from convergence condition
* the reported issue is fixed by the immediate leaderActions
(moving to Up) when joining the first node to itself
* the other changes are precautions just in case
Two issues:
1) ShardRegion actor must stop itself when the node is shutting down,
i.e. when receiving MemberRemoved(selfAddress)
2) ShardCoordinator must not persist anything when the node is shutting
down. MemberRemoved of other shard regions will trigger Terminated,
which must not be persisted, because then the next coordinator will
replay those events and end up in the wrong state. This problem
announced itself when using leaving, as illustrated in the new test.
To solve the second issue I have added a new ClusterShuttingDown event
that is published before the MemberRemoved events. Note that Terminated
is triggered by MemberRemoved.
(cherry picked from commit 1b272c72597beece9d93f0054f4b58e3d25f9ae2)
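A self-contained sketch of issue (1) above: an actor that stops itself when its own member is removed (illustrative only; the real ShardRegion logic is more involved, and issue (2) is handled via the new ClusterShuttingDown event):

    import akka.actor.Actor
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent.{ CurrentClusterState, MemberRemoved }

    class SelfStoppingListener extends Actor {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case _: CurrentClusterState => // initial snapshot, not needed here
        // Stop when the removed member is this node itself
        case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
          context.stop(self)
        case _: MemberRemoved => // removals of other members are not handled here
      }
    }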
* The leader is selected by picking the first reachable member, but in
#13875 we had to let the self member be unreachable in the Reachability
table and that was not considered in the logic of the leader selection.
* That means an unwanted change in behavior: especially when there
is only one node left, the leader could be evaluated to None instead
of Some(selfUniqueAddress).
* Note that #13875 has not been released yet.
* Skip observations from downed node (quarantined is marked down immediately)
in convergence check
* Skip observations from downed node when picking "reachable" targets for gossip.
* This also means that we must accept gossip with own node marked as unreachable,
but that should not be spread to the external membership events.
* Getter for CurrentClusterState in Cluster extension, updated via
ClusterReadView
* Remove lazy init of readView. Otherwise the cluster.state will be
empty on first access, which is probably surprising
* Subscribe to several cluster event types at once, to ensure *one*
CurrentClusterEvent followed by change events
* Deprecate publishCurrentClusterState, which was a bad idea; use sendCurrentClusterState
instead
* Possibility to subscribe with InitialStateAsEvents to receive events corresponding
to CurrentClusterState
* CurrentClusterState not a ClusterDomainEvent, ticket #3614
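A hedged example of the two subscription styles described above (API shape as in the akka-cluster documentation of that era; treat as a sketch):

    import akka.actor.{ Actor, ActorLogging }
    import akka.cluster.Cluster
    import akka.cluster.ClusterEvent._

    class ClusterListener extends Actor with ActorLogging {
      private val cluster = Cluster(context.system)

      override def preStart(): Unit = {
        // Default: one CurrentClusterState snapshot followed by change events.
        cluster.subscribe(self, classOf[MemberEvent], classOf[UnreachableMember])
        // Alternative: no snapshot, but synthetic events corresponding to the
        // current state, e.g. MemberUp for members that are already Up:
        // cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
        //   classOf[MemberEvent], classOf[UnreachableMember])
      }
      override def postStop(): Unit = cluster.unsubscribe(self)

      def receive = {
        case state: CurrentClusterState => log.info("Current members: {}", state.members)
        case MemberUp(member)           => log.info("Member up: {}", member)
        case UnreachableMember(member)  => log.info("Unreachable: {}", member)
        case _: MemberEvent             => // other member events
      }
    }

    // The snapshot is also available directly, without subscribing:
    //   Cluster(system).state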
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member moved it to the unreachable Set,
and then it was removed from there by the leader. That will not
work when flipping back to reachable, so a Down member must
be detected as unreachable before being removed, similar
to Exiting. A member shuts itself down if it sees itself as
Down.
* Flip back to reachable when failure detector monitors it as
available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
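A simplified, self-contained model of the Reachability table idea above (the real internal class also versions records, tracks seen-by rows, and more):

    object ReachabilitySketch {
      sealed trait Status
      case object Reachable extends Status
      case object Unreachable extends Status
      case object Terminated extends Status // e.g. quarantined by remoting

      // One record per (observer, subject) pair; unreachable members stay in the
      // member set and can flip back to Reachable when heartbeats succeed again.
      final case class Record(observer: String, subject: String, status: Status)

      final case class Reachability(records: Vector[Record]) {
        def unreachableRecords: Vector[Record] =
          records.filter(_.status != Reachable)

        // A subject is reachable if no observer currently marks it otherwise.
        def isReachable(subject: String): Boolean =
          !unreachableRecords.exists(_.subject == subject)

        // Flipping back simply drops the observer's record for that subject.
        def markReachable(observer: String, subject: String): Reachability =
          Reachability(records.filterNot(r => r.observer == observer && r.subject == subject))
      }
    }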
* Removed leader commands for Shutdown and Exit
* Member shuts itself down when it sees itself as Exiting
* Singleton cluster with status Exiting will shut itself down,
in case the Exiting gossip never arrives
* Exiting member is not part of the convergence check
* Exiting member is removed by leader (on convergence) when the
exiting member is in the unreachable set, i.e. successfully shut down
* Reverted the change made for #3266, i.e. Exiting is
detected as unreachable again.
* Adjust ClusterSingletonManager to new Exiting behaviour
* Fix bug in HeartbeatSender, which caused it to continue to
send heartbeats to removed nodes, instead of rebalancing
* Refactoring of leaderActions method
* Leaving section in docs
* RemoteWatcher that monitors node failures, with heartbeats
and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine;
the actual quarantining will be implemented
in ticket 2594
* Disallow join requests when already part of a cluster
* Remove wiping of state when joining, since join can only be
performed from an empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
* AddressUidExtension
* Uid in cluster Member identifier
To be able to distinguish nodes with same host:port
after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
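A small illustrative sketch of why the uid matters (names are stand-ins; the real classes are internal to akka-cluster):

    object UidExample extends App {
      // A restarted node reuses host:port but gets a fresh uid, so its old
      // incarnation can be told apart in membership and in the vector clock.
      final case class UniqueAddress(hostPort: String, uid: Int) {
        def vclockName: String = s"$hostPort-$uid" // uid included in the vclock identifier
      }

      val before = UniqueAddress("host:2552", uid = 1)
      val after  = UniqueAddress("host:2552", uid = 2) // same host:port after restart

      assert(before != after)                       // distinguishable members
      assert(before.vclockName != after.vclockName) // separate vclock entries
      // Gossip stamped with the old uid can therefore be recognized and ignored.
    }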
* Disabled config duration syntax and cleanup of io settings
* Update documentation
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
counts are reached, see the
role.<role-name>.min-nr-of-members config
* Update documentation and make use of the roles in the examples
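A hedged configuration example for the features above (key names roughly as in the akka-cluster reference configuration and docs; role names and deployment paths are made up, so treat this as a sketch):

    import com.typesafe.config.ConfigFactory

    object RolesConfigExample {
      val config = ConfigFactory.parseString("""
        akka.cluster {
          roles = ["backend"]
          # Leader does not move joining members to Up until at least
          # 2 "backend" and 1 "frontend" nodes have joined.
          role {
            backend.min-nr-of-members = 2
            frontend.min-nr-of-members = 1
          }
        }
        # Cluster-aware router restricted to nodes with a given role.
        akka.actor.deployment {
          /service/workerRouter {
            router = consistent-hashing-group
            routees.paths = ["/user/worker"]
            cluster {
              enabled = on
              allow-local-routees = on
              use-role = backend
            }
          }
        }
        """)
    }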
* The problem in ClusterSingletonManagerChaosSpec was that node 4 didn't publish
LeaderChanged, because there was never convergence on node 4 of the new Up
state for the three new nodes before they were shut down. When convergence
is reached on node 4, prevConvergedGossip and newGossip have the same leader
(i.e. no change).
* LeaderChanged is now published when the new leader is first seen, i.e. same
as member events. This makes sense now when leader can't be in Joining state.
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
ClusterDomainEventPublisher, which will shutdown the member
on failure in children.
* Publish the final removed state on postStop in
ClusterDomainEventPublisher. This also simplifies the removing
process.
* Otherwise some changes might never be published, since there does not have
to be convergence on all nodes in between all transitions.
* Detected by a failure in ClusterSingletonManagerSpec.
* Added a test to simulate the failure scenario.
* The failure in JoinTwoClustersSpec was due to missing publishing
of cluster events when clearing current state when joining
* This fix is in the right direction, but joining clusters like this
will need some design thought, creating ticket 2873 for that
* akka.cluster.StressSpec
* Configurable number of nodes and duration for each step
* Report metrics and phi periodically to see progress
* Configurable payload size
* Test of various join and remove scenarios
* Test of watch
* Exercise supervision
* Report cluster stats
* Test with many actors in tree structure
Apart from the test this commit also solves some issues:
* Avoid adding back members when downed in ClusterHeartbeatSender
* Avoid duplicate close of ClusterReadView
* Add back the publish of AddressTerminated on MemberDowned/Removed;
it was lost in the merge of "publish on convergence", see #2779