Commit graph

173 commits

Author SHA1 Message Date
Jimin Hsieh
2c2b8ba001 Remove some of Unused import warning (#24650) 2018-03-16 12:08:29 +01:00
Konrad `ktoso` Malawski
563c7fbcf0 Issue 24594: Integration with sbt-headers and initial header population 2018-03-13 15:45:55 +01:00
Renato Cavalcanti
c83e4adfea Rolling update config checker, #24009
* adds config compatibility check
* doc'ed what happens when joining a cluster not supporting this feature
* added extra docs over sensitive paths
2018-02-20 15:47:09 +01:00
Patrik Nordwall
2733a26540 Remove Exiting/Down node from other DC, #24171
* When leaving/downing the last node in a DC it would not
  be removed in another DC, since that was only done by the
  leader in the owning DC (and that is gone).
* It should be ok to eagerly remove such nodes also by
  leaders in other DCs.
* Note that gossip is already sent out so for the last node
  that will be spread to other DC, unless there is a network
  partition. For that we can't do anything. It will be replaced
  if joining again.
2018-01-16 07:55:49 +01:00
Christopher Batey
009214ae07
Update copyright to 2018 (#24241) 2018-01-04 17:26:29 +00:00
Pritam Kadam
37f0da17b7 Allow member to leave a cluster via CoordinatedShutdown.run when MemberStatus is Joining/WeaklyUp/Up. (#24152) 2018-01-04 07:43:25 +00:00
Patrik Nordwall
e49acb7daa add Reason to CoordinatedShutdown, #24048 2017-12-04 14:16:06 +01:00
Patrik Nordwall
fa3da328be Run all CoordinatedShutdown phases also when downing, #24048 2017-12-04 11:05:22 +01:00
Patrik Nordwall
95e0ac43e9 small perf improvement of isGossipSpeedupNeeded for single-dc 2017-11-02 18:27:50 +01:00
Christopher Batey
5a37cdc862 Cross DC gossip fixes #23803
* Adjust cross DC gossip probability for small nr of nodes in a DC
When a Dc is being bootstrapped the initial node has no local peers and
can not gossip if it selects a local gossip round. Start at a
probability of 1.0 for a single node cluster and move down 0.25 per node
until a 5 node DC is reached then use the cross-data-center-gossip-probability
* Fix cross DC gossip selecting of oldest members
This used to select the members based on the sort order members in
Gossip (by address) rather than by upNumber
2017-11-02 09:17:24 +01:00
Patrik Nordwall
86712d5b40 fix confusing logging when receiving gossip from unknown 2017-10-31 14:05:51 +01:00
Patrik Nordwall
5fc6d5a04a Verify removal and add of new node incarnation in multi-dc, #23585
* MemberRemoved must be published before MemberUp, e.g. when restarted
  in other DC
* remove from failureDetector when receiving gossip with new member,
  not only new joining member

* increase timeout in MultiDcSingletonManagerSpec
2017-09-25 16:47:06 +02:00
Patrik Nordwall
4f8856f108 Merge pull request #23551 from akka/wip-23502-join-timeout-patriknw
Add timeout to abort joining of seed nodes, #23502
2017-09-11 16:41:35 +02:00
Patrik Nordwall
5cf698a2f6 Add timeout to abort joining of seed nodes, #23502 2017-09-11 15:56:25 +02:00
Patrik Nordwall
cb08535e7d use right youngest when moving to Up, #23582
* also confirm TakeOverFromMe when singleton already in oldest state
2017-09-04 16:02:23 +02:00
Patrik Nordwall
1e4e7cbba2 Merge pull request #23583 from akka/wip-multi-dc-merge-master-patriknw
merge wip-multi-dc-dev back to master
2017-09-01 17:08:28 +02:00
Patrik Nordwall
e3aada5016 Connect the dots for cross-dc reachability, #23377
* the crossDcFailureDetector was not connected to the reachability table
* additional test by listen for {Reachable/Unreachable}DataCenter events in split spec
* missing Java API for getUnreachableDataCenters in CurrentClusterState
2017-08-22 15:05:40 +02:00
Patrik Nordwall
6753c1e624 Don't use WeaklyUp immediately, #23554
* see description in issue
2017-08-22 12:02:04 +02:00
Johan Andrén
9c7e8d027a Renamed/moved the self data center setting #23312 (#23344) 2017-07-12 11:47:32 +01:00
Johan Andrén
a15e459922 Merging did not prune vector clocks for tombstoned nodes #23318 2017-07-10 13:01:06 +01:00
Johan Andrén
c0d439eac3 limit cross dc gossip #23282 2017-07-07 13:19:10 +01:00
Konrad `ktoso` Malawski
b568975acc =clu #23229 multi-dc heartbeating, only N nodes perform monitoring 2017-07-07 12:17:41 +01:00
Patrik Nordwall
867cc97bdd Refactoring of Gossip class, #23290
* move methods that depends on selfUniqueAddress and selfDc
  to a separate MembershipState class, which also holds the
  latest gossip
* this removes the need to pass in the parameters from everywhere and
  makes it easier to cache some results
* makes it clear that those parameters are always selfUniqueAddress
  and selfDc, instead of some arbitary node/dc
2017-07-05 08:47:32 +02:00
Patrik Nordwall
bb9549263e Rename team to data center, #23275 2017-07-04 17:11:21 +02:00
Johan Andrén
164387a89e [WIP] one leader per cluster team (#23239)
* Guarantee no sneaky type puts more teams in the role list

* Leader per team and initial tests

* MiMa filters

* Second iteration (not working though)

* Verbose gossip logging etc.

* Gossip to team-nodes even if there is inter-team unreachability

* More work ...

* Marking removed nodes with tombstones in Gossip

* More test coverage for Gossip.remove

* Bug failing other multi-node tests squashed

* Multi-node test for team-split

* Review fixes - only prune tombstones on leader ticks

* Clean code is happy code.

* All I want is for MiMa to be my friend

* These constants are internal

* Making the formatting gods happy

* I used the wrong reachability for ignoring gossip :/

* Still hadn't quite gotten how reachability was supposed to work

* Review feedback applied

* Cross-team downing should still work

* Actually prune tombstones in the prune tombstones method ...

* Another round against reachability. Reachability leading with 15 - 2 so far.
2017-07-04 10:09:40 +02:00
Nafer Sanabria
ef76af7add =cls add logging info on seed node joining (#22724)
* =cls add logging info on seed node joining

* adjust message
2017-05-19 14:20:29 +02:00
Patrik Nordwall
41c756f169 properly shutdown ArteryTransport using CoordinatedShutdown, #22671 (#22698)
* properly shutdown ArteryTransport using CoordinatedShutdown, #22671

* The shutdownHook changed hasBeenShutdown flag to true, and then when
  the transport.shutdown was invoked the shutdown sequence was ignored
  until it was too late, ActorSystem already terminated.
* Also improved the cluster shutdown tasks when the cluster node had not
  joined

* CoordinatedShutdownLeave explicit events
2017-04-11 21:48:51 +02:00
Devis Lucato
b89008bdaf Fix "attmpts" typo 2017-03-01 12:44:32 +01:00
Patrik Nordwall
452b3f1406 remove old deprecated cluster metrics, #21423
* corresponding was moved to akka-cluster-metrics, see
  http://doc.akka.io/docs/akka/2.4/project/migration-guide-2.3.x-2.4.x.html#New_Cluster_Metrics_Extension
2017-01-20 13:48:36 +01:00
Patrik Nordwall
84ade6fdc3 add CoordinatedShutdown, #21537
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
* send ExitingConfirmed to leader before shutdown of Exiting
  to not have to wait for failure detector to mark it as
  unreachable before removing
* the unreachable signal is still kept as a safe guard if
  message is lost or leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate ActorSystem when cluster shutdown (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3,
  then sys3 could not perform it's duties and move Leving sys1 to
  Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from convergence condidtion
2017-01-16 09:01:57 +01:00
Patrik Nordwall
180361868c Merge pull request #22054 from akka/wip-22053-log-join-retry-patriknw
log join retries, #22053
2017-01-09 14:18:40 +01:00
Philippus Baalman
6c7085252a extended copyright into 2017 2017-01-04 17:37:15 +01:00
Patrik Nordwall
645ae4cb31 log join retries, #22053 2016-12-21 16:15:56 +01:00
Patrik Nordwall
68383b5001 harden cluster leaving, #21847
As documented in the code:

// Leader is moving itself from Leaving to Exiting. Let others know (best effort)
// before shutdown. Otherwise they will not see the Exiting state change
// and there will not be convergence until they have detected this node as
// unreachable and the required downing has finished. They will still need to detect
// unreachable, but Exiting unreachable will be removed without downing, i.e.
// normally the leaving of a leader will be graceful without the need
// for downing. However, if those final gossip messages never arrive it is
// alright to require the downing, because that is probably caused by a
// network failure anyway.

That is fine, but this change improves the selection of the nodes to
send the final gossip messages to.

I could reproduce the failure in ClusterSingletonManagerLeaveSpec and with
additional logging I verified that in the failure case it picked the "first"
node 3 times (it's random) and that node had already been shutdown (left earlier
in the test) but was not removed yet.
2016-11-18 12:33:42 +01:00
Johan Andrén
8ae0c9a888 Use long uid in artery remoting and cluster #20644 2016-09-26 15:34:59 +02:00
Endre Sándor Varga
5e830323f6 Updating to ScalaTest 3.0.0 and ScalaCheck 1.13.2 2016-08-22 11:13:49 +02:00
Patrik Nordwall
0c4d4c37ba cluster singleton improvements, #20942
* track nodes by UniqueAddress in Cluster Singleton, #20942
* reply with HandOverDone from new incarnation, #20942
* confirm as terminated immediately when new incarnation joins, #20942 instead of waiting for failure detector to mark it as unreachable this will speed-up removal when restarting cluster node with same hostname:port
2016-08-19 11:56:55 +02:00
Patrik Nordwall
d731f20bf1 suppress deadletter for the cluster joining messages 2016-08-09 17:22:31 +02:00
Björn Antonsson
c66ce62d63 Update to a working version of Scalariform 2016-06-02 22:12:36 +02:00
Yegor Andreenko
c66e3a9f02 =clu #20613 logging selfRoles during node unreachable and quarantined (#20542) 2016-05-24 14:35:50 +02:00
Johan Andrén
5671927cf1 clu #20309 API for pluggable cluster downing 2016-04-18 15:06:05 +02:00
adebski
472d404bbe =clu #19859 Relaxed constraints on downing old incarnation of rejoining node.
* Automatic downing of old node incarnation when new tries to rejoin the cluster is performed even if old incarnation was left in Leaving or Exiting state.
* Added information to clustering docs about automatic downing of old incarnations when new tries to rejoin the cluster.
2016-02-26 20:35:19 +01:00
Johannes Rudolph
b6cbc7f13a =all remove unused imports 2016-02-23 20:29:22 +01:00
Johan Andrén
62e30b3c08 Update copyrights and links to the new company name #19851 2016-02-23 12:58:39 +01:00
Prayag Verma
b7783968a0 =pro #19068 All copyrights ranges and single years updated to a range ending in 2016 2016-01-25 10:20:30 +01:00
Roland Kuhn
f1abaa1c5e Merge pull request #18875 from ktoso/wip-akka.js-cherries-ktoso
Akka.js cherries to master
2015-11-07 18:01:24 +01:00
Patrik Nordwall
c7c187f6b7 =clu replace Set -- with diff and ++ with union
* better performance according to
  https://docs.google.com/presentation/d/1Qjryxoe-fYEM8ZPhM-98LKfbhnRcn5eAEMNlVVnixsA/pub
2015-11-06 14:48:17 +01:00
Andrea
cd3d68a77c =act switch to java std lib ThreadLocalRandom 2015-11-06 14:04:33 +01:00
Patrik Nordwall
9380983d3c =clu #18554 Make oldest assignment deterministic when joining
* the reported issue is fixed by the immediate leaderActions
  (moving to Up)  when joining the first node to itself
* the other changes are precautions just in case
2015-10-21 07:53:14 +02:00
Veiga Ortiz, Héctor
c08bc317e2 +clu #13584 Accept joining to be WeaklyUp during network split
* experimental feature, disabled by default
* Adding documentation to mention weakly up members.
  plus adding new diagram.
2015-09-04 12:44:47 +02:00