1272 commits

Author SHA1 Message Date
Devis Lucato
b89008bdaf Fix "attmpts" typo 2017-03-01 12:44:32 +01:00
Martynas Mickevičius
1754625202 #22353 fix mbean expected json format 2017-02-21 13:05:36 +02:00
Richard Imaoka
6936c09e4e Fix JSON formatting of the jmx-cluster/akka-cluster tool #21250 2017-02-20 14:55:43 +01:00
Johan Andrén
52a20f2ba9 Micro kernel module removed #22205 2017-01-26 15:40:54 +01:00
Patrik Nordwall
4703e30774 disable weakly-up for some tests 2017-01-25 07:20:24 +01:00
Patrik Nordwall
94e40460a4 Merge pull request #22206 from akka/wip-21423-remove-deprecations-patriknw
remove deprecations, #21423
2017-01-24 16:45:31 +01:00
Patrik Nordwall
db74c33130 remove deprecated constructor in serializers, #21423 2017-01-24 13:34:05 +01:00
Patrik Nordwall
1700cdaebc Promote WeaklyUp and enable by default, #22197 2017-01-24 12:31:32 +01:00
Patrik Nordwall
af142f82fd change router type in cluster.StressSpec
* it was an oversight when old cluster metrics was removed
2017-01-23 21:18:25 +01:00
Patrik Nordwall
452b3f1406 remove old deprecated cluster metrics, #21423
* corresponding was moved to akka-cluster-metrics, see
  http://doc.akka.io/docs/akka/2.4/project/migration-guide-2.3.x-2.4.x.html#New_Cluster_Metrics_Extension
2017-01-20 13:48:36 +01:00
Patrik Nordwall
6c8a69109a Merge pull request #22138 from VEINHORN/master
Remove unnecessary new keywords
2017-01-17 19:31:45 +01:00
Patrik Nordwall
84ade6fdc3 add CoordinatedShutdown, #21537
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
* send ExitingConfirmed to leader before shutdown of Exiting
  to not have to wait for failure detector to mark it as
  unreachable before removing
* the unreachable signal is still kept as a safeguard if
  message is lost or leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate ActorSystem when cluster shutdown (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3,
  then sys3 could not perform its duties and move Leaving sys1 to
  Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from convergence condition
2017-01-16 09:01:57 +01:00
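The phase ordering described in the CoordinatedShutdown commit above (phases with a depends-on list, run in DAG order) can be illustrated with a topological sort. This is a minimal Python sketch of the idea only, not Akka's implementation. The phase names and the `phase_order` helper are hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical phases: each maps to the list of phases it depends on,
# mirroring the "depends-on" idea from the commit message.
PHASES = {
    "before-unbind": [],
    "unbind": ["before-unbind"],
    "cluster-leave": ["unbind"],
    "cluster-exiting": ["cluster-leave"],
    "actor-system-terminate": ["cluster-exiting", "unbind"],
}


def phase_order(phases):
    """Return phase names so that every phase comes after its dependencies."""
    # TopologicalSorter takes {node: predecessors}; it raises CycleError
    # if the depends-on lists do not form a DAG.
    return list(TopologicalSorter(phases).static_order())


order = phase_order(PHASES)
```

Running the tasks of each phase in `order` guarantees that a phase never starts before the phases it depends on. A cyclic depends-on configuration is rejected up front instead of being discovered at shutdown time.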
VEINHORN
0eac4d413b removed unnecessary new keywords 2017-01-13 12:35:05 +03:00
Patrik Nordwall
180361868c Merge pull request #22054 from akka/wip-22053-log-join-retry-patriknw
log join retries, #22053
2017-01-09 14:18:40 +01:00
Philippus Baalman
6c7085252a extended copyright into 2017 2017-01-04 17:37:15 +01:00
Patrik Nordwall
645ae4cb31 log join retries, #22053 2016-12-21 16:15:56 +01:00
Patrik Nordwall
e494ec2183 catch NotSerializableException from deserialization, #20641
* to be able to introduce new messages and still support rolling upgrades,
  i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
  use for unknown serializer ids and class manifests
* note that it is not catching for system messages, since that could result
  in infinite resending
2016-12-16 20:14:37 +01:00
Patrik Nordwall
1a12e950ff Reachability.remove didn't always remove all, #22012
* the versions table in Reachability was not cleared
  if the records for removed node had been pruned, i.e.
  all reachable again
2016-12-16 12:25:37 +01:00
Patrik Nordwall
f6a1fba824 =clu don't use Down member as leader, #21906 (#21990)
* in the failed test it was noticed that a Down member removed
  itself in leaderActionsOnConvergence which resulted in
  later "Failed to serialize Gossip, Unknown address"
* never use member with status Down as leader
* a node will shut itself down anyway when it's Down,
  but leader actions could happen before that
2016-12-13 10:53:39 +01:00
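The rule from the commit above ("never use member with status Down as leader") can be sketched as a filter applied before leader selection. This is a simplified Python illustration and an assumption about the shape of the logic: the `Member` type and `pick_leader` function are hypothetical, and the real selection in Akka also takes reachability and other member states into account.

```python
from typing import NamedTuple, Optional


class Member(NamedTuple):
    address: str  # hypothetical node address, used only for ordering
    status: str   # e.g. "Up", "Leaving", "Exiting", "Down"


def pick_leader(members: list[Member]) -> Optional[Member]:
    """Pick the first member by address order, never one with status Down."""
    candidates = [m for m in members if m.status != "Down"]
    # default=None covers the case where every member is Down
    return min(candidates, key=lambda m: m.address, default=None)
```

With this filter a Down member that would otherwise sort first is skipped, so it cannot run leader actions in the window before it shuts itself down.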
Patrik Nordwall
dce668771e fix shutdown of pending StressSpec, #21960 (#21963) 2016-12-07 15:38:11 +01:00
Patrik Nordwall
2ef6457311 enable NodeChurnSpec, #21483
* Verify that it actually fails with classic remoting
  if vector clocks are not pruned
* Make it pass with Artery, but it is not verifying
  the message sizes yet. We should implement that
  with a custom RemoteInstrument, but that can be done
  in a separate PR.
* Still pending with Artery because it still fails on jenkins
* barrier after sys shutdown

(cherry picked from commit d5edcbea35ca5b43ca4cfb3018602dd555402f42)
2016-12-05 14:27:12 +01:00
Patrik Nordwall
446c0545ec member accessor in ReachabilityEvent, #21944 (#21947) 2016-12-05 12:07:18 +01:00
Patrik Nordwall
e04444567f Speedup pull request validation
* speedup ActorCreationPerfSpec
* reduce iterations in ConsistencySpec
* tag SupervisorHierarchySpec as LongRunningTest
* various small speedups and tagging in actor-tests
* speedup expectNoMsg in stream-tests
* tag FramingSpec, and reduce iterations
* speedup QueueSourceSpec
* tag some stream-tests
* reduce iterations in persistence.PerformanceSpec
* reduce iterations in some cluster perf tests
* tag RemoteWatcherSpec
* tag InterpreterStressSpec
* remove LongRunning from ClusterConsistentHashingRouterSpec
* sys property to disable multi-jvm tests in test
* actually disable multi-node tests in validatePullRequest
* doc sbt flags in CONTRIBUTING
2016-11-30 14:31:06 +01:00
Johan Andrén
2679be5ae4 Disable serialization warnings in akka test suites #21882 2016-11-23 12:02:36 +01:00
Patrik Nordwall
e101fe1232 Merge pull request #21869 from akka/wip-21810-pending-patriknw
mark StressSpec pending for Artery until we fix it, #21810
2016-11-18 15:44:49 +01:00
Patrik Nordwall
cc170df4d2 mark StressSpec pending for Artery until we fix it, #21810 2016-11-18 13:06:33 +01:00
Patrik Nordwall
68383b5001 harden cluster leaving, #21847
As documented in the code:

// Leader is moving itself from Leaving to Exiting. Let others know (best effort)
// before shutdown. Otherwise they will not see the Exiting state change
// and there will not be convergence until they have detected this node as
// unreachable and the required downing has finished. They will still need to detect
// unreachable, but Exiting unreachable will be removed without downing, i.e.
// normally the leaving of a leader will be graceful without the need
// for downing. However, if those final gossip messages never arrive it is
// alright to require the downing, because that is probably caused by a
// network failure anyway.

That is fine, but this change improves the selection of the nodes to
send the final gossip messages to.

I could reproduce the failure in ClusterSingletonManagerLeaveSpec and with
additional logging I verified that in the failure case it picked the "first"
node 3 times (it's random) and that node had already been shut down (left earlier
in the test) but was not removed yet.
2016-11-18 12:33:42 +01:00
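One way to avoid the failure mode described in the commit above (the same node picked three times for the final gossip) is to sample distinct targets. The commit message does not say how the selection was changed, so the following Python sketch is an assumption, and `pick_gossip_targets` is a hypothetical helper rather than Akka code.

```python
import random


def pick_gossip_targets(candidates, max_count, rng=random):
    """Choose up to max_count distinct nodes for the final gossip messages."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    if len(unique) <= max_count:
        return unique
    # sampling without replacement: no node is picked twice
    return rng.sample(unique, max_count)
```

Because no target repeats, a single node that has already shut down can absorb at most one of the final messages instead of all of them.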
Patrik Nordwall
136e64b253 use longUid in ClusterRemoteWatcher, #21594
* found by test failure in SurviveNetworkInstabilitySpec
2016-09-30 10:51:51 +02:00
Johan Andrén
0f376e751e Quarantine gracefully downed node after some time (#21534)
* New setting for quarantining after graceful leave
2016-09-28 14:04:58 +02:00
Patrik Nordwall
86d912a299 Merge pull request #21555 from akka/wip-21522-StressSpec-patriknw
increase acceptable-heartbeat-pause in StressSpec, #21522
2016-09-26 19:21:07 +02:00
Johan Andrén
8ae0c9a888 Use long uid in artery remoting and cluster #20644 2016-09-26 15:34:59 +02:00
Patrik Nordwall
d91ddb7891 increase acceptable-heartbeat-pause in StressSpec, #21522 2016-09-23 15:50:32 +02:00
Patrik Nordwall
63917c1947 Merge pull request #21513 from akka/wip-21512-quick-restart-patriknw
fix problem with quick restart, #21512
2016-09-22 18:33:22 +02:00
Patrik Nordwall
9f175f56de fix problem with quick restart, #21512
* image-liveness-timeout must be less than the handshake-timeout,
  otherwise the publication for the handshake will give up too early
  when previous image is still considered alive
2016-09-21 20:27:04 +02:00
Patrik Nordwall
f1590a59b4 revert quarantine removed (leaving) cluster member, #21509 2016-09-21 17:27:34 +02:00
Patrik Nordwall
1926560e41 stop outbound streams when quarantined, #21407
* they can't be stopped immediately because we want to send
  some final message and we reply to inbound messages with `Quarantined`
* and improve logging
2016-09-21 14:38:13 +02:00
Endre Sándor Varga
8ecd7419ac #21419: Reenable ClusterDeathWatchSpec 2016-09-19 12:48:07 +02:00
Johan Andrén
392ca5ecce Enable flight recorder in tests #21205
* Setting to configure where the flight recorder puts its file
* Run ArteryMultiNodeSpecs with flight recorder enabled
* More cleanup in exit hook, wait for task runner to stop
* Enable flight recorder for the cluster multi node tests
* Enable flight recorder for multi node remoting tests
* Toggle always-dump flight recorder output when akka.remote.artery.always-dump-flight-recorder is set
2016-09-16 15:12:40 +02:00
Patrik Nordwall
835125de3d make cluster.StressSpec pass with Artery, #21458
* need to use a shared media driver to get the cpu usage
  at a reasonable level
* also changed to SleepingIdleStrategy(1 ms) when cpu-level=1
  not needed for the test to pass, but can be good to make level 1
  more extreme
2016-09-16 12:58:41 +02:00
Patrik Nordwall
03eb20e5d2 Merge pull request #21461 from johanandren/wip-more-tests-working-with-artery-johanandren
More tests working on artery
2016-09-14 16:06:01 +02:00
Johan Andrén
848d56cc2f More tests working on artery
* non-multi-jvm tests from akka-cluster
* akka-cluster-metrics
* akka-cluster-tools
* akka-cluster-sharding
2016-09-14 11:40:42 +02:00
Patrik Nordwall
bf151e9793 don't quarantine back, #21450
* Don't quarantine the other system when receiving the Quarantined message,
  since that will result in cluster member removal and can result in
  forming two separate clusters (cluster split).
* Instead, the downing strategy should act on ThisActorSystemQuarantinedEvent, e.g.
  use it as a STONITH signal.
2016-09-13 08:01:58 +02:00
Johan Andrén
3502f0d72f One more missed canonical.port in cluster tests (#21428) 2016-09-09 18:12:35 +02:00
Johan Andrén
b0e03058b9 Port and hostname config path was changed, cluster tests didn't get the change (#21427) 2016-09-09 17:55:02 +02:00
Johan Andrén
fa1d6d6f19 Disable ClusterDeathWatchSpec for now (#21421) 2016-09-09 17:54:13 +02:00
Patrik Nordwall
e8ce261faf Merge branch 'master' into wip-sync-2.4.10-patriknw 2016-09-09 14:12:16 +02:00
Patrik Nordwall
3b7a7dfa59 add reason param to quarantine method 2016-09-08 18:00:37 +02:00
Johan Andrén
90193907fe Make cluster tests run with artery #21204 2016-09-07 16:41:03 +02:00
Patrik Nordwall
0a75f992e4 Update links to Lightbend RPv2, more warnings about auto-down 2016-09-02 10:26:47 +02:00
Patrik Nordwall
8ab02738b7 Merge branch 'master' into wip-sync-artery-dev-2.4.9-patriknw 2016-08-23 20:14:15 +02:00