Commit graph

1356 commits

Author SHA1 Message Date
Patrik Nordwall
0ea8c0d872
Merge pull request #24592 from akka/wip-24576-LargeMessageClusterSpec-patriknw
slowdown LargeMessageClusterSpec for tcp transport, #24576
2018-03-05 16:20:46 +01:00
Patrik Nordwall
1c8a2945ab slowdown LargeMessageClusterSpec for tcp transport, #24576 2018-03-05 15:19:12 +01:00
Johan Andrén
b7cc50cdd6
2.5.10 wire protocol regression (#24625) 2018-02-28 09:46:37 +01:00
Patrik Nordwall
5e80bd97f2 Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still linger in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.
2018-02-21 11:59:18 +01:00
Patrik Nordwall
0d222906f4 Prepare Artery for alternative TCP transport, #24390
* Refactoring to separate the Aeron specific things, ArteryAeronUdpTransport
* move Aeron specific classes to akka.remote.artery.aeron package
* move Version to ArterySettings, and describe strategy for envelope header changes
2018-02-20 16:02:57 +01:00
Renato Cavalcanti
c83e4adfea Rolling update config checker, #24009
* adds config compatibility check
* doc'ed what happens when joining a cluster not supporting this feature
* added extra docs over sensitive paths
2018-02-20 15:47:09 +01:00
Patrik Nordwall
570060815b
Merge pull request #24443 from akka/issue-24144
MultiDcSplitBrain: only subscribe to unreachable after split
2018-02-02 15:38:53 +01:00
Patrik Nordwall
23fa8b0810 change spelling of behaviour to behavior, #24457 2018-02-01 15:10:46 +01:00
Christopher Batey
5658d6e77a MultiDcSplitBrain: only subscribe to unreachable after split
Test would fail picking up the reachable from the previous unsplit
as it is a new probe.

Also change barrierCounter to split/unsplit so it is easier to see
where the failure is when a barrier fails
2018-01-30 09:01:15 +00:00
Patrik Nordwall
5cab621e82
Merge pull request #24078 from akka/wip-24055-heartbeat-patriknw
attempt to reproduce heartbeat issue, #24055
2018-01-16 19:10:10 +01:00
Patrik Nordwall
2733a26540 Remove Exiting/Down node from other DC, #24171
* When leaving/downing the last node in a DC it would not
  be removed in another DC, since that was only done by the
  leader in the owning DC (and that is gone).
* It should be ok to eagerly remove such nodes also by
  leaders in other DCs.
* Note that gossip is already sent out so for the last node
  that will be spread to other DC, unless there is a network
  partition. For that we can't do anything. It will be replaced
  if joining again.
2018-01-16 07:55:49 +01:00
Patrik Nordwall
971049c3bb Test that large messages don't disturb cluster heartbeats, #24055 2018-01-15 10:31:15 +01:00
Christopher Batey
0380cc517a Cluster singleton manager: don't send member events to FSM during shutdown (#24236)
There exists a race where a cluster node that is being downed sees
itself as the oldest node (as it has had the other nodes removed) and it
takes over the singleton manager, sending the real oldest node into
the End state, meaning that cluster singletons never work again.

This fix simply prevents Member events being given to the Cluster
Manager FSM during a shutdown, instead relying on SelfExiting.

This also hardens the test by not downing the node that the current
sharding coordinator is running on as well as fixing a bug in the
probes.
2018-01-05 09:47:43 +01:00
Christopher Batey
009214ae07
Update copyright to 2018 (#24241) 2018-01-04 17:26:29 +00:00
Pritam Kadam
37f0da17b7 Allow member to leave a cluster via CoordinatedShutdown.run when MemberStatus is Joining/WeaklyUp/Up. (#24152) 2018-01-04 07:43:25 +00:00
Christopher Batey
3bd05ce67e MultiDcSplitBrainSpec: Turn on gossip logging; Increase gossip frequency (#24024)
The last time this failed there was no gossip to or from a node that
didn't see fifth coming back.

Also note that this test doesn't quite test what it says as the split
brain is repaired before starting the second actor system but without
extensions to the multi jvm test kit this can't be improved.

Refs #23306
2017-12-14 22:26:27 +01:00
Johan Andrén
be3766d0ae
Post 2.5.8 fixes (#24128)
* Update MiMa latest release
* Silence some noise from sbt breaking the release script
* MiMa excludes we had missed for a couple of releases
2017-12-08 16:53:47 +01:00
Patrik Nordwall
52f30a8043 ClusterSpec, race between MemberRemoved and MemberExited, #23449 (#24105) 2017-12-05 23:12:19 +09:00
Patrik Nordwall
e49acb7daa add Reason to CoordinatedShutdown, #24048 2017-12-04 14:16:06 +01:00
Patrik Nordwall
fa3da328be Run all CoordinatedShutdown phases also when downing, #24048 2017-12-04 11:05:22 +01:00
Patrik Nordwall
1cdd205c02
Merge pull request #23882 from chbatey/issue-23775-multidc-split
Increase time for MultiDcSplitBrain and increase cross DC gossip prob
2017-11-13 15:26:10 +01:00
Christopher Batey
4d3a7e93a6 Increase timeout and remove sleep
The test has been failing infrequently as when we get to the final
barrier (restarted-fifth-removed) the whole test withIn of 40s
has been reached so the last barrier times out right away.

Trying to remove the Thread.sleep and rely on a larger timeout for the
whole test as well as the default barrier timeout of 30s.
2017-11-13 12:20:56 +00:00
Patrik Nordwall
436668687a Move coordinated-shutdown config from test/resources, #23879
* looks like the ActorSystem is shutdown when leaving
* Included in MultiNodeSpec, i.e. all multi-node tests:
  akka.coordinated-shutdown.terminate-actor-system = off
  akka.coordinated-shutdown.run-by-jvm-shutdown-hook = off
2017-11-07 15:38:35 +01:00
Patrik Nordwall
95e0ac43e9 small perf improvement of isGossipSpeedupNeeded for single-dc 2017-11-02 18:27:50 +01:00
Christopher Batey
5a37cdc862 Cross DC gossip fixes #23803
* Adjust cross DC gossip probability for small nr of nodes in a DC
When a DC is being bootstrapped the initial node has no local peers and
cannot gossip if it selects a local gossip round. Start at a
probability of 1.0 for a single node cluster and move down 0.25 per node
until a 5 node DC is reached, then use the cross-data-center-gossip-probability
* Fix cross DC gossip selecting of oldest members
This used to select the members based on the sort order of members in
Gossip (by address) rather than by upNumber
2017-11-02 09:17:24 +01:00
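The probability rule described in commit 5a37cdc862 can be sketched roughly as follows. This is an illustrative Python sketch, not the Akka implementation: the function names are hypothetical, the default of 0.2 for cross-data-center-gossip-probability is an assumption, and treating the configured value as a lower bound for small DCs is also an assumption that the commit message does not state.

```python
import random


def cross_dc_gossip_probability(local_members: int, configured: float = 0.2) -> float:
    """Probability of choosing a cross-DC gossip round (hypothetical helper).

    local_members: number of members in the local DC, including this node.
    configured: assumed value of cross-data-center-gossip-probability.
    """
    if local_members < 1:
        raise ValueError("a DC contains at least the local node")
    if local_members >= 5:
        # From 5 nodes upwards the configured probability applies.
        return configured
    # 1 node -> 1.0, 2 -> 0.75, 3 -> 0.5, 4 -> 0.25.
    # Assumption: never drop below the configured probability.
    return max((5 - local_members) * 0.25, configured)


def select_cross_dc_round(local_members: int, configured: float = 0.2,
                          rng: random.Random = None) -> bool:
    """Return True when this gossip round should target another DC."""
    rng = rng or random.Random()
    # rng.random() is in [0.0, 1.0), so a probability of 1.0 always selects
    # a cross-DC round; a lone bootstrapping node has no local peers anyway.
    return rng.random() < cross_dc_gossip_probability(local_members, configured)
```

With these assumptions a single-node DC always gossips across DCs, and the bias toward cross-DC rounds decreases in steps of 0.25 until the DC has five members.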
Christopher Batey
511180ef39 Stop actor system from shutting down on Cluster.leave (#23872)
This then sets up a race with the rest of the test running, as once the
ActorSystem shuts down the test coordinator won't be available for barriers etc.
2017-10-31 19:02:28 +01:00
Patrik Nordwall
86712d5b40 fix confusing logging when receiving gossip from unknown 2017-10-31 14:05:51 +01:00
Martynas Mickevičius
82ca8a2cc7 Port build to SBT 1.x (#23850)
* Port build to SBT 1.x

* Fix multinode tests, always enable genjavadoc bootstrap
2017-10-30 10:13:13 +09:00
Arnout Engelen
9cb5849188 Accept 'Join' messages from nodes without dc (#23822)
* Accept 'Join' messages from nodes without dc

To allow a join from a 2.4 node to a 2.5.6 cluster.

* Use "ClusterSettings.DefaultDataCenter" constant
2017-10-23 04:49:51 -05:00
Arnout Engelen
b1df13d4d4 Update scalariform (#23778) (#23783) 2017-10-06 10:30:28 +02:00
Patrik Nordwall
5fc6d5a04a Verify removal and add of new node incarnation in multi-dc, #23585
* MemberRemoved must be published before MemberUp, e.g. when restarted
  in other DC
* remove from failureDetector when receiving gossip with new member,
  not only new joining member

* increase timeout in MultiDcSingletonManagerSpec
2017-09-25 16:47:06 +02:00
Patrik Nordwall
12196d674e enforce same DC for isOlderThan, #23307 (#23625) 2017-09-25 11:50:28 +02:00
Johan Andrén
c31f6b862f cluster apis for typed, #21226
* Cluster management (join, leave, etc)
* Cluster membership subscriptions (MemberUp, MemberRemoved, etc)
* New SelfUp and SelfRemoved events
* change signature of awaitAssert to return the value (not binary compatible)
* Cluster singleton api
2017-09-21 17:58:29 +02:00
Patrik Nordwall
4f8856f108 Merge pull request #23551 from akka/wip-23502-join-timeout-patriknw
Add timeout to abort joining of seed nodes, #23502
2017-09-11 16:41:35 +02:00
Patrik Nordwall
5cf698a2f6 Add timeout to abort joining of seed nodes, #23502 2017-09-11 15:56:25 +02:00
Patrik Nordwall
cb08535e7d use right youngest when moving to Up, #23582
* also confirm TakeOverFromMe when singleton already in oldest state
2017-09-04 16:02:23 +02:00
Patrik Nordwall
1e4e7cbba2 Merge pull request #23583 from akka/wip-multi-dc-merge-master-patriknw
merge wip-multi-dc-dev back to master
2017-09-01 17:08:28 +02:00
Patrik Nordwall
0ed5bc1835 add mima filters 2017-08-31 11:29:49 +02:00
Patrik Nordwall
6ed3295acd Merge branch 'master' into wip-multi-dc-merge-master-patriknw 2017-08-31 10:51:12 +02:00
Patrik Nordwall
6bfb7c9262 increase timeout in MultiDcSplitBrainSpec
* due to handshake timeout

reduce handshake timeout

fourth might generate UnreachableDataCenter in unsplit

MultiDcClusterSharding
2017-08-31 10:26:23 +02:00
Patrik Nordwall
dc75c4f818 Merge pull request #23531 from akka/wip-23369-NodeChurnSpec-patriknw
fix NodeChurnSpec tombstones, #23369
2017-08-28 09:17:32 +02:00
Patrik Nordwall
e3aada5016 Connect the dots for cross-dc reachability, #23377
* the crossDcFailureDetector was not connected to the reachability table
* additional test by listening for {Reachable/Unreachable}DataCenter events in split spec
* missing Java API for getUnreachableDataCenters in CurrentClusterState
2017-08-22 15:05:40 +02:00
Patrik Nordwall
659b28e4eb Missing become after CurrentClusterState in CrossDcHeartbeatSender, #23371
* and a few other small things
* one can see in the failed test log that there is no ACTIVE log line on the failing node
2017-08-22 14:10:45 +02:00
Johan Andrén
cff43a16f7 Data center reachability in cluster state (#23359)
* Manual case-declassing of CurrentClusterState #23347

* Unreachable data centers set in CurrentClusterState #23347
2017-08-22 13:04:39 +02:00
Patrik Nordwall
6753c1e624 Don't use WeaklyUp immediately, #23554
* see description in issue
2017-08-22 12:02:04 +02:00
Patrik Nordwall
699c78f959 fix NodeChurnSpec tombstones, #23369
* the gossip was growing because we introduced tombstones
* in this test it should be safe to have a short removal period
  of the tombstones
2017-08-15 16:05:36 +02:00
Sébastien Lorion
a95a94acff Replace ClusterRouterGroup/Pool "use-role" with "use-role-set" #23496 2017-08-09 16:06:18 +02:00
Jimin Hsieh
f623d10522 Rename addr to address in non-public API #21874 2017-08-08 13:18:56 +02:00
Martynas Mickevičius
bc0f2ee26d Load MiMa filters from file (#23083) 2017-07-27 12:33:14 +02:00
Johan Andrén
b86b10c477 Eliminate race in MultiDcHeartbeatTakingOverSpec #23371 (#23373) 2017-07-19 11:48:27 +09:00