Commit graph

1365 commits

Author SHA1 Message Date
Christopher Batey
23373565db Fix typed cluster singleton cross dc proxies (#24936)
* Fix typed cluster singleton cross dc proxies
* Adds first multi-jvm test for typed cluster
2018-04-27 12:44:44 +01:00
Christopher Batey
a3e52078df Enable header plugin for the MultiJVM configuration (#24974)
It seems that when the changes for 2018 were made, an extra space was introduced
in all the headers, hence so many changes.
2018-04-25 00:03:55 +09:00
Christopher Batey
4d20b2a660 Reduce size of jenkins logs
Each build now produces over 40 MB of logs.

A lot of DEBUG logging was left on for test failures that have since been
fixed. Added an issue # for the ones that are still valid, or left it on
where the test verifies debug logging.
2018-04-24 08:49:41 +01:00
Kirill Yankov
3ebb9fa9c1 Fix serialization in TypedActor (#24851)
* fixed serialization in TypedActor
* generalized duplicates via Serialization.manifestFor
2018-04-12 18:58:13 +02:00
Patrik Nordwall
43dc381d59 Clear system messages sequence number for restarted node, #24847
* Notice that the incarnation has changed in SystemMessageDelivery
  and then reset the sequence number
* Take the incarnation number into account in the ClearSystemMessageDelivery
  message
* Trigger quarantine earlier in ClusterRemoteWatcher if node with
  same host:port joined
* Change quarantine-removed-node-after to 5s, shouldn't be necessary
  to delay it 30s
* test reproducer
2018-04-10 11:39:55 +02:00
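A minimal sketch of the first two bullets above, assuming a sender that numbers system messages per remote node and identifies the remote incarnation with a long. The class IncarnationAwareSeqNo and its next/clear methods are hypothetical simplifications for illustration, not Akka's SystemMessageDelivery or ClearSystemMessageDelivery.

```java
// Hypothetical, simplified illustration of resetting system message sequence
// numbers for a restarted node; not Akka's SystemMessageDelivery.
public final class IncarnationAwareSeqNo {
    private boolean known = false; // has any remote incarnation been seen yet?
    private long incarnation;      // remote incarnation currently being numbered
    private long seqNo = 0L;       // last sequence number handed out

    // Sequence number for the next system message to the given incarnation.
    public long next(long remoteIncarnation) {
        if (!known || remoteIncarnation != incarnation) {
            // New (or restarted) remote incarnation on the same host:port:
            // start a fresh sequence instead of continuing the old one.
            known = true;
            incarnation = remoteIncarnation;
            seqNo = 0L;
        }
        seqNo += 1;
        return seqNo;
    }

    // Clear request tagged with the incarnation it was issued for; a stale
    // request for an older incarnation is ignored and returns false.
    public boolean clear(long forIncarnation) {
        if (!known || forIncarnation != incarnation) {
            return false;
        }
        seqNo = 0L;
        return true;
    }
}
```

Tagging the clear request with the incarnation is what keeps a late request aimed at the old node from resetting the numbering already in use for the new one.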
Konrad `ktoso` Malawski
89b18b05cd =clu #24840 deprecation mark also in reference conf, removal-margin (#24841) 2018-04-04 10:20:40 +09:00
Patrik Nordwall
4b54941947 log warning if heartbeat sender ticks are delayed (#24785) 2018-03-27 19:22:21 +09:00
Jimin Hsieh
2c2b8ba001 Remove some unused import warnings (#24650) 2018-03-16 12:08:29 +01:00
Konrad `ktoso` Malawski
563c7fbcf0 Issue 24594: Integration with sbt-headers and initial header population 2018-03-13 15:45:55 +01:00
Patrik Nordwall
0ea8c0d872 Merge pull request #24592 from akka/wip-24576-LargeMessageClusterSpec-patriknw
slow down LargeMessageClusterSpec for tcp transport, #24576
2018-03-05 16:20:46 +01:00
Patrik Nordwall
1c8a2945ab slow down LargeMessageClusterSpec for tcp transport, #24576 2018-03-05 15:19:12 +01:00
Johan Andrén
b7cc50cdd6 2.5.10 wire protocol regression (#24625) 2018-02-28 09:46:37 +01:00
Patrik Nordwall
5e80bd97f2 Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined associations are removed in case they are lingering around
  * also means that compression advertisements will not be done for quarantined associations
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other changes above
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still linger in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.
2018-02-21 11:59:18 +01:00
Patrik Nordwall
0d222906f4 Prepare Artery for alternative TCP transport, #24390
* Refactoring to separate the Aeron specific things, ArteryAeronUdpTransport
* move Aeron specific classes to akka.remote.artery.aeron package
* move Version to ArterySettings, and describe strategy for envelope header changes
2018-02-20 16:02:57 +01:00
Renato Cavalcanti
c83e4adfea Rolling update config checker, #24009
* adds config compatibility check
* documented what happens when joining a cluster that doesn't support this feature
* added extra docs about sensitive paths
2018-02-20 15:47:09 +01:00
Patrik Nordwall
570060815b Merge pull request #24443 from akka/issue-24144
MultiDcSplitBrain: only subscribe to unreachable after split
2018-02-02 15:38:53 +01:00
Patrik Nordwall
23fa8b0810 change spelling of behaviour to behavior, #24457 2018-02-01 15:10:46 +01:00
Christopher Batey
5658d6e77a MultiDcSplitBrain: only subscribe to unreachable after split
The test would fail by picking up the reachable event from the previous
unsplit, as it is a new probe.

Also change barrierCounter to split/unsplit so it is easier to see
where the failure is when a barrier fails
2018-01-30 09:01:15 +00:00
Patrik Nordwall
5cab621e82 Merge pull request #24078 from akka/wip-24055-heartbeat-patriknw
attempt to reproduce heartbeat issue, #24055
2018-01-16 19:10:10 +01:00
Patrik Nordwall
2733a26540 Remove Exiting/Down node from other DC, #24171
* When leaving/downing the last node in a DC it would not
  be removed in another DC, since that was only done by the
  leader in the owning DC (and that is gone).
* It should be ok to eagerly remove such nodes also by
  leaders in other DCs.
* Note that gossip is already sent out, so for the last node that
  will be spread to the other DC unless there is a network
  partition. Nothing can be done about that case; the node will be
  replaced if it joins again.
2018-01-16 07:55:49 +01:00
Patrik Nordwall
971049c3bb Test that large messages don't disturb cluster heartbeats, #24055 2018-01-15 10:31:15 +01:00
Christopher Batey
0380cc517a Cluster singleton manager: don't send member events to FSM during shutdown (#24236)
There exists a race where a cluster node that is being downed sees itself
as the oldest node (as it has had the other nodes removed) and it
takes over the singleton manager, sending the real oldest node into
the End state, which means that cluster singletons never work again.

This fix simply prevents Member events from being given to the cluster
singleton manager FSM during shutdown, instead relying on SelfExiting.

This also hardens the test by not downing the node that the current
sharding coordinator is running on, as well as fixing a bug in the
probes.
2018-01-05 09:47:43 +01:00
Christopher Batey
009214ae07 Update copyright to 2018 (#24241) 2018-01-04 17:26:29 +00:00
Pritam Kadam
37f0da17b7 Allow member to leave a cluster via CoordinatedShutdown.run when MemberStatus is Joining/WeaklyUp/Up. (#24152) 2018-01-04 07:43:25 +00:00
Christopher Batey
3bd05ce67e MultiDcSplitBrainSpec: Turn on gossip logging; Increase gossip frequency (#24024)
The last time this failed there was no gossip to or from a node that
didn't see fifth coming back.

Also note that this test doesn't quite test what it says, as the split
brain is repaired before starting the second actor system, but without
extensions to the multi-jvm test kit this can't be improved.

Refs #23306
2017-12-14 22:26:27 +01:00
Johan Andrén
be3766d0ae Post 2.5.8 fixes (#24128)
* Update MiMa latest release
* Silence some noise from sbt breaking the release script
* MiMa excludes we had missed for a couple of releases
2017-12-08 16:53:47 +01:00
Patrik Nordwall
52f30a8043 ClusterSpec, race between MemberRemoved and MemberExited, #23449 (#24105) 2017-12-05 23:12:19 +09:00
Patrik Nordwall
e49acb7daa add Reason to CoordinatedShutdown, #24048 2017-12-04 14:16:06 +01:00
Patrik Nordwall
fa3da328be Run all CoordinatedShutdown phases also when downing, #24048 2017-12-04 11:05:22 +01:00
Patrik Nordwall
1cdd205c02 Merge pull request #23882 from chbatey/issue-23775-multidc-split
Increase time for MultiDcSplitBrain and increase cross DC gossip prob
2017-11-13 15:26:10 +01:00
Christopher Batey
4d3a7e93a6 Increase timeout and remove sleep
The test has been failing occasionally because, by the time we get to the
final barrier (restarted-fifth-removed), the 40s within block for the whole
test has already been reached, so the last barrier times out right away.

Trying to remove the Thread.sleep and rely on a larger timeout for the
whole test as well as the default barrier timeout of 30s.
2017-11-13 12:20:56 +00:00
Patrik Nordwall
436668687a Move coordinated-shutdown config from test/resources, #23879
* looks like the ActorSystem is shut down when leaving
* Included in MultiNodeSpec, i.e. all multi-node tests:
  akka.coordinated-shutdown.terminate-actor-system = off
  akka.coordinated-shutdown.run-by-jvm-shutdown-hook = off
2017-11-07 15:38:35 +01:00
Patrik Nordwall
95e0ac43e9 small perf improvement of isGossipSpeedupNeeded for single-dc 2017-11-02 18:27:50 +01:00
Christopher Batey
5a37cdc862 Cross DC gossip fixes #23803
* Adjust cross DC gossip probability for a small number of nodes in a DC
When a DC is being bootstrapped, the initial node has no local peers and
cannot gossip if it selects a local gossip round. Start at a
probability of 1.0 for a single node cluster and move down 0.25 per node
until a 5 node DC is reached, then use the cross-data-center-gossip-probability
* Fix cross DC gossip selection of oldest members
This used to select the members based on the sort order of members in
Gossip (by address) rather than by upNumber
2017-11-02 09:17:24 +01:00
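The two gossip rules above can be sketched as follows. This is a literal reading of the commit message, not Akka's code: CrossDcGossipSketch, crossDcGossipProbability, oldest and the Member stand-in are hypothetical names, and the local DC size is assumed to include the node itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the two cross-DC gossip rules described in #23803.
// Names are illustrative only; this is not Akka's implementation.
public final class CrossDcGossipSketch {

    // Stand-in for a cluster member: address plus the upNumber assigned
    // when it moved to Up (a lower upNumber means an older member).
    public static final class Member {
        public final String address;
        public final int upNumber;

        public Member(String address, int upNumber) {
            this.address = address;
            this.upNumber = upNumber;
        }
    }

    // Probability of picking a cross-DC gossip round for a DC with
    // localDcSize members: 1.0 for one node, 0.25 less per additional node,
    // and the configured probability from 5 nodes on.
    public static double crossDcGossipProbability(int localDcSize, double configured) {
        int size = Math.max(localDcSize, 1); // the local node always counts itself
        if (size >= 5) {
            return configured;
        }
        return 1.0 - 0.25 * (size - 1);
    }

    // The n oldest members, ordered by upNumber rather than by address.
    public static List<Member> oldest(List<Member> members, int n) {
        return members.stream()
                .sorted(Comparator.comparingInt((Member m) -> m.upNumber))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

With a configured probability of 0.2, a two-node DC would still pick a cross-DC round with probability 0.75 under this reading, so a freshly bootstrapped DC reaches the other DCs quickly.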
Christopher Batey
511180ef39 Stop actor system from shutting down on Cluster.leave (#23872)
This set up a race with the rest of the test, since once the
ActorSystem shuts down the test coordinator won't work for barriers etc.
2017-10-31 19:02:28 +01:00
Patrik Nordwall
86712d5b40 fix confusing logging when receiving gossip from unknown 2017-10-31 14:05:51 +01:00
Martynas Mickevičius
82ca8a2cc7 Port build to SBT 1.x (#23850)
* Port build to SBT 1.x

* Fix multinode tests, always enable genjavadoc bootstrap
2017-10-30 10:13:13 +09:00
Arnout Engelen
9cb5849188 Accept 'Join' messages from nodes without dc (#23822)
* Accept 'Join' messages from nodes without dc

To allow a join from a 2.4 node to a 2.5.6 cluster.

* Use "ClusterSettings.DefaultDataCenter" constant
2017-10-23 04:49:51 -05:00
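One way to picture the fallback, assuming (as a hypothetical for this sketch) that the data center travels as a role with a "dc-" prefix and that the default data center is named "default": a Join without any data center role, such as one from a 2.4 node, gets the default one added instead of being rejected. JoinDataCenter and withDataCenterRole are illustrative names, not Akka API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of accepting a Join that carries no data center.
// The "dc-" role prefix and the "default" name are assumptions made for
// this illustration, not a statement of Akka's internals.
public final class JoinDataCenter {
    static final String DC_ROLE_PREFIX = "dc-";
    static final String DEFAULT_DATA_CENTER = "default";

    // Returns the joining node's roles, adding the default data center role
    // when none is present (e.g. a Join sent by an older node).
    public static Set<String> withDataCenterRole(Set<String> roles) {
        boolean hasDc = roles.stream().anyMatch(r -> r.startsWith(DC_ROLE_PREFIX));
        if (hasDc) {
            return roles;
        }
        Set<String> result = new HashSet<>(roles);
        result.add(DC_ROLE_PREFIX + DEFAULT_DATA_CENTER);
        return result;
    }
}
```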
Arnout Engelen
b1df13d4d4 Update scalariform (#23778) (#23783) 2017-10-06 10:30:28 +02:00
Patrik Nordwall
5fc6d5a04a Verify removal and add of new node incarnation in multi-dc, #23585
* MemberRemoved must be published before MemberUp, e.g. when restarted
  in other DC
* remove from failureDetector when receiving gossip with new member,
  not only new joining member

* increase timeout in MultiDcSingletonManagerSpec
2017-09-25 16:47:06 +02:00
Patrik Nordwall
12196d674e enforce same DC for isOlderThan, #23307 (#23625) 2017-09-25 11:50:28 +02:00
Johan Andrén
c31f6b862f cluster apis for typed, #21226
* Cluster management (join, leave, etc)
* Cluster membership subscriptions (MemberUp, MemberRemoved, etc)
* New SelfUp and SelfRemoved events
* change signature of awaitAssert to return the value (not binary compatible)
* Cluster singleton api
2017-09-21 17:58:29 +02:00
Patrik Nordwall
4f8856f108 Merge pull request #23551 from akka/wip-23502-join-timeout-patriknw
Add timeout to abort joining of seed nodes, #23502
2017-09-11 16:41:35 +02:00
Patrik Nordwall
5cf698a2f6 Add timeout to abort joining of seed nodes, #23502 2017-09-11 15:56:25 +02:00
Patrik Nordwall
cb08535e7d use right youngest when moving to Up, #23582
* also confirm TakeOverFromMe when singleton already in oldest state
2017-09-04 16:02:23 +02:00
Patrik Nordwall
1e4e7cbba2 Merge pull request #23583 from akka/wip-multi-dc-merge-master-patriknw
merge wip-multi-dc-dev back to master
2017-09-01 17:08:28 +02:00
Patrik Nordwall
0ed5bc1835 add mima filters 2017-08-31 11:29:49 +02:00
Patrik Nordwall
6ed3295acd Merge branch 'master' into wip-multi-dc-merge-master-patriknw 2017-08-31 10:51:12 +02:00
Patrik Nordwall
6bfb7c9262 increase timeout in MultiDcSplitBrainSpec
* due to handshake timeout

reduce handshake timeout

fourth might generate UnreachableDataCenter in unsplit

MultiDcClusterSharding
2017-08-31 10:26:23 +02:00
Patrik Nordwall
dc75c4f818 Merge pull request #23531 from akka/wip-23369-NodeChurnSpec-patriknw
fix NodeChurnSpec tombstones, #23369
2017-08-28 09:17:32 +02:00