1341 commits

Author SHA1 Message Date
Christopher Batey
3bd05ce67e MultiDcSplitBrainSpec: Turn on gossip logging; Increase gossip frequency (#24024)
The last time this failed there was no gossip to or from a node that
didn't see fifth coming back.

Also note that this test doesn't quite test what it says as the split
brain is repaired before starting the second actor system but without
extensions to the multi jvm test kit this can't be improved.

Refs #23306
2017-12-14 22:26:27 +01:00
Johan Andrén
be3766d0ae Post 2.5.8 fixes (#24128)
* Update MiMa latest release
* Silence some noise from sbt breaking the release script
* MiMa excludes we had missed for a couple of releases
2017-12-08 16:53:47 +01:00
Patrik Nordwall
52f30a8043 ClusterSpec, race between MemberRemoved and MemberExited, #23449 (#24105) 2017-12-05 23:12:19 +09:00
Patrik Nordwall
e49acb7daa add Reason to CoordinatedShutdown, #24048 2017-12-04 14:16:06 +01:00
Patrik Nordwall
fa3da328be Run all CoordinatedShutdown phases also when downing, #24048 2017-12-04 11:05:22 +01:00
Patrik Nordwall
1cdd205c02 Merge pull request #23882 from chbatey/issue-23775-multidc-split
Increase time for MultiDcSplitBrain and increase cross DC gossip prob
2017-11-13 15:26:10 +01:00
Christopher Batey
4d3a7e93a6 Increase timeout and remove sleep
The test has been failing infrequently as when we get to the final
barrier (restarted-fifth-removed) the whole test withIn of 40s
has been reached so the last barrier times out right away.

Trying to remove the Thread.sleep and rely on a larger timeout for the
whole test as well as the default barrier timeout of 30s.
2017-11-13 12:20:56 +00:00
Patrik Nordwall
436668687a Move coordinated-shutdown config from test/resources, #23879
* looks like the ActorSystem is shutdown when leaving
* Included in MultiNodeSpec, i.e. all multi-node tests:
  akka.coordinated-shutdown.terminate-actor-system = off
  akka.coordinated-shutdown.run-by-jvm-shutdown-hook = off
2017-11-07 15:38:35 +01:00
Patrik Nordwall
95e0ac43e9 small perf improvement of isGossipSpeedupNeeded for single-dc 2017-11-02 18:27:50 +01:00
Christopher Batey
5a37cdc862 Cross DC gossip fixes #23803
* Adjust cross DC gossip probability for small nr of nodes in a DC
When a DC is being bootstrapped the initial node has no local peers and
can not gossip if it selects a local gossip round. Start at a
probability of 1.0 for a single node cluster and move down 0.25 per node
until a 5 node DC is reached then use the cross-data-center-gossip-probability
* Fix cross DC gossip selecting of oldest members
This used to select the members based on the sort order of members in
Gossip (by address) rather than by upNumber
2017-11-02 09:17:24 +01:00
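
The probability adjustment described in commit 5a37cdc862 can be sketched as below. This is only an illustration of the arithmetic in the commit message, not the Akka implementation: the function name, the parameter names, and the example value 0.2 for cross-data-center-gossip-probability are assumptions, and so is the choice to never go below the configured probability for small DCs.

```python
def cross_dc_gossip_probability(local_members: int, configured: float) -> float:
    """Probability of picking a cross-DC gossip round (hypothetical helper).

    local_members: number of members in the node's own DC, including itself.
    configured:    the cross-data-center-gossip-probability setting.
    """
    if local_members < 1:
        raise ValueError("a DC has at least one member: the node itself")
    if local_members >= 5:
        # From a 5 node DC upwards, use the configured probability as-is.
        return configured
    # 1 node -> 1.0, then 0.25 less per additional node: 0.75, 0.5, 0.25.
    scaled = (5 - local_members) * 0.25
    # Assumption: do not drop below the configured probability.
    return max(scaled, configured)


if __name__ == "__main__":
    # 0.2 is an example value for the configured probability.
    for n in range(1, 7):
        print(n, cross_dc_gossip_probability(n, 0.2))
```

A single bootstrapping node therefore always gossips across DCs, which matches the motivation in the message: it has no local peers to gossip with.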
Christopher Batey
511180ef39 Stop actor system from shutting down on Cluster.leave (#23872)
This then sets up a race with the rest of the test running, as once the
ActorSystem shuts down the test coordinator won't wait for barriers etc.
2017-10-31 19:02:28 +01:00
Patrik Nordwall
86712d5b40 fix confusing logging when receiving gossip from unknown 2017-10-31 14:05:51 +01:00
Martynas Mickevičius
82ca8a2cc7 Port build to SBT 1.x (#23850)
* Port build to SBT 1.x

* Fix multinode tests, always enable genjavadoc bootstrap
2017-10-30 10:13:13 +09:00
Arnout Engelen
9cb5849188 Accept 'Join' messages from nodes without dc (#23822)
* Accept 'Join' messages from nodes without dc

To allow a join from a 2.4 node to a 2.5.6 cluster.

* Use "ClusterSettings.DefaultDataCenter" constant
2017-10-23 04:49:51 -05:00
Arnout Engelen
b1df13d4d4 Update scalariform (#23778) (#23783) 2017-10-06 10:30:28 +02:00
Patrik Nordwall
5fc6d5a04a Verify removal and add of new node incarnation in multi-dc, #23585
* MemberRemoved must be published before MemberUp, e.g. when restarted
  in other DC
* remove from failureDetector when receiving gossip with new member,
  not only new joining member

* increase timeout in MultiDcSingletonManagerSpec
2017-09-25 16:47:06 +02:00
Patrik Nordwall
12196d674e enforce same DC for isOlderThan, #23307 (#23625) 2017-09-25 11:50:28 +02:00
Johan Andrén
c31f6b862f cluster apis for typed, #21226
* Cluster management (join, leave, etc)
* Cluster membership subscriptions (MemberUp, MemberRemoved, etc)
* New SelfUp and SelfRemoved events
* change signature of awaitAssert to return the value (not binary compatible)
* Cluster singleton api
2017-09-21 17:58:29 +02:00
Patrik Nordwall
4f8856f108 Merge pull request #23551 from akka/wip-23502-join-timeout-patriknw
Add timeout to abort joining of seed nodes, #23502
2017-09-11 16:41:35 +02:00
Patrik Nordwall
5cf698a2f6 Add timeout to abort joining of seed nodes, #23502 2017-09-11 15:56:25 +02:00
Patrik Nordwall
cb08535e7d use right youngest when moving to Up, #23582
* also confirm TakeOverFromMe when singleton already in oldest state
2017-09-04 16:02:23 +02:00
Patrik Nordwall
1e4e7cbba2 Merge pull request #23583 from akka/wip-multi-dc-merge-master-patriknw
merge wip-multi-dc-dev back to master
2017-09-01 17:08:28 +02:00
Patrik Nordwall
0ed5bc1835 add mima filters 2017-08-31 11:29:49 +02:00
Patrik Nordwall
6ed3295acd Merge branch 'master' into wip-multi-dc-merge-master-patriknw 2017-08-31 10:51:12 +02:00
Patrik Nordwall
6bfb7c9262 increase timeout in MultiDcSplitBrainSpec
* due to handshake timeout

reduce handshake timeout

fourth might generate UnreachableDataCenter in unsplit

MultiDcClusterSharding
2017-08-31 10:26:23 +02:00
Patrik Nordwall
dc75c4f818 Merge pull request #23531 from akka/wip-23369-NodeChurnSpec-patriknw
fix NodeChurnSpec tombstones, #23369
2017-08-28 09:17:32 +02:00
Patrik Nordwall
e3aada5016 Connect the dots for cross-dc reachability, #23377
* the crossDcFailureDetector was not connected to the reachability table
* additional test by listening for {Reachable/Unreachable}DataCenter events in split spec
* missing Java API for getUnreachableDataCenters in CurrentClusterState
2017-08-22 15:05:40 +02:00
Patrik Nordwall
659b28e4eb Missing become after CurrentClusterState in CrossDcHeartbeatSender, #23371
* and a few other small things
* one can see in the failed test log that there is no ACTIVE log line on the failing node
2017-08-22 14:10:45 +02:00
Johan Andrén
cff43a16f7 Data center reachability in cluster state (#23359)
* Manual case-declassing of CurrentClusterState #23347

* Unreachable data centers set in CurrentClusterState #23347
2017-08-22 13:04:39 +02:00
Patrik Nordwall
6753c1e624 Don't use WeaklyUp immediately, #23554
* see description in issue
2017-08-22 12:02:04 +02:00
Patrik Nordwall
699c78f959 fix NodeChurnSpec tombstones, #23369
* the gossip was growing because we introduced tombstones
* in this test it should be safe to have a short removal period
  of the tombstones
2017-08-15 16:05:36 +02:00
Sébastien Lorion
a95a94acff Replace ClusterRouterGroup/Pool "use-role" with "use-role-set" #23496 2017-08-09 16:06:18 +02:00
Jimin Hsieh
f623d10522 Rename addr to address in non-public API #21874 2017-08-08 13:18:56 +02:00
Martynas Mickevičius
bc0f2ee26d Load MiMa filters from file (#23083) 2017-07-27 12:33:14 +02:00
Johan Andrén
b86b10c477 Eliminate race in MultiDcHeartbeatTakingOverSpec #23371 (#23373) 2017-07-19 11:48:27 +09:00
Konrad `ktoso` Malawski
c728098b3d =clu,dc #23354 do not heartbeat to yourself (cross-dc) 2017-07-14 13:00:45 +09:00
Konrad `ktoso` Malawski
eb24033cc0 =clu,dc #23340 additional test to see a node take over monitoring of remote DC (#23342) 2017-07-13 12:50:28 +02:00
Martynas Mickevičius
73d3c5db5d DC reachability events #23245 2017-07-12 13:48:15 +01:00
Johan Andrén
9c7e8d027a Renamed/moved the self data center setting #23312 (#23344) 2017-07-12 11:47:32 +01:00
Johan Andrén
be5a0207bb Prune version clocks based on merged tombstones when merging #23318 2017-07-11 16:29:32 +01:00
Johan Andrén
a15e459922 Merging did not prune vector clocks for tombstoned nodes #23318 2017-07-10 13:01:06 +01:00
Johan Andrén
9f4da87840 =clu #23286 filter emitted reachability event by DC 2017-07-07 16:50:36 +01:00
Johan Andrén
3be504dd00 Unbreak MultiDcSunnyWeatherSpec #23310 2017-07-07 15:11:58 +01:00
Johan Andrén
c0d439eac3 limit cross dc gossip #23282 2017-07-07 13:19:10 +01:00
Konrad `ktoso` Malawski
b568975acc =clu #23229 multi-dc heartbeating, only N nodes perform monitoring 2017-07-07 12:17:41 +01:00
Johan Andrén
ab3efff3bd MultiDcSplitBrainSpec fixed #23288 2017-07-05 13:50:10 +02:00
Patrik Nordwall
867cc97bdd Refactoring of Gossip class, #23290
* move methods that depend on selfUniqueAddress and selfDc
  to a separate MembershipState class, which also holds the
  latest gossip
* this removes the need to pass in the parameters from everywhere and
  makes it easier to cache some results
* makes it clear that those parameters are always selfUniqueAddress
  and selfDc, instead of some arbitrary node/dc
2017-07-05 08:47:32 +02:00
Patrik Nordwall
bb9549263e Rename team to data center, #23275 2017-07-04 17:11:21 +02:00
Patrik Nordwall
e0fe0bc49e Make cluster sharding DC aware, #23231
* Sharding only within own team (coordinator is singleton)
* the ddata Replicator used by Sharding must also be only within own team
* added support for Set of roles in ddata Replicator so that it can be used
  by sharding to specify role + team
* Sharding proxy can route to sharding in another team
2017-07-04 15:04:43 +02:00
Patrik Nordwall
e37243f471 Merge pull request #23285 from jrudolph/jr/w/introduce-internal-reachability-event
Some additional Reachability comments / documentation
2017-07-04 14:55:32 +02:00