1549 commits

Author SHA1 Message Date
Arnout Engelen
31f654768f
Update copyright to 2021 2021-01-08 17:55:38 +01:00
Arnout Engelen
4785ed1b48
Better logging when no seed nodes found (#29866)
In a recent support case the 'manual cluster join required'
log message caused some confusion.

Turns out the configuration we used to detect whether Cluster
Bootstrap is available has been changed since
https://github.com/akka/akka-management/pull/476

Unfortunately I don't think we can detect whether Cluster
Bootstrap is actually enabled, since users may call
`ClusterBootstrap(system).start()` whenever they like.
Updated the logging to reflect that better.
2020-12-09 17:29:06 +01:00
Christopher Batey
3602ffa4d9
Ignore gossip deserialization failures (#29848)
* Ignore gossip deserialization failures

Only expected to happen during a rolling upgrade. Gives us the option to do
incompatible things in Gossip and have the old nodes ignore the
deserialization error.

* Review feedback
2020-12-02 18:50:16 +00:00
Enno Runne
ffb21da246
Stream Testkit: new-API-friendly (#29831) 2020-12-01 12:06:09 +01:00
Christopher Batey
abfd699985
Harden lease majority spec (#29793)
The failure was due to timing out waiting for the downed side to
terminate. These changes halve the time it takes for this to happen.
2020-11-09 10:48:37 +01:00
yiksanchan
61743a80eb
Add type annotation for public method (#29798) 2020-11-09 09:27:05 +01:00
Patrik Nordwall
73a9bbb264
Merge pull request #29691 from akka/wip-29683-log-size-patriknw
log-frame-size-exceeding for Artery, #29683
2020-10-09 10:26:23 +02:00
Renato Cavalcanti
d866fa3f1a
Fix Welcome/Gossip deserialization of app-version (#29692) 2020-10-05 16:09:32 +02:00
Patrik Nordwall
3a7c02014b log-frame-size-exceeding for Artery, #29683 2020-10-05 14:07:26 +02:00
Patrik Nordwall
1bf012837c
disseminate downing decisions faster, #29612 (#29640)
* when SBR downs the reachable side (minority) it's important
  to quickly inform everybody to shutdown
* send gossip directly to downed node, STONITH signal
* gossip to a few random nodes immediately when self is downed, which
  is always the last from the SBR downing
* enable gossip speedup when there are downed members
* adjust StressSpec to normal again
* adjust TransitionSpec to the new behavior
2020-10-05 11:10:06 +02:00
Patrik Nordwall
90b79144e5
Documentation for Sharding rolling update (#29666) 2020-09-30 12:31:03 +02:00
Patrik Nordwall
2caa560aab
Config for when to move to WeaklyUp (#29665)
* Config for when to move to WeaklyUp

* noticed when I was testing with the StressSpec that it's often moving nodes to WeaklyUp
  in normal joining scenarios (also seen in Kubernetes testing)
* better to wait somewhat longer, since WeaklyUp requires a new convergence round
  and makes the full joining -> up transition take longer
* changed existing config property to be a duration
* default 7s, previously it was 3s

* on => 7s
2020-09-30 09:54:31 +02:00
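The "Config for when to move to WeaklyUp" entry above describes a setting that changed from an on/off flag to a duration, with `on` mapped to the new 7s default. A minimal sketch of that mapping follows. The class and method names are hypothetical, the handling of `off` as "disabled" is an assumption, and only whole-second values such as `3s` are parsed, so this is not the actual Akka implementation.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper, not part of Akka: interprets a WeaklyUp-style setting
// that accepts "on", "off", or a whole number of seconds such as "3s".
final class WeaklyUpSetting {
    // Default from the commit message: 7s (previously 3s).
    static final Duration DEFAULT = Duration.ofSeconds(7);

    static Optional<Duration> parse(String value) {
        String v = value.trim().toLowerCase();
        if (v.equals("off")) {
            // Assumption: "off" disables the move to WeaklyUp entirely.
            return Optional.empty();
        }
        if (v.equals("on")) {
            // From the commit message: on => 7s.
            return Optional.of(DEFAULT);
        }
        if (v.endsWith("s")) {
            // Simplification: only whole seconds are supported in this sketch.
            long seconds = Long.parseLong(v.substring(0, v.length() - 1).trim());
            return Optional.of(Duration.ofSeconds(seconds));
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }
}
```

Keeping `on` as an accepted value means configurations written for the old boolean flag would still load, which appears to be the intent of the "on => 7s" note.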
Johan Andrén
93a69c42ff
Watching an actor ref on a member triggers termination message #29628 2020-09-28 16:57:31 +02:00
Christopher Batey
50924e56ac
Merge pull request #29502 from chbatey/reintroduce-flush-on-terminate
Reintroduce flush on terminate
2020-09-25 16:14:56 +01:00
Patrik Nordwall
8e2073a6a1 Flush messages before DeathWatchNotification, #28695 (#28940)
* Since DeathWatchNotification is sent over the control channel it may overtake
  other messages that have been sent from the same actor before it stopped.
* It can be confusing that Terminated can't be used as an end-of-conversation marker.
* In classic Remoting we didn't have this problem because all messages were sent over
  the same connection.

* don't send DeathWatchNotification when system is terminating
* when using Cluster we can rely on that the other side will publish AddressTerminated
  when the member has been removed
* it's actually already a race condition that often will result in that the DeathWatchNotification
  from the terminating side
  * in DeathWatch.scala it will remove the watchedBy when receiving AddressTerminated, and that
    may (sometimes) happen before tellWatchersWeDied

* same for Unwatch
* to avoid sending many Unwatch messages when watcher's ActorSystem is terminated
* same race exists for Unwatch as for DeathWatchNotification, if RemoteWatcher publishAddressTerminated
  before the watcher is terminated

* config for the flush timeout, and possibility to disable
2020-09-25 14:37:47 +01:00
Patrik Nordwall
14275b4997
adjust default minimum for down-all-when-unstable (#29661)
* adjust default minimum for down-all-when-unstable

* when down-all-when-unstable=on it will be >= 4 seconds
* in case stable-after is tweaked to a low value such as 5 seconds
2020-09-24 15:58:14 +02:00
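The "adjust default minimum for down-all-when-unstable" entry above says the derived timeout is at least 4 seconds when the setting is `on`, even if `stable-after` is lowered. A rough sketch of such a clamp is shown below. The three-quarters factor is an assumption made for illustration (the commit message only states the 4 second minimum), and the class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper, not part of Akka: derives a down-all-when-unstable
// timeout from stable-after and enforces a lower bound.
final class DownAllWhenUnstable {
    // Lower bound taken from the commit message.
    static final Duration MINIMUM = Duration.ofSeconds(4);

    static Duration derive(Duration stableAfter) {
        // Assumption: the derived value is 3/4 of stable-after.
        Duration threeQuarters = stableAfter.multipliedBy(3).dividedBy(4);
        // Never go below the minimum, even for a small stable-after.
        return threeQuarters.compareTo(MINIMUM) < 0 ? MINIMUM : threeQuarters;
    }
}
```

With a `stable-after` of 5 seconds, the unclamped value under this assumption would be 3.75 seconds, so the 4 second minimum applies.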
yiksanchan
74282c42d2
Remove weird char NBSP (#29620)
* Remove weird char NBSP

* cr comment
2020-09-24 12:44:51 +02:00
Patrik Nordwall
1b026ec3d9
Merge pull request #29619 from YikSanChan/fix/replace-to-with-toSet
Replace to with toSet
2020-09-23 09:56:19 +02:00
Patrik Nordwall
d556be77b4
Fix cross-dc heartbeat interval config, #29614 (#29646) 2020-09-22 15:50:42 +02:00
Patrik Nordwall
b28d77b316
simplify the SBR instability check (#29625) 2020-09-21 16:34:21 +02:00
Patrik Nordwall
f5b16bfe2e
Merge pull request #29617 from YikSanChan/fix/comment-index
Fix comment index
2020-09-21 11:42:46 +02:00
Patrik Nordwall
b15baca3e6
Merge pull request #29622 from YikSanChan/cleanup/remove-redundant-brackets
Remove redundant brackets
2020-09-21 11:41:55 +02:00
Patrik Nordwall
45b955850c
Merge pull request #29618 from YikSanChan/fix/tiny
Tiny style fix
2020-09-21 11:37:23 +02:00
Yik San Chan
861303c768 Remove redundant brackets 2020-09-17 20:45:03 -07:00
Yik San Chan
2c164820a4 Remove unreferenced private method 2020-09-17 20:13:07 -07:00
Yik San Chan
11694f065d Replace to with toSet 2020-09-17 18:39:42 -07:00
Yik San Chan
82361c0694 Tiny style fix 2020-09-17 18:33:41 -07:00
Yik San Chan
ddf1a02d4f Fix comment index 2020-09-17 18:24:16 -07:00
Evan Chan
1d759d2a3f remove redundant final in object private def 2020-09-15 17:43:09 -07:00
Patrik Nordwall
36d924b151
Increase stable-after in StressSpec, #29512 (#29603) 2020-09-15 11:16:01 +02:00
Patrik Nordwall
6384d0a242 move mima filter to right version 2020-09-10 17:35:44 +02:00
Patrik Nordwall
42223eb71a
Add app-version to the Member information, #27300 (#29546)
* will be used in rolling update features
* configured with akka.cluster.app-version
* reusing same implementation as ManifestInfo.Version
  by moving that to akka.util.Version
* additional version test
* support dynver format, + separator, and commit number
* improve version parser
* lazy parse
* make Member.appVersion internal
2020-09-10 10:42:03 +02:00
Patrik Nordwall
7bf12721c1 Merge branch 'master' into feature-active-active-event-sourcing 2020-09-02 15:46:06 +02:00
Patrik Nordwall
9b709df2d0
Fix acceptable-heartbeat-pause in cluster.StressSpec, #29512 (#29541) 2020-09-02 12:48:52 +02:00
Patrik Nordwall
ff9b8f44ea increase timeout in MultiDcJoin2Spec, #29505 2020-08-19 09:30:13 +02:00
Patrik Nordwall
da404071dc
full convergence also for joining nodes for first multi-dc join, #29486 (#29499) 2020-08-18 11:49:19 +02:00
Patrik Nordwall
f8c7a118be
Reduce scope of cluster.StressSpec, #23511 (#29472)
* to only exercise membership
* remote deployed routers and supervision of remote deployed actors
  are not priority, and that is what is sometimes failing
2020-08-18 11:08:07 +02:00
Christopher Batey
849018b81e Replicated Sharding improvements (#29483)
* WIP

* Finishing touches to sharding updates

* Review feedback
2020-08-17 07:54:34 +01:00
Arnout Engelen
c41c0420ad
Update scala to 2.13.3 and silencer to 1.7.0 (#28991)
* Update scala to 2.13.3 and silencer to 1.7.0
* Also travis
* Fix various warnings
2020-08-10 12:54:38 +02:00
Patrik Nordwall
686729c75b
Harden multi-dc joining, #29280 (#29346)
* Harden multi-dc joining, #29280

* failing test MultiDcJoinSpec
* require that all have seen the gossip seen for the first member in other DC
* the test also revealed that gossip wasn't propagated between DCs when
  the VectorClock was the same and only seen is different
* add a SHA-1 digest of the seen in the GossipStatus to detect that they
  are different and that full gossip should be exchanged

* comments

* another test

* mima version
2020-08-07 17:02:31 +01:00
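The "Harden multi-dc joining" entry above adds a SHA-1 digest of the seen set to the gossip status, so that two nodes with equal vector clocks can still notice that their seen sets differ. The idea can be sketched roughly as follows. Representing nodes as plain strings, sorting them before hashing, and the class and method names are all assumptions for illustration, not the actual Akka encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Akka: compares two "seen" sets by digest.
final class SeenDigest {
    static byte[] digest(Set<String> seen) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Sort first so that equal sets always hash to the same bytes.
            for (String node : new TreeSet<>(seen)) {
                sha1.update(node.getBytes(StandardCharsets.UTF_8));
                // Separator byte so that {"ab"} and {"a", "b"} hash differently.
                sha1.update((byte) 0);
            }
            return sha1.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the digests differ, which signals that full gossip
    // should be exchanged even if the vector clocks are the same.
    static boolean seenDiffers(Set<String> local, Set<String> remote) {
        return !Arrays.equals(digest(local), digest(remote));
    }
}
```

Sending a fixed-size digest instead of the full seen set keeps the status message small while still revealing a mismatch.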
Johan Andrén
a70a19e8ee
Revert "Flush messages before DeathWatchNotification, #28695 (#28940)" (#29357)
This reverts commit f6ceb4d49a.
2020-07-08 13:11:46 +02:00
Patrik Nordwall
f6ceb4d49a
Flush messages before DeathWatchNotification, #28695 (#28940)
* Since DeathWatchNotification is sent over the control channel it may overtake
  other messages that have been sent from the same actor before it stopped.
* It can be confusing that Terminated can't be used as an end-of-conversation marker.
* In classic Remoting we didn't have this problem because all messages were sent over
  the same connection.

* don't send DeathWatchNotification when system is terminating
* when using Cluster we can rely on that the other side will publish AddressTerminated
  when the member has been removed
* it's actually already a race condition that often will result in that the DeathWatchNotification
  from the terminating side
  * in DeathWatch.scala it will remove the watchedBy when receiving AddressTerminated, and that
    may (sometimes) happen before tellWatchersWeDied

* same for Unwatch
* to avoid sending many Unwatch messages when watcher's ActorSystem is terminated
* same race exists for Unwatch as for DeathWatchNotification, if RemoteWatcher publishAddressTerminated
  before the watcher is terminated

* config for the flush timeout, and possibility to disable
2020-07-03 09:54:35 +02:00
Johan Andrén
1e9e984727
Removing, deprecating and replacing usage of black/whitelist (#29254) 2020-06-18 15:48:28 +02:00
Patrik Nordwall
b80cd745eb
Allow update from Lightbend SBR in ClusterReceptionistConfigCompatChecker (#29209)
* and don't allow different strategies
2020-06-10 12:47:16 +02:00
Patrik Nordwall
cd9e9e960a Telemetry SPI hooks for SBR decision, #29085 2020-05-28 08:52:22 +02:00
Patrik Nordwall
1472bd9e8c Log markers for SBR, #29085 2020-05-27 14:26:55 +02:00
Patrik Nordwall
7c77617d18
Merge pull request #29092 from akka/wip-29034-sharding-watch-patriknw
Allow ShardCoordinator to watch old region ActorRef that is not in cluster, #29034
2020-05-27 10:06:38 +02:00
Patrik Nordwall
c45e6ef39b
Add Lightbend's SBR to Akka Cluster, #29085 (#29099)
* change package name to akka.cluster.sbr
* reference.conf has same config paths
* akka.cluster.sbr.SplitBrainResolverProvider instead of com.lightbend.akka.sbr.SplitBrainResolverProvider
* dependency from akka-cluster to akka-coordination, for lease strategy
* move TestLease to akka-coordination and use that in SBR tests
* remove keep-referee strategy
* use keep-majority by default
* review and adjust reference documentation

Co-authored-by: Johan Andrén <johan@markatta.com>
Co-authored-by: Johannes Rudolph <johannes.rudolph@gmail.com>
Co-authored-by: Christopher Batey <christopher.batey@gmail.com>
Co-authored-by: Arnout Engelen <github@bzzt.net>
2020-05-25 12:21:13 +02:00
Patrik Nordwall
228c19e688 Allow ShardCoordinator to watch old region ActorRef that is not in cluster, #29034
* Otherwise the remote watch is disabled and the old region ActorRef remains
  in the coordinator's state
2020-05-19 13:53:42 +02:00
kerr
bada816714
=build Fix commandAlias for fixall and sortImports (#28984)
* =build Fix commandAlias for fixall and sortImports

* =build Update sortImports to 0.5.0

* Sort imports to handle `javax`.

* fx
2020-05-11 11:47:33 +02:00