1389 commits

Author SHA1 Message Date
kerr
e9fb3a020a Make use of scalafix to remove unused import. (#26019)
* =build Add scalafix to remove unused import.

* +build Add file ignore plugin for scalafix which supports ignoring files.
2018-12-05 08:30:21 +00:00
Christopher Batey
c8013d54f3 Include configuration name in log when cluster not actor provider 2018-11-16 15:58:59 +00:00
Helena Edelson
f872115512 Cluster event listener that logs all events #25832 (#25918) 2018-11-15 17:31:58 +01:00
Johan Andrén
f66ee1cbe8
Handle lost typed receptionist removals #24887
Keep track of removed actors and re-remove them when ORMultiMap conflict has reintroduced them
2018-11-09 10:58:18 +01:00
Patrik Nordwall
90bc4cfa3e
Improvements of singleton leaving scenario, #25639 (#25710)
* Testing of singleton leaving
* gossip optimization, exiting change to two oldest per role
* hardening ClusterSingletonManagerIsStuck restart, increase ClusterSingletonManagerIsStuck
2018-11-09 09:42:48 +01:00
Arnout Engelen
079aa46733 Introduce 'MemberDowned' member event (#25854)
* Introduce 'MemberDowned' member event

Compatibility note: MemberEvent is a sealed trait, so it is debatable whether
it is acceptable to introduce a new member.

* Be more conservative (more like leaving), add test
2018-11-05 10:03:06 +00:00
kerr
fafc59b19d update headers to regular comment (#25807) 2018-10-29 05:19:37 -04:00
Johannes Rudolph
655bef2e71 Fix typo in reference.conf (#25821) 2018-10-24 07:42:03 -04:00
Christopher Batey
ba67f71ca8 Treat MemberStatus.Removed as terminal state in ClusterReadView (#25499)
* Fixes #25489 where cluster event for a previous state can override
the call to cluster.close setting it to Removed
* Fix case where Removed is used as a placeholder for unknown
2018-10-03 14:01:38 +02:00
Patrik Nordwall
a6737b5e42 Don't automatically down quarantined node, #25632 2018-09-24 14:05:13 +02:00
Johan Andrén
8ed4f5abab Undeprecate the config, add a note in cluster singleton 2018-09-04 14:21:25 +02:00
Patrik Nordwall
25079cb568
Support joining 2.5.9 or earlier, compat InitJoinAck, #25491
* Detect that joining node is 2.5.9 or earlier by empty ConfigCheck
  config in InitJoin message. Then send back Address, which was the
  old representation of InitJoinAck
* Include akka.version in logging to facilitate troubleshooting
2018-08-23 20:37:42 +02:00
Kazuhiro Sera
482eaea122 Fix several minor typos detected by github.com/client9/misspell (#25448)
* Fix several minor typos detected by github.com/client9/misspell

* Revert s/erminater/erminator/ in /ActorSystemSpec
2018-08-21 11:02:37 +09:00
kenji yoshida
5b3b191bac Remove procedure syntax (#25362) 2018-07-25 13:38:27 +02:00
Seeta Ramayya
61790d763d Improved logging when node joins itself #25279 2018-07-19 10:26:28 +02:00
Konrad `ktoso` Malawski
29f30a4a78 =clu Accept Welcome message from previous joinSeedNodeProcess #25295 (#25297)
* =clu Accept Welcome message from previous joinSeedNodeProcess #25295
2018-07-03 15:22:20 +01:00
Christopher Batey
ee7e699d23 Cluster sharding: Set waiting for state timeout for tests
Default is 5s which means if the first Read is lost and
a test's ddata doesn't have any secondary nodes to query it'll
time out waiting to get the state.

E.g. read being ignored due to loading durable state then
never gets retried
2018-07-02 12:50:29 +01:00
Roman Filonenko
502ff08df5 fix akka-cluster-tools compile error 2018-06-16 18:52:45 +02:00
Christopher Batey
28b86379c8 Harden MultiDcClusterShardingSpec (#25201)
- Use global multi node cluster config
- Reduce retry interval for ShardRegion register
- Add clue to unhelpful assert failing
2018-06-15 15:28:04 +02:00
Christopher Batey
01f90ad95d
Add common multi node cluster config to all cluster sharding tests (#25202) 2018-06-05 06:58:17 +01:00
Christopher Batey
1787283757 Log receiving of heartbeats when verbose heartbeat logging is on (#25183) 2018-06-04 16:22:06 +03:00
jorgesg1986
fceca07ec0 Added log messages when leadership is gained or lost (#25053) 2018-05-31 15:43:53 +02:00
Christopher Batey
485179c5f6 ClusterDeathWatchSpec: assert cluster status between tests (#25126)
Test failed due to node4 still seeing node3 in a subsequent test
but assumed it had been removed.

Fixes #25065
2018-05-29 13:38:20 +02:00
Patrik Nordwall
7fc7744049 Quarantine and cleanup idle associations, #24972
* fix NPE in shutdownTransport
  * perhaps because shutdown before started
  * system.dispatcher is used in other places of the shutdown
* improve logging of compression advertisement progress
* adjust RestartFlow.withBackoff parameters
* quarantine after ActorSystemTerminating signal
  (will cleanup compressions)
* Quarantine idle associations
  * liveness checks by sending extra HandshakeReq and update the
    lastUsed when reply received
  * conservative default value to survive network partition, in
    case no other messages are sent
* Adjust logging and QuarantinedEvent for harmless quarantine
  * Harmless if it was via the shutdown signal or cluster leaving
2018-05-22 13:10:30 +02:00
Christopher Batey
23373565db
Fix typed cluster singleton cross dc proxies (#24936)
* Fix typed cluster singleton cross dc proxies
* Adds first multi-jvm test for typed cluster
2018-04-27 12:44:44 +01:00
Christopher Batey
a3e52078df Enable header plugin for the MultiJVM configuration (#24974)
Seems when we did the changes for 2018 it introduced a space in all
after, hence so many changes.
2018-04-25 00:03:55 +09:00
Christopher Batey
4d20b2a660 Reduce size of jenkins logs
Each build now produces over 40 MB of logs.

A lot of DEBUG logging was left on for test failures that have been
fixed. Added an issue # for ones that are still valid or if it is on
as the test verifies debug
2018-04-24 08:49:41 +01:00
Kirill Yankov
3ebb9fa9c1 Fix serialization in TypedActor (#24851)
* fixed serialization in TypedActor
* generalized duplicates via Serialization.manifestFor
2018-04-12 18:58:13 +02:00
Patrik Nordwall
43dc381d59
Clear system messages sequence number for restarted node, #24847
* Notice that the incarnation has changed in SystemMessageDelivery
  and then reset the sequence number
* Take the incarnation number into account in the ClearSystemMessageDelivery
  message
* Trigger quarantine earlier in ClusterRemoteWatcher if node with
  same host:port joined
* Change quarantine-removed-node-after to 5s, shouldn't be necessary
  to delay it 30s
* test reproducer
2018-04-10 11:39:55 +02:00
Konrad `ktoso` Malawski
89b18b05cd
=clu #24840 deprecation mark also in reference conf, removal-margin (#24841) 2018-04-04 10:20:40 +09:00
Patrik Nordwall
4b54941947 log warning if heartbeat sender ticks are delayed (#24785) 2018-03-27 19:22:21 +09:00
Jimin Hsieh
2c2b8ba001 Remove some unused import warnings (#24650) 2018-03-16 12:08:29 +01:00
Konrad `ktoso` Malawski
563c7fbcf0 Issue 24594: Integration with sbt-headers and initial header population 2018-03-13 15:45:55 +01:00
Patrik Nordwall
0ea8c0d872
Merge pull request #24592 from akka/wip-24576-LargeMessageClusterSpec-patriknw
slowdown LargeMessageClusterSpec for tcp transport, #24576
2018-03-05 16:20:46 +01:00
Patrik Nordwall
1c8a2945ab slowdown LargeMessageClusterSpec for tcp transport, #24576 2018-03-05 15:19:12 +01:00
Johan Andrén
b7cc50cdd6
2.5.10 wire protocol regression (#24625) 2018-02-28 09:46:37 +01:00
Patrik Nordwall
5e80bd97f2 Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still linger in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.
2018-02-21 11:59:18 +01:00
Patrik Nordwall
0d222906f4 Prepare Artery for alternative TCP transport, #24390
* Refactoring to separate the Aeron specific things, ArteryAeronUdpTransport
* move Aeron specific classes to akka.remote.artery.aeron package
* move Version to ArterySettings, and describe strategy for envelope header changes
2018-02-20 16:02:57 +01:00
Renato Cavalcanti
c83e4adfea Rolling update config checker, #24009
* adds config compatibility check
* doc'ed what happens when joining a cluster not supporting this feature
* added extra docs over sensitive paths
2018-02-20 15:47:09 +01:00
Patrik Nordwall
570060815b
Merge pull request #24443 from akka/issue-24144
MultiDcSplitBrain: only subscribe to unreachable after split
2018-02-02 15:38:53 +01:00
Patrik Nordwall
23fa8b0810 change spelling of behaviour to behavior, #24457 2018-02-01 15:10:46 +01:00
Christopher Batey
5658d6e77a MultiDcSplitBrain: only subscribe to unreachable after split
Test would fail picking up the reachable from the previous unsplit
as it is a new probe.

Also change barrierCounter to split/unsplit so easier to see
where the failure is on a barrier fail
2018-01-30 09:01:15 +00:00
Patrik Nordwall
5cab621e82
Merge pull request #24078 from akka/wip-24055-heartbeat-patriknw
attempt to reproduce heartbeat issue, #24055
2018-01-16 19:10:10 +01:00
Patrik Nordwall
2733a26540 Remove Exiting/Down node from other DC, #24171
* When leaving/downing the last node in a DC it would not
  be removed in another DC, since that was only done by the
  leader in the owning DC (and that is gone).
* It should be ok to eagerly remove such nodes also by
  leaders in other DCs.
* Note that gossip is already sent out so for the last node
  that will be spread to other DC, unless there is a network
  partition. For that we can't do anything. It will be replaced
  if joining again.
2018-01-16 07:55:49 +01:00
Patrik Nordwall
971049c3bb Test that large messages don't disturb cluster heartbeats, #24055 2018-01-15 10:31:15 +01:00
Christopher Batey
0380cc517a Cluster singleton manager: don't send member events to FSM during shutdown (#24236)
There exists a race where a cluster node that is being downed sees
itself as the oldest node (as it has had the other nodes removed) and
takes over the singleton manager, sending the real oldest node into
the End state, meaning that cluster singletons never work again.

This fix simply prevents Member events being given to the Cluster
Manager FSM during a shut down, instead relying on SelfExiting.

This also hardens the test by not downing the node that the current
sharding coordinator is running on as well as fixing a bug in the
probes.
2018-01-05 09:47:43 +01:00
Christopher Batey
009214ae07
Update copyright to 2018 (#24241) 2018-01-04 17:26:29 +00:00
Pritam Kadam
37f0da17b7 Allow member to leave a cluster via CoordinatedShutdown.run when MemberStatus is Joining/WeaklyUp/Up. (#24152) 2018-01-04 07:43:25 +00:00
Christopher Batey
3bd05ce67e MultiDcSplitBrainSpec: Turn on gossip logging; Increase gossip frequency (#24024)
The last time this failed there was no gossip to or from a node that
didn't see fifth coming back.

Also note that this test doesn't quite test what it says as the split
brain is repaired before starting the second actor system but without
extensions to the multi jvm test kit this can't be improved.

Refs #23306
2017-12-14 22:26:27 +01:00
Johan Andrén
be3766d0ae
Post 2.5.8 fixes (#24128)
* Update MiMa latest release
* Silence some noise from sbt breaking the release script
* MiMa excludes we had missed for a couple of releases
2017-12-08 16:53:47 +01:00