187 commits

Author SHA1 Message Date
Arnout Engelen
a01eea8f25 Add some logging to track down #20180 2017-07-24 23:53:33 +09:00
Patrik Nordwall
5ad217b338 debug logging in RemoteRestartedQuarantinedSpec, #17314 (#23237) 2017-06-26 23:51:49 -07:00
Patrik Nordwall
8fcb8efe81 marking as pending, #23198 2017-06-19 13:48:52 +02:00
Patrik Nordwall
4b38b056cc adjust number of messages in FanInThrougputSpec and FanOutThrougputSpec
* some of the tests took too long in the nightly runs
2017-06-19 13:20:24 +02:00
Patrik Nordwall
5394e6c2f4 Merge pull request #23090 from akka/wip-more-remote-bench-patriknw
Add fan-in and fan-out benchmarks for remoting
2017-06-16 09:55:09 +02:00
Patrik Nordwall
14617d3ed5 Merge pull request #23129 from akka/wip-23010-ResendUnfulfillableException-patriknw
Fix ResendUnfulfillableException after transport failure detection, #23010
2017-06-11 09:15:20 +02:00
Patrik Nordwall
32f0936d17 Fix ResendUnfulfillableException after transport failure detection, #23010
Reproducer (TransportFailSpec):

* watch from first to second node, i.e. sys msg with seq number 1
* trigger transport failure detection to tear down the connection
* the bug was that on the second node the ReliableDeliverySupervisor
  was stopped because the send buffer had not been used on that side,
  but that removed the receive buffer entry
* later, after gating elapsed, another watch from first to second node,
  i.e. sys msg with seq number 2
* when that watch msg was received on the second node the receive buffer
  had been cleared and therefore it thought that seq number 1 was missing,
  and therefore sent nack to the first node
* when the first node received the nack it threw
  IllegalStateException: Error encountered while processing system message
    acknowledgement buffer: [2 {2}] ack: ACK[2, {1, 0}]
  caused by: ResendUnfulfillableException: Unable to fulfill resend request since
    negatively acknowledged payload is no longer in buffer

This was fixed by not stopping the ReliableDeliverySupervisor so that the
receive buffer was preserved.

Not necessary for fixing the issue, but the following config settings were adjusted:
* increased transport-failure-detector timeout to avoid tearing down the
  connection too early
* reduced the quarantine-after-silence to clean up ReliableDeliverySupervisor
  actors earlier
2017-06-09 14:19:14 +02:00
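The failure sequence in the reproducer above can be modelled with a much simplified sketch. The classes below are hypothetical stand-ins, not Akka's ReliableDeliverySupervisor or its buffers. They assume cumulative acks and a receiver that tracks only the last delivered sequence number.

```java
import java.util.TreeMap;

public class ResendSketch {

    /** Hypothetical sender-side buffer of unacknowledged system messages. */
    static final class SendBuffer {
        private final TreeMap<Long, String> unacked = new TreeMap<>();

        void buffer(long seq, String payload) {
            unacked.put(seq, payload);
        }

        /** Cumulative ack: drop everything up to and including seq. */
        void ack(long seq) {
            unacked.headMap(seq, true).clear();
        }

        /** Resend a negatively acknowledged payload, or fail if it is gone. */
        String resend(long seq) {
            String payload = unacked.get(seq);
            if (payload == null) {
                throw new IllegalStateException(
                    "Unable to fulfill resend request for seq " + seq);
            }
            return payload;
        }
    }

    /** Hypothetical receiver-side state: only the last delivered seq number. */
    static final class ReceiveBuffer {
        private long lastDelivered = 0;

        /** Returns -1 if seq was delivered in order, else the seq to nack. */
        long receive(long seq) {
            if (seq == lastDelivered + 1) {
                lastDelivered = seq;
                return -1;
            }
            return lastDelivered + 1;
        }
    }

    /** Replays the scenario; returns true if a resend could not be fulfilled. */
    static boolean replay(boolean receiverStateLost) {
        SendBuffer sender = new SendBuffer();
        ReceiveBuffer receiver = new ReceiveBuffer();

        // First watch: seq 1 is delivered and acknowledged.
        sender.buffer(1, "watch-1");
        receiver.receive(1);
        sender.ack(1);

        if (receiverStateLost) {
            // Models the bug: the receive buffer entry was removed.
            receiver = new ReceiveBuffer();
        }

        // Second watch after gating elapsed: seq 2.
        sender.buffer(2, "watch-2");
        long nacked = receiver.receive(2);
        if (nacked == -1) {
            sender.ack(2);
            return false;
        }
        try {
            sender.resend(nacked);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

With the receiver state preserved, `replay(false)` delivers seq 2 in order. With the state lost, `replay(true)` nacks seq 1, which the sender has already dropped.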
Patrik Nordwall
6652a368e3 Merge pull request #23069 from akka/remoteReDeploymentFailure
Fix RemoteReDeploymentSpec instability (#20180)
2017-06-09 13:20:48 +02:00
Patrik Nordwall
db9ba53f6c Correction of HDR percentiles logging
* previously it would typically report too low a value for the higher
  percentiles, such as the value at the 98.5 percentile for the 99 percentile
2017-06-08 07:28:22 +02:00
Patrik Nordwall
6cfd6d6bad Add fan-in and fan-out benchmarks for remoting
* reusing same actors and reporting as MaxThroughputSpec
2017-06-04 11:16:12 +02:00
Arnout Engelen
adb6d4c601 Fix RemoteReDeploymentSpec instability (#20180)
When failing we observed a second "PostStop" message.

This was a "PostStop" for the new, restarted actor, likely due to the newly
restarted ActorSystem being terminated at the end of the test.
2017-05-30 17:40:20 +02:00
Patrik Nordwall
b72ce56f2f fix more timestamp formatting, #22774
* regression introduced by #22716
  (never released)
2017-04-25 07:43:05 +02:00
Guido Medina
64a3a9c028 Refactor SimpleDateFormat to the new DateTimeFormatter of JDK 8 which is thread safe. 2017-04-24 14:22:31 +09:00
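As a minimal illustration of why this refactoring helps: a DateTimeFormatter is immutable and thread safe, so a single shared instance can be used from any thread, whereas a SimpleDateFormat must not be shared without synchronization. The pattern, the UTC zone, and the class name below are assumptions for this sketch, not taken from the commit.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampSketch {

    // One shared, immutable formatter. The zone is needed because an
    // Instant has no calendar fields of its own.
    private static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
            .withZone(ZoneOffset.UTC);

    /** Formats epoch milliseconds as a UTC timestamp string. */
    static String format(long epochMillis) {
        return FORMATTER.format(Instant.ofEpochMilli(epochMillis));
    }
}
```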
Patrik Nordwall
ee326960c2 fix compilation error in multi-jvm tests
* it was not detected by PR validation because of a bug
  in sbt 0.13.13, <<= operator with triggeredBy
* updated to sbt 0.13.15
2017-04-20 10:45:36 +02:00
Patrik Nordwall
de27c18469 Replace FileInputStream and FileOutputStream, #22733
(because they use finalize, which is not GC friendly)
2017-04-19 12:05:19 -05:00
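One common replacement, shown here as a sketch rather than as the code from this commit, is to obtain streams from java.nio.file.Files. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileStreamSketch {

    /** Writes text with Files.newOutputStream instead of FileOutputStream. */
    static void write(Path path, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads text with Files.newInputStream instead of FileInputStream. */
    static String read(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                bytes.write(chunk, 0, n);
            }
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Writes text to a temporary file and reads it back. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("stream-sketch", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```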
astonchev
1b81b1991a Return large buffers to the bufferPool #22723 (#22729)
In the large outbound flow EnvelopeBuffers acquired by Encoder must be
returned to the same buffer pool by the AeronSink. Otherwise one of
the following may happen:
* Full GC (System.gc())
* java.lang.OutOfMemoryError: Direct buffer memory
* kernel killing the process (OOM-killer)
see issue #22723
2017-04-19 08:55:16 +02:00
Johan Andrén
7a0e5b31f8 Avoid Array.ofDim where possible #22516 2017-03-13 17:49:45 +01:00
Patrik Nordwall
35f951d0e4 Move DnsSpec to multi-jvm test, #22330
* run in separate jvm to avoid issues with parallel test execution
  when modifying global System.properties

(cherry picked from commit 1586afe79be568e3815d62d2fb0179a8a017d568)
2017-02-23 16:09:14 +01:00
Martynas Mickevičius
958de6a916 Remove samples (#22288)
Add code that was used for documentation to the appropriate projects
or akka-docs.
2017-02-14 12:10:23 +01:00
Patrik Nordwall
40894a7945 adjust time assertion in TestConductorSpec
* assertion failed: block took 583.856 milliseconds, should at least have been 600 milliseconds
2017-01-25 17:21:26 +01:00
Patrik Nordwall
84ade6fdc3 add CoordinatedShutdown, #21537
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
* send ExitingConfirmed to leader before shutdown of Exiting
  to not have to wait for failure detector to mark it as
  unreachable before removing
* the unreachable signal is still kept as a safeguard if
  message is lost or leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate ActorSystem when cluster is shut down (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
* problem when the leader order was sys2, sys1, sys3,
  then sys3 could not perform its duties and move Leaving sys1 to
  Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from convergence condition
2017-01-16 09:01:57 +01:00
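The idea of running phases in an order derived from their depends-on lists can be sketched as a topological sort over a DAG. This is not the CoordinatedShutdown implementation. The class, the method names, and the phase names used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhaseOrderSketch {

    /** Orders phases so that each one comes after all phases it depends on. */
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String phase : dependsOn.keySet()) {
            visit(phase, dependsOn, done, inProgress, result);
        }
        return result;
    }

    // Depth-first visit: emit dependencies first, then the phase itself.
    private static void visit(String phase,
                              Map<String, List<String>> dependsOn,
                              Set<String> done,
                              Set<String> inProgress,
                              List<String> result) {
        if (done.contains(phase)) {
            return;
        }
        if (!inProgress.add(phase)) {
            throw new IllegalArgumentException("Cycle detected at phase " + phase);
        }
        List<String> deps = dependsOn.getOrDefault(phase, Collections.<String>emptyList());
        for (String dep : deps) {
            visit(dep, dependsOn, done, inProgress, result);
        }
        inProgress.remove(phase);
        done.add(phase);
        result.add(phase);
    }

    /** Example configuration with made-up phase names, declared out of order. */
    static Map<String, List<String>> examplePhases() {
        Map<String, List<String>> phases = new LinkedHashMap<>();
        phases.put("terminate", Arrays.asList("leave-cluster"));
        phases.put("leave-cluster", Arrays.asList("stop-accepting"));
        phases.put("stop-accepting", Collections.<String>emptyList());
        return phases;
    }
}
```

Calling `order(examplePhases())` yields stop-accepting, leave-cluster, terminate, regardless of the declaration order. A cyclic depends-on list is rejected with an exception.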
Konrad `ktoso` Malawski
dcd8cea32e #21475 moving compressions ownership to Decoder (#22047)
* WIP early preview of moving compressions ownership to Decoder

* Compression table created in transport, but owned by Decoder
Added test for restart of inbound stream

* =art snapshot not needed in HeavyHitters since owned by Decoder
2017-01-13 10:33:55 +01:00
Philippus Baalman
6c7085252a extended copyright into 2017 2017-01-04 17:37:15 +01:00
Johannes Rudolph
af377790b0 =rem #21365 less aggressive busy spinning in AeronSource
Benchmarks revealed that busy spinning directly in the graph stage can
lead to an excessive increase in latency when multiple inbound lanes are
active (i.e. the inbound flow has an asynchronous boundary driving the
multiple lanes).

The new strategy is therefore:

For inbound-lanes > 1 or idle-cpu-level < 5: no spinning in the graph stage
For inbound-lanes = 1 and idle-cpu-level >= 6: 50 * settings.Advanced.IdleCpuLevel - 240

which means in general much less or no spinning at all.

Fixes #21365.
2017-01-02 16:27:52 +01:00
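The strategy stated in the commit message above can be written out as a small function. The message does not say what happens at idle-cpu-level 5 with a single lane, so this sketch assumes the formula applies there as well. The class and method names are hypothetical, and the unit of the returned value is not given in the message.

```java
public class SpinningSketch {

    /**
     * Spin value for the graph stage, following the rule in the commit
     * message. Assumption: level 5 with one lane also uses the formula.
     */
    static int spinning(int inboundLanes, int idleCpuLevel) {
        if (inboundLanes > 1 || idleCpuLevel < 5) {
            return 0; // no spinning in the graph stage
        }
        return 50 * idleCpuLevel - 240;
    }
}
```

Under this assumption a single lane gives 10 at level 5, 60 at level 6, and 260 at level 10, while any configuration with more than one inbound lane gives 0.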
Johannes Rudolph
e66cb028b0 =rem #21365 enable multiple lanes in MaxThroughputSpec
This needed the other change for each sender to send to all of the target
actors. Otherwise, large batches of messages to the same target actor would
limit the potential of actually doing work in parallel with multiple lanes due
to head-of-line blocking.
2016-12-30 12:52:42 +01:00
Johannes Rudolph
35feef8d01 =rem log results of MaxThroughputSpec and LatencySpec to result file 2016-12-30 12:50:24 +01:00
Johannes Rudolph
2f5f93daa2 =rem #21365 use default directory for shared media driver to /dev/shm
It was reported that shared media driver performance can depend on the
kind of file-system where the files are contained. /dev/shm is an in-memory
filesystem that was reported to work well with the shared aeron media driver.
2016-12-30 12:32:12 +01:00
Johan Andrén
2679be5ae4 Disable serialization warnings in akka test suites #21882 2016-11-23 12:02:36 +01:00
Johan Andrén
8ae0c9a888 Use long uid in artery remoting and cluster #20644 2016-09-26 15:34:59 +02:00
Endre Sándor Varga
9f7389448a Fix AFR file deletion on Windows 2016-09-20 12:38:58 +02:00
Johan Andrén
a939e30b49 Fix artery test file leak #21484
* Include actor system name in artery dir path to ease debugging leaks
* Base class name changed to make actor system autonaming work
* Add shutdown hook directly in transport start
* Wait for completion in shutdown hook (actual leak fix)
2016-09-19 13:22:54 +02:00
Patrik Nordwall
76c23a7880 fix many bugs in InboundCompressions, #21464
* comprehensive integration test that revealed many bugs
* confirmations of manifests were wrong, at two places
* using wrong tables when system is restarted, including
  originUid in the tables with checks when receiving advertisements
* close (stop scheduling) of advertisements when new incarnation,
  quarantine, or restart
* cleanup how deadLetters ref was treated, and made it more robust
* make Decoder tolerant to decompression failures, can happen in
  case of system restart before handshake completed
* give up resending advertisement after a few attempts without confirmation,
  to avoid keeping outbound association open to possibly dead system
* don't advertise new table when no inbound messages,
  to avoid keeping outbound association open to possibly dead system
* HeaderBuilder could use manifest field from previous message, added
  resetMessageFields
* No compression for ArteryMessage, e.g. handshake messages must go
  through without depending on compression tables being in sync
* improve debug logging, including originUid
2016-09-19 11:37:44 +02:00
Johan Andrén
392ca5ecce Enable flight recorder in tests #21205
* Setting to configure where the flight recorder puts its file
* Run ArteryMultiNodeSpecs with flight recorder enabled
* More cleanup in exit hook, wait for task runner to stop
* Enable flight recorder for the cluster multi node tests
* Enable flight recorder for multi node remoting tests
* Toggle always-dump flight recorder output when akka.remote.artery.always-dump-flight-recorder is set
2016-09-16 15:12:40 +02:00
Patrik Nordwall
d8bb0ef476 Merge pull request #21406 from akka/wip-21371-prio-patriknw
No ack delivery for prio messages, #21371
2016-09-09 15:41:54 +02:00
Patrik Nordwall
7513617070 Merge pull request #21417 from drewhk/wip-20623-cleanup-aeron-files-drewhk
#20623 Make sure external (mapped) resources are properly cleaned on shutdown
2016-09-09 15:23:13 +02:00
Patrik Nordwall
1584c52190 handle longer network partitions, #21399
* system messages in flight should not trigger premature quarantine
  in case of longer network partitions, therefore we keep the control
  stream alive
* add give-up-system-message-after property that is used by both
  SystemMessageDelivery and AeronSink in the control stream
* also unwrap SystemMessageEnvelope in RemoteDeadLetterActorRef
* skip sending control messages after shutdown, can be triggered
  by scheduled compression advertisement
2016-09-09 14:35:50 +02:00
Endre Sándor Varga
0d77034adc 20623 Make sure external (mapped) resources are properly cleaned on shutdown 2016-09-09 14:29:04 +02:00
Patrik Nordwall
ae11fb3b45 Merge pull request #21413 from akka/wip-21339-enable-misc-serial-patriknw
enable misc serializers by default for Artery, #21339
2016-09-09 14:29:02 +02:00
Martynas Mickevičius
1ce7d7d7e9 #20946 Add bind address (#21404) 2016-09-09 12:46:50 +02:00
Patrik Nordwall
97e0628173 enable misc serializers by default for Artery, #21339
* placed them in a new section additional-serialization-bindings,
  which is included by default when Artery is enabled
* can also be enabled with enable-additional-serialization-bindings
  flag to simplify usage with old remoting
* added a JavaSerializable marker trait that is bound to JavaSerializer
  in testkit, this can be used in tests so that we eventually can run
  tests without the java.io.Serializable binding
2016-09-09 09:01:15 +02:00
Patrik Nordwall
3b7a7dfa59 add reason param to quarantine method 2016-09-08 18:00:37 +02:00
Patrik Nordwall
faf941b4c8 support for parallel lanes, #21207
* for parallel serialization/deserialization
* MergeHub for the outbound lanes
* BroadcastHub + filter for the inbound lanes, until we
  have a PartitionHub
* simplify materialization of test stage
* add RemoteSendConsistencyWithThreeLanesSpec
2016-09-05 12:42:33 +02:00
Martynas Mickevičius
292face28a #20587 Clean artery configuration (#21279)
* Move artery settings from remoting settings to dedicated class.
* #20587 Move hardcoded settings to configuration file.
* Copy reused settings from remote to the artery
2016-09-01 08:07:39 +02:00
Patrik Nordwall
57ca273903 adjust the hit count sampling with the rate 2016-07-07 10:29:09 +02:00
Patrik Nordwall
95a81e41f9 enable compression by default 2016-07-06 23:07:59 +02:00
Patrik Nordwall
c376ac0c53 remove burstiness in latency tests
* throttle generates bursts but for fair latency tests
  we want the messages to be spread uniformly

* not much need for exploratory testing with AeronStreamsApp
  any longer, not worth maintaining it

* make it possible to run MaxThroughputSpec with old remoting

* add metrics for the task runner, with flight recorder

* tune idle-cpu-level
2016-07-06 20:53:05 +02:00
Patrik Nordwall
d2657a5969 adaptive sampling of hit counting
* when rate exceeds 1000 msg/s adaptive sampling of the
  heavy hitters tracking is enabled by sampling every 256th message
* also fixed some bugs related to advertise in progress

* update InboundCompression state atomically

* enable compression in LatencySpec
2016-07-05 19:54:53 +02:00
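The sampling rule in the commit message above (count every message at low rates, every 256th message once the rate exceeds 1000 msg/s) can be sketched as follows. How the rate is measured and how messages are indexed are assumptions of this sketch, and the names are hypothetical.

```java
public class SamplingSketch {

    static final double RATE_THRESHOLD = 1000.0; // msg/s, from the commit message
    static final long SAMPLE_EVERY = 256;        // from the commit message

    /** Decides whether a message should be fed to heavy hitter tracking. */
    static boolean shouldCount(long messageIndex, double messagesPerSecond) {
        if (messagesPerSecond <= RATE_THRESHOLD) {
            return true; // low rate: count every message
        }
        return messageIndex % SAMPLE_EVERY == 0; // high rate: every 256th
    }

    /** Number of messages counted out of the first n at a fixed rate. */
    static long counted(long n, double messagesPerSecond) {
        long count = 0;
        for (long i = 0; i < n; i++) {
            if (shouldCount(i, messagesPerSecond)) {
                count++;
            }
        }
        return count;
    }
}
```

In this model, 1024 messages at 500 msg/s are all counted, while at 2000 msg/s only 4 of them are.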
Konrad Malawski
d1015c1dc6 Compression tables properly *used* for Outgoing Compression (#20874)
* =art now correctly compresses and 2 table mode working
* =art AGGRESSIVELY optimising hashing, not convinced about correctness yet
* fix HandshakeShouldDropCompressionTableSpec
2016-07-04 16:48:11 +02:00
Patrik Nordwall
b2089d06a7 new OutboundEnvelope
* instead of the old Send
* optional recipient, removal of dummy
* pool of OutboundEnvelope
2016-07-01 14:06:48 +02:00
Patrik Nordwall
a021eb5ff4 flush messages on shutdown, #20811
* StreamSupervisor as system actor so that it is
  stopped after ordinary actors
* when transport is shut down, send flush message to all
  outbound associations (over control stream) and wait for ack
  or timeout
2016-07-01 12:29:05 +02:00