* The problem: an ACK targeted at an old incarnation
was sent to the new, restarted system with the same host:port,
resulting in the error
"Error encountered while processing system message acknowledgement buffer: [-1 {}] ack: ACK[0, {}]"
when restarting the actor system
* The reason:
1. The endpoint reader was about to send an OutgoingAck to the parent reader,
targeted to the old system.
2. At the same time an incoming connection from the new system
triggered TakeOver in the endpoint writer, i.e. the handle was
replaced with the connection to the new system.
3. The OutgoingAck is received by the writer, which happily sends it
to the new handle, i.e. the new system.
* The solution: Ignore OutgoingAck during the handoff (TakeOver) process.
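The fix described above can be illustrated with a minimal sketch. It is a simplified model, not Akka's EndpointWriter: the class `HandoffSketch`, its fields, and the explicit `handoffDone` step are hypothetical names chosen for illustration, and string handles stand in for real connections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a writer that owns a connection handle.
final class HandoffSketch {
    private String handle;              // current connection handle
    private boolean handoffInProgress;  // true from TakeOver until the handoff completes
    final List<String> sent = new ArrayList<>();

    HandoffSketch(String initialHandle) {
        this.handle = initialHandle;
    }

    // A new inbound connection takes over: swap the handle and enter handoff.
    void takeOver(String newHandle) {
        handoffInProgress = true;
        handle = newHandle;
    }

    // Assumed signal that the handoff has finished.
    void handoffDone() {
        handoffInProgress = false;
    }

    // Returns true if the ack was written, false if it was ignored.
    boolean outgoingAck(String ack) {
        if (handoffInProgress) {
            // The ack may have been produced for the previous incarnation,
            // so it must not reach the new handle.
            return false;
        }
        sent.add(handle + ":" + ack);
        return true;
    }

    public static void main(String[] args) {
        HandoffSketch writer = new HandoffSketch("old-system");
        writer.outgoingAck("ack-1");
        writer.takeOver("new-system");
        writer.outgoingAck("ack-2"); // ignored during handoff
        writer.handoffDone();
        writer.outgoingAck("ack-3");
        System.out.println(writer.sent);
    }
}
```

Dropping the ack is safe in this model only because acknowledgements are assumed to be regenerated later by the normal resend path.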
Publish appropriate events to the current ActorSystem event stream upon remote ActorSystem shutdown or when current ActorSystem is quarantined by the remote ActorSystem.
* DeathPactException could occur if the ReliableDeliverySupervisor
was gated but had not yet received Terminated, then got an Ungate message
from the EndpointManager and thereby entered the idle state, and only then
received the Terminated message, which is not handled in the idle state
When watching many (5000) actors at the same time the
following problems were found:
* the first send of a sys msg is done without any flow control
=> limit the number of outstanding sys msgs by using
the buffer to send them later (ordinary resend)
* when a msg cannot be written the sys msg is dropped (relying on resend),
but that causes message re-ordering and negative acknowledgment,
which is very costly
=> buffer the sys msg on write failure
=> minor optimization of AckedReceiveBuffer
I also made the resend-limit configurable.
(cherry picked from commit ecfc271e9a9d7efcf76945632d89c78740291cc6)
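The two flow-control changes above (limiting outstanding sys msgs and buffering on write failure) can be sketched roughly as follows. This is an illustrative model only: `SysMsgFlowControlSketch`, `maxOutstanding`, and the `writable` flag are hypothetical and do not correspond to Akka settings or classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model: at most maxOutstanding unacknowledged messages are
// written; everything else waits in a FIFO buffer so that order is preserved.
final class SysMsgFlowControlSketch {
    private final int maxOutstanding;
    private final Deque<String> pending = new ArrayDeque<>();
    private int outstanding = 0;
    private boolean writable = true;   // stands in for "the transport accepts writes"
    final List<String> written = new ArrayList<>();

    SysMsgFlowControlSketch(int maxOutstanding) {
        this.maxOutstanding = maxOutstanding;
    }

    // Every message goes through the buffer, including the first send.
    void send(String msg) {
        pending.addLast(msg);
        flush();
    }

    // An acknowledgement frees one slot and lets buffered messages through.
    void ack() {
        if (outstanding > 0) outstanding--;
        flush();
    }

    void setWritable(boolean writable) {
        this.writable = writable;
        flush();
    }

    int buffered() {
        return pending.size();
    }

    // Write buffered messages in order while the transport is writable
    // and the outstanding limit has not been reached.
    private void flush() {
        while (writable && outstanding < maxOutstanding && !pending.isEmpty()) {
            written.add(pending.removeFirst());
            outstanding++;
        }
    }

    public static void main(String[] args) {
        SysMsgFlowControlSketch fc = new SysMsgFlowControlSketch(2);
        fc.send("watch-1");
        fc.send("watch-2");
        fc.send("watch-3");
        System.out.println(fc.written + " buffered=" + fc.buffered());
    }
}
```

Because a message that cannot be written stays at the head of the buffer instead of being dropped, later messages cannot overtake it, which is what avoids the re-ordering and the negative acknowledgment in this model.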
* Replace stash with internal buffer, j.u.LinkedList
* Replace FSM with become
* Adaptive backoff: it is important to back off, but not for too long;
the right amount depends on environment and use case
* Prioritize heartbeat messages from remote watcher and cluster
failure detector
* Use payload messages as heartbeats for transport failure detector,
change transport failure detector to be based on absolute timeout,
see ticket #13989 and #13742
* Log remote disassociate from transport failure detector,
see ticket #13985
* Add benchmark sample in akka-sample-remote-scala
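As an illustration of the adaptive backoff item in the list above, here is one possible shape of "back off, but not for too long". The doubling policy, the cap, and the names `AdaptiveBackoffSketch`, `onWriteFailed`, and `onWriteSucceeded` are assumptions made for this sketch; the commit does not state the actual policy.

```java
// Hypothetical backoff policy: the delay doubles on each consecutive write
// failure, is capped at maxNanos, and resets to minNanos after a success.
final class AdaptiveBackoffSketch {
    private final long minNanos;
    private final long maxNanos;
    private long current;

    AdaptiveBackoffSketch(long minNanos, long maxNanos) {
        if (minNanos <= 0 || maxNanos < minNanos) {
            throw new IllegalArgumentException("require 0 < minNanos <= maxNanos");
        }
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
        this.current = minNanos;
    }

    // Returns the delay to wait before the next write attempt.
    long onWriteFailed() {
        long delay = current;
        current = Math.min(maxNanos, current * 2);
        return delay;
    }

    // A successful write means the congestion is over; start small again.
    void onWriteSucceeded() {
        current = minNanos;
    }

    public static void main(String[] args) {
        AdaptiveBackoffSketch backoff = new AdaptiveBackoffSketch(1_000, 4_000);
        for (int i = 0; i < 4; i++) {
            System.out.println("delay nanos: " + backoff.onWriteFailed());
        }
    }
}
```

The cap is the part that reflects "not for too long": a suitable value depends on the environment and use case, so in practice it would be a tunable rather than a constant.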
* actor name [endpointWriter] is not unique
* The problem was that the test used 100ms until ungate and
that made it possible for the endpointWriter to not be
completely terminated (and removed) before Ungate and
new Send in idle state, which created new endpointWriter
* I could reproduce it with a sleep in EndpointWriter.postStop
* The solution is to start the scheduled Ungate after Terminated
has been received
* because it is not referentially transparent; normally we reserve parens for
side-effecting code, but given how people thoughtlessly close over it we revised
that decision for sender
* caller can still omit parens
- removed retry-window and related settings
- removed gate-invalid-addresses-for
- gate is now mandatory
- remoting has a dedicated dispatcher by default
- updated tests to work with changed timings
- added doc section for association lifecycle
- Added refuseUid support in Akka protocol and EndpointManager
- The AkkaProtocolTransport interface is now a first-class citizen in remoting and endpoint actors
- The AkkaProtocolTransport interface is now a first-class citizen in endpoint actors
* When actor system was restarted quickly the new system replied to
heartbeats and Terminated was never triggered for actors in old
system.
* Solved by sending an extra Watch system message when the first heartbeat
is received for an address and when a change of system uid is detected.
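The detection rule in this fix can be sketched as a small lookup table. This is a hedged illustration, not the remote watcher's code: `UidChangeSketch` and `onHeartbeat` are hypothetical names, and addresses are modelled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: remember the last system uid seen per address and
// report when an extra Watch system message should be sent.
final class UidChangeSketch {
    private final Map<String, Long> knownUids = new HashMap<>();

    // Returns true on the first heartbeat from an address, or when the uid
    // differs from the previous one (the remote system was restarted).
    boolean onHeartbeat(String address, long uid) {
        Long previous = knownUids.put(address, uid);
        return previous == null || previous != uid;
    }

    public static void main(String[] args) {
        UidChangeSketch watcher = new UidChangeSketch();
        String address = "akka.tcp://sys@host:2552"; // example address only
        System.out.println(watcher.onHeartbeat(address, 1L));
        System.out.println(watcher.onHeartbeat(address, 1L));
        System.out.println(watcher.onHeartbeat(address, 2L));
    }
}
```

Keying on the uid rather than on host:port is what distinguishes a quickly restarted system from the old incarnation that still owns the watched actors.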
- Also introduces reason in the Disassociate message
- Reliable delivery now transitions from idle to active if there are pending system msgs
- Minor fix in merging receive buffers (reduces resends)
- Tweaked WireFormat
- Removed busy-wait in startup
- throwing the proper exception type in EndpointReader
- InvalidAssociationException extends NoStackTrace
* Suppress TimeoutReason logging
* Add logTermination in FSM
* Improve some error messages, including making them unique
* Cookie only logged if debug enabled