* Replace (deprecate) akka.cluster.auto-down config setting with
akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
and performs down from the leader node when they have been
unreachable for the specified duration
* Migration guide
Moves `def ack: Event` and `def wantsAck: Boolean` from the `Tcp.WriteCommand` type down to the newly introduced `Tcp.CompactWriteCommand`, which breaks existing code depending on these.
Additionally `Tcp.WriteCommand` now has a few additional methods (`+:`, `++:`, prepend) and a companion object.
- introduce around life cycle hooks for symmetry with aroundReceive
- no custom processor-specific life cycle hooks needed any more
- preStart and preRestart can be overridden with empty implementation
(interceptors ensure that super.preXxx calls are still executed)
- all around life cycle hooks can be final
- standard life cycle hooks are non-final to preserve composability with existing traits (FSM, ...)
- use overridden processor and channel ids
- no need anymore to wait for processor instances to stop
- unrelated: fix wrong artifact names in documentation
The most prominent changes compared to eventsourced are:
- No central processor and channel registry any more
- Auto-recovery of processors on start and restart (can be disabled)
- Recovery of processor networks doesn't require coordination
- Explicit channel activation not needed any more
- Message sequence numbers generated per processor (no gaps)
- Sender references are journaled along with messages
- Processors can determine their recovery status
- No custom API on extension object, only messages
- Journal created by extension from config, not by application
- Applications only interact with processors and channels via messages
- Internal design prepared for having processor-specific journal actors (for later optimization possibilities)
Further additions and changes during review:
- Allow processor implementation classes to use inherited stash
- Channel support to resolve (potentially invalid) sender references
- Logical intead of physical deletion of messages
- Pinned dispatcher for LevelDB journal
- Processor can handle failures during recovery
- Message renamed to Persistent
This prototype has the following limitations:
- Serialization of persistent messages and their payload via JavaSerializer only (will be configurable later)
- The LevelDB journal implementation based on a LevelDB Java port, not the native LevelDB (will be configurable later)
The following features will be added later using separate tickets:
- Snapshot-based recovery
- Reliable channels
- Journal plugin API
- Optimizations
- ...
* It was a regression introduced in dc9fe4f
* Two problems:
1) Gossip merge could pop back removed member (was previously
covered by the filter of unreachable)
2) Reachability merge didn't handle all cases for removed member,
i.e. when node not in allowed set
* No reason to use the System.currentTimeMillis base
* Remove the global VectorClock counter completely,
current count for a node is in the VectorClock itself
Before this change, a client connection was always instantly reported as
`Connected`, even if the endpoint would never respond at all.
The reason is a weird behavior of OP_CONNECT and SocketChannel in the
JDK (observed in Linux):
- a channel is always connectable before the connection is attempted
(`channel.connect`). Selecting for OP_CONNECT before the `connect`
call will instantly report connectable
- even worse: after OP_CONNECT was reported true, also `finishConnect`
will always return true, even if the connection wasn't yet established.
That's probably the case because `finishConnect` is internally implemented
depending on previous epoll results (on Linux).
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member was moved it to the unreachable Set,
and then removed from there by the leader. That will not
work when flipping back to reachable, so a Down member must
be detected as unreachable before beeing removed. Similar
to Exiting. Member shuts down itself if it sees itself as
Down.
* Flip back to reachable when failure detector monitors it as
available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
* Based on discussion on akka-user, Sept 6-9 2013. It seems appropriate
to mention that Agents are not remoteable, so that naive users (like me)
know to design for that.
* This change has not been locally built with Sphinx, but seems to be
small and safe.