From 7d146e6fc3c4738cb41121074bee7bf6af57dbc3 Mon Sep 17 00:00:00 2001
From: Patrik Nordwall
Date: Wed, 1 Jul 2015 14:58:14 +0200
Subject: [PATCH] =clu #16800 Remove NIY sections of Cluster Specification

* The Not Implemented Yet sections have passed their best before date
---
 akka-docs/rst/common/cluster.rst | 319 +------------------------------
 1 file changed, 4 insertions(+), 315 deletions(-)

diff --git a/akka-docs/rst/common/cluster.rst b/akka-docs/rst/common/cluster.rst
index ee7c5442a7..f317649be6 100644
--- a/akka-docs/rst/common/cluster.rst
+++ b/akka-docs/rst/common/cluster.rst
@@ -5,15 +5,6 @@
 ######################
 
 .. note:: This document describes the design concepts of the clustering.
-  It is divided into two parts, where the first part describes what is
-  currently implemented and the second part describes what is planned as
-  future enhancements/additions. References to unimplemented parts have
-  been marked with the footnote :ref:`[*] <niy>`
-
-
-The Current Cluster
-*******************
-
 
 Intro
 =====
@@ -34,9 +25,8 @@ Terms
   A set of nodes joined together through the `membership`_ service.
 
 **leader**
-  A single node in the cluster that acts as the leader. Managing cluster convergence,
-  partitions :ref:`[*] <niy>`, fail-over :ref:`[*] <niy>`, rebalancing :ref:`[*] <niy>`
-  etc.
+  A single node in the cluster that acts as the leader, managing cluster convergence
+  and membership state transitions.
 
 
 Membership
@@ -44,8 +34,8 @@ Membership
 
 A cluster is made up of a set of member nodes. The identifier for each node is a
 ``hostname:port:uid`` tuple. An Akka application can be distributed over a cluster with
-each node hosting some part of the application. Cluster membership and partitioning
-:ref:`[*] <niy>` of the application are decoupled. A node could be a member of a
+each node hosting some part of the application. Cluster membership and the actors
+running on a node are decoupled. A node could be a member of a
 cluster without hosting any actors. Joining a cluster is initiated by issuing a
 ``Join`` command to one of the nodes in the cluster to join.
 
@@ -349,304 +339,3 @@ Failure Detection and Unreachability
 
   again and thereby remove the flag
 
-
-Future Cluster Enhancements and Additions
-*****************************************
-
-
-Goal
-====
-
-In addition to membership, the cluster should also provide automatic partitioning
-:ref:`[*] <niy>`, handoff :ref:`[*] <niy>`, and cluster rebalancing :ref:`[*] <niy>`
-of actors.
-
-
-Additional Terms
-================
-
-These additional terms are used in this section.
-
-**partition** :ref:`[*] <niy>`
-  An actor or subtree of actors in the Akka application that is distributed
-  within the cluster.
-
-**partition point** :ref:`[*] <niy>`
-  The actor at the head of a partition. The point around which a partition is
-  formed.
-
-**partition path** :ref:`[*] <niy>`
-  Also referred to as the actor address. Has the format `actor1/actor2/actor3`.
-
-**instance count** :ref:`[*] <niy>`
-  The number of instances of a partition in the cluster. Also referred to as the
-  ``N-value`` of the partition.
-
-**instance node** :ref:`[*] <niy>`
-  A node that an actor instance is assigned to.
-
-**partition table** :ref:`[*] <niy>`
-  A mapping from partition path to a set of instance nodes (where the nodes are
-  referred to by their ordinal position in the sorted order of nodes).
-
-
-Partitioning :ref:`[*] <niy>`
-=============================
-
-.. note:: Actor partitioning is not implemented yet.
-
-Each partition (an actor or actor subtree) in the actor system is assigned to a
-set of nodes in the cluster. The actor at the head of the partition is referred
-to as the partition point. The mapping from partition path (actor address of the
-format "a/b/c") to instance nodes is stored in the partition table and is
-maintained as part of the cluster state through the gossip protocol. The
-partition table is only updated by the ``leader`` node. Currently the only possible
-partition points are *routed* actors.
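-
-A hedged Scala sketch of such a partition table (the type and method names are
-hypothetical, not part of any actual API)::
-
-    // Maps a partition path to the ordinals of its instance nodes.
-    // Versioned so that gossiped copies can be compared for staleness.
-    final case class PartitionTable(
-      version: Long,
-      entries: Map[String, IndexedSeq[Int]]) {
-
-      // Only the leader node produces new versions of the table.
-      def assign(path: String, nodes: IndexedSeq[Int]): PartitionTable =
-        copy(version = version + 1, entries = entries.updated(path, nodes))
-    }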
-
-Routed actors can have an instance count greater than one. The instance count is
-also referred to as the ``N-value``. If the ``N-value`` is greater than one then
-a set of instance nodes will be given in the partition table.
-
-Note that in the first implementation there may be a restriction such that only
-top-level partitions are possible (the highest possible partition points are
-used and sub-partitioning is not allowed). This is still to be explored in more
-detail.
-
-The cluster ``leader`` determines the current instance count for a partition based
-on two axes: fault-tolerance and scaling.
-
-Fault-tolerance determines a minimum number of instances for a routed actor
-(allowing N-1 nodes to crash while still maintaining at least one running actor
-instance). The user can specify a function from the current number of nodes to the
-number of acceptable node failures: ``n: Int => f: Int`` where ``f < n``.
-
-Scaling reflects the number of instances needed to maintain good throughput and
-is influenced by metrics from the system, particularly a history of mailbox
-size, CPU load, and GC percentages. It may also be possible to accept scaling
-hints from the user that indicate expected load.
-
-The balancing of partitions can be determined in a very simple way in the first
-implementation, where the overlap of partitions is minimized. Partitions are
-spread over the cluster ring in a circular fashion, with each instance node in
-the first available space. For example, given a cluster with ten nodes and three
-partitions, A, B, and C, having N-values of 4, 3, and 5: partition A would have
-instances on nodes 1-4; partition B would have instances on nodes 5-7; partition
-C would have instances on nodes 8-10 and 1-2. The only overlap is on nodes 1 and
-2.
-
-The distribution of partitions is not limited, however, to having instances on
-adjacent nodes in the sorted ring order. Each instance can be assigned to any
-node and the more advanced load balancing algorithms will make use of this. The
-partition table contains a mapping from path to instance nodes. The partitioning
-for the above example would be::
-
-  A -> { 1, 2, 3, 4 }
-  B -> { 5, 6, 7 }
-  C -> { 8, 9, 10, 1, 2 }
-
-If 5 new nodes join the cluster and in sorted order these nodes appear after the
-current nodes 2, 4, 5, 7, and 8, then the partition table could be updated to
-the following, with all instances on the same physical nodes as before::
-
-  A -> { 1, 2, 4, 5 }
-  B -> { 7, 9, 10 }
-  C -> { 12, 14, 15, 1, 2 }
-
-When rebalancing is required the ``leader`` will schedule handoffs, gossiping a set
-of pending changes, and when each change is complete the ``leader`` will update the
-partition table.
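-
-The simple circular spreading described above could be sketched as follows (a
-hedged Scala illustration; the function name and the 1-based node ordinals are
-assumptions)::
-
-    // Spread partitions over the ring, each one starting in the first free
-    // slot. nValues pairs a partition path with its desired N-value.
-    def assignRoundRobin(
-        nValues: Seq[(String, Int)],
-        nodeCount: Int): Map[String, IndexedSeq[Int]] = {
-      var next = 0
-      val entries = for ((path, n) <- nValues) yield {
-        val nodes = IndexedSeq.tabulate(n)(i => (next + i) % nodeCount + 1)
-        next = (next + n) % nodeCount
-        path -> nodes
-      }
-      entries.toMap
-    }
-
-    assignRoundRobin(Seq("A" -> 4, "B" -> 3, "C" -> 5), nodeCount = 10)
-    // Map(A -> Vector(1, 2, 3, 4), B -> Vector(5, 6, 7), C -> Vector(8, 9, 10, 1, 2))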
-
-
-Additional Leader Responsibilities
-----------------------------------
-
-After moving a member from ``joining`` to ``up``, the leader can start assigning
-partitions :ref:`[*] <niy>` to the new node, and when a node is ``leaving`` the
-``leader`` will reassign partitions :ref:`[*] <niy>` across the cluster (it is
-possible for a leaving node to itself be the ``leader``). When all partition
-handoff :ref:`[*] <niy>` has completed then the node will change to the
-``exiting`` state.
-
-On convergence the leader can schedule rebalancing across the cluster,
-but it may also be possible for the user to explicitly rebalance the
-cluster by specifying migrations :ref:`[*] <niy>`, or to rebalance :ref:`[*] <niy>`
-the cluster automatically based on metrics from member nodes. Metrics may be spread
-using the gossip protocol or possibly more efficiently using a *random chord* method,
-where the ``leader`` contacts several random nodes around the cluster ring and each
-contacted node gathers information from its immediate neighbours, giving a random
-sampling of load information.
-
-
-Handoff
--------
-
-Handoff for an actor-based system is different from handoff for a data-based
-system. The most important point is that message ordering (from a given node to
-a given actor instance) may need to be maintained. If an actor is a singleton
-actor (only one instance possible throughout the cluster) then the cluster may
-also need to assure that there is only one such actor active at any one time.
-Both of these situations can be handled by forwarding and buffering messages
-during transitions.
-
-A *graceful handoff* (one where the previous host node is up and running during
-the handoff), given a previous host node ``N1``, a new host node ``N2``, and an
-actor partition ``A`` to be migrated from ``N1`` to ``N2``, has this general
-structure (a message-level sketch follows the list):
-
- 1. the ``leader`` sets a pending change for ``N1`` to hand off ``A`` to ``N2``
-
- 2. ``N1`` notices the pending change and sends an initialization message to ``N2``
-
- 3. in response ``N2`` creates ``A`` and sends back a ready message
-
- 4. after receiving the ready message ``N1`` marks the change as
-    complete and shuts down ``A``
-
- 5. the ``leader`` sees the migration is complete and updates the partition table
-
- 6. all nodes eventually see the new partitioning and use ``N2``
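-
-A hedged sketch of the protocol messages behind these steps (hypothetical Scala
-types; only ``akka.actor.Address`` is real API)::
-
-    import akka.actor.Address
-
-    // Step 1: gossiped by the leader as a pending change.
-    final case class PendingHandoff(partition: String, from: Address, to: Address)
-    // Step 2: N1 asks N2 to instantiate the partition.
-    final case class HandoffInit(partition: String)
-    // Step 3: N2 confirms that the partition is created and ready.
-    final case class HandoffReady(partition: String)
-    // Steps 4-5: N1 marks the change complete; the leader updates the table.
-    final case class HandoffComplete(partition: String, newHost: Address)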
-
-
-Transitions
-^^^^^^^^^^^
-
-There are transition times in the handoff process where different approaches can
-be used to give different guarantees.
-
-
-Migration Transition
-~~~~~~~~~~~~~~~~~~~~
-
-The first transition starts when ``N1`` initiates the moving of ``A`` and ends
-when ``N1`` receives the ready message, and is referred to as the *migration
-transition*.
-
-The first question is: during the migration transition, should
-
-- ``N1`` continue to process messages for ``A``?
-
-- or is it important that no messages for ``A`` are processed on
-  ``N1`` once migration begins?
-
-If it is okay for the previous host node ``N1`` to process messages during
-migration then there is nothing that needs to be done at this point.
-
-If no messages are to be processed on the previous host node during migration
-then there are two possibilities: the messages are forwarded to the new host and
-buffered until the actor is ready, or the messages are simply dropped by
-terminating the actor and allowing the normal dead letter process to be used.
-
-
-Update Transition
-~~~~~~~~~~~~~~~~~
-
-The second transition begins when the migration is marked as complete and ends
-when all nodes have the updated partition table (when all nodes will use ``N2``
-as the host for ``A``, i.e. we have convergence) and is referred to as the
-*update transition*.
-
-Once the update transition begins ``N1`` can forward any messages it receives
-for ``A`` to the new host ``N2``. The question is whether or not message
-ordering needs to be preserved. If messages sent to the previous host node
-``N1`` are being forwarded, then it is possible that a message sent to ``N1``
-could be forwarded after a direct message to the new host ``N2``, breaking
-message ordering from a client to actor ``A``.
-
-In this situation ``N2`` can keep a buffer of messages per sending node. Each
-buffer is flushed and removed when an acknowledgement (``ack``) message has been
-received. When each node in the cluster sees the partition update it first sends
-an ``ack`` message to the previous host node ``N1`` before beginning to use
-``N2`` as the new host for ``A``. Any messages sent from the client node
-directly to ``N2`` will be buffered. ``N1`` can count down the number of acks to
-determine when no more forwarding is needed. The ``ack`` message from any node
-will always follow any other messages sent to ``N1``. When ``N1`` receives the
-``ack`` message it also forwards it to ``N2``, and again this ``ack`` message
-will follow any other messages already forwarded for ``A``. When ``N2`` receives
-an ``ack`` message, the buffer for the sending node can be flushed and removed.
-Any subsequent messages from this sending node can be queued normally. Once all
-nodes in the cluster have acknowledged the partition change and ``N2`` has
-cleared all buffers, the handoff is complete and message ordering has been
-preserved. In practice the buffers should remain small as it is only those
-messages sent directly to ``N2`` before the acknowledgement has been forwarded
-that will be buffered.
-
-
-Graceful Handoff
-^^^^^^^^^^^^^^^^
-
-A more complete process for graceful handoff would be:
-
- 1. the ``leader`` sets a pending change for ``N1`` to hand off ``A`` to ``N2``
-
-
- 2. ``N1`` notices the pending change and sends an initialization message to
-    ``N2``. Options:
-
-    a. keep ``A`` on ``N1`` active, continuing to process messages as normal
-
-    b. ``N1`` forwards all messages for ``A`` to ``N2``
-
-    c. ``N1`` drops all messages for ``A`` (terminate ``A`` with messages
-       becoming dead letters)
-
-
- 3. in response ``N2`` creates ``A`` and sends back a ready message. Options:
-
-    a. ``N2`` simply processes messages for ``A`` as normal
-
-    b. ``N2`` creates a buffer per sending node for ``A``. Each buffer is
-       opened (flushed and removed) when an acknowledgement for the sending
-       node has been received (via ``N1``)
-
-
- 4. after receiving the ready message ``N1`` marks the change as complete. Options:
-
-    a. ``N1`` forwards all messages for ``A`` to ``N2`` during the update transition
-
-    b. ``N1`` drops all messages for ``A`` (terminate ``A`` with messages
-       becoming dead letters)
-
-
- 5. the ``leader`` sees the migration is complete and updates the partition table
-
-
- 6. all nodes eventually see the new partitioning and use ``N2``
-
-    i. each node sends an acknowledgement message to ``N1``
-
-    ii. when ``N1`` receives the acknowledgement it can count down the pending
-        acknowledgements and remove forwarding when complete
-
-    iii. when ``N2`` receives the acknowledgement it can open the buffer for the
-         sending node (if buffers are used)
-
-
-The default approach is to take options 2a, 3a, and 4a, allowing ``A`` on
-``N1`` to continue processing messages during migration and then forwarding any
-messages during the update transition. This assumes stateless actors that do not
-have a dependency on message ordering from any given source.
-
-- If an actor has persistent (durable) state then nothing needs to be done,
-  other than migrating the actor.
-
-- If message ordering needs to be maintained during the update transition then
-  option 3b can be used, creating buffers per sending node (a sketch follows
-  this list).
-
-- If the actors are robust to message send failures then the dropping messages
-  approach can be used (with no forwarding or buffering needed).
-
-- If an actor is a singleton (only one instance possible throughout the cluster)
-  and state is transferred during the migration initialization, then options 2b
-  and 3b would be required.
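-
-A hedged Scala sketch of option 3b (the ``Ack`` and ``Forwarded`` envelope types
-are assumptions, as is using the sending actor to stand in for the sending
-node)::
-
-    import akka.actor.{ Actor, ActorRef }
-
-    final case class Forwarded(msg: Any)        // wraps messages forwarded by N1
-    final case class Ack(sendingNode: ActorRef)
-
-    // Runs on N2 in front of partition A: buffers direct messages per sender
-    // until the ack forwarded via N1 for that sender has arrived.
-    class BufferingHost(partition: ActorRef) extends Actor {
-      private var buffers = Map.empty[ActorRef, Vector[Any]]
-      private var acked = Set.empty[ActorRef]
-
-      def receive = {
-        case Forwarded(msg) =>
-          partition ! msg // forwarded traffic is already in client order
-        case Ack(node) =>
-          // The forwarded ack follows everything N1 forwarded for this node,
-          // so the buffered direct messages can now be delivered in order.
-          buffers.getOrElse(node, Vector.empty).foreach(partition ! _)
-          buffers -= node
-          acked += node
-        case msg if !acked(sender()) =>
-          // A direct message that may have overtaken forwarded traffic.
-          buffers = buffers.updated(sender(), buffers.getOrElse(sender(), Vector.empty) :+ msg)
-        case msg =>
-          partition ! msg // sender already acknowledged: deliver normally
-      }
-    }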
-
-
-Stateful Actor Replication :ref:`[*] <niy>`
-===========================================
-
-.. note:: Stateful actor replication is not implemented yet.
-
-.. _niy:
-
-[*] Not Implemented Yet
-=======================
-
-* Actor partitioning
-* Actor handoff
-* Actor rebalancing
-* Stateful actor replication