Merge pull request #17881 from akka/wip-16800-niy-patriknw

=clu #16800 Remove NIY sections of Cluster Specification

This commit is contained in:

commit 89046f7f18

1 changed file with 4 additions and 315 deletions
@@ -5,15 +5,6 @@
 ######################
 
 .. note:: This document describes the design concepts of the clustering.
-   It is divided into two parts, where the first part describes what is
-   currently implemented and the second part describes what is planned as
-   future enhancements/additions. References to unimplemented parts have
-   been marked with the footnote :ref:`[*] <niy>`
-
-
-The Current Cluster
-*******************
-
 
 Intro
 =====
@@ -34,9 +25,8 @@ Terms
   A set of nodes joined together through the `membership`_ service.
 
 **leader**
-  A single node in the cluster that acts as the leader. Managing cluster convergence,
-  partitions :ref:`[*] <niy>`, fail-over :ref:`[*] <niy>`, rebalancing :ref:`[*] <niy>`
-  etc.
+  A single node in the cluster that acts as the leader. Managing cluster convergence
+  and membership state transitions.
 
 
 Membership
@@ -44,8 +34,8 @@ Membership
 
 A cluster is made up of a set of member nodes. The identifier for each node is a
 ``hostname:port:uid`` tuple. An Akka application can be distributed over a cluster with
-each node hosting some part of the application. Cluster membership and partitioning
-:ref:`[*] <niy>` of the application are decoupled. A node could be a member of a
+each node hosting some part of the application. Cluster membership and the actors running
+on that node of the application are decoupled. A node could be a member of a
 cluster without hosting any actors. Joining a cluster is initiated
 by issuing a ``Join`` command to one of the nodes in the cluster to join.
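
As an illustration (not part of the diff itself), the ``Join`` command mentioned
above can be issued programmatically through the public Akka Cluster API. A
minimal sketch, where the system name, host, and port are placeholders::

  import akka.actor.{ ActorSystem, Address }
  import akka.cluster.Cluster

  val system = ActorSystem("ClusterSystem")

  // address of a node that is already part of (or forming) the cluster
  val seedAddress = Address("akka.tcp", "ClusterSystem", "host1", 2552)

  // ask that node to let us join; membership then spreads via gossip
  Cluster(system).join(seedAddress)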
@@ -349,304 +339,3 @@ Failure Detection and Unreachability
   again and thereby remove the flag


Future Cluster Enhancements and Additions
*****************************************

Goal
====

In addition to membership, also provide automatic partitioning :ref:`[*] <niy>`,
handoff :ref:`[*] <niy>`, and cluster rebalancing :ref:`[*] <niy>` of actors.

Additional Terms
================

These additional terms are used in this section.

**partition** :ref:`[*] <niy>`
  An actor or subtree of actors in the Akka application that is distributed
  within the cluster.

**partition point** :ref:`[*] <niy>`
  The actor at the head of a partition. The point around which a partition is
  formed.

**partition path** :ref:`[*] <niy>`
  Also referred to as the actor address. Has the format ``actor1/actor2/actor3``.

**instance count** :ref:`[*] <niy>`
  The number of instances of a partition in the cluster. Also referred to as the
  ``N-value`` of the partition.

**instance node** :ref:`[*] <niy>`
  A node that an actor instance is assigned to.

**partition table** :ref:`[*] <niy>`
  A mapping from partition path to a set of instance nodes (where the nodes are
  referred to by their ordinal position, given the nodes in sorted order).

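As a rough sketch of how these terms fit together (hypothetical names only;
this was never part of the Akka API), the partition table could be modelled as::

  object PartitionModel {
    type PartitionPath = String  // e.g. "actor1/actor2/actor3"
    type NodeOrdinal   = Int     // position of a node in sorted ring order

    // partition path -> instance nodes; updated only by the leader
    final case class PartitionTable(
        version: Long,
        partitions: Map[PartitionPath, Vector[NodeOrdinal]]) {

      // the instance count ("N-value") of a partition
      def nValue(path: PartitionPath): Int =
        partitions.get(path).fold(0)(_.size)
    }
  }
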
Partitioning :ref:`[*] <niy>`
=============================

.. note:: Actor partitioning is not implemented yet.

Each partition (an actor or actor subtree) in the actor system is assigned to a
set of nodes in the cluster. The actor at the head of the partition is referred
to as the partition point. The mapping from partition path (actor address of the
format "a/b/c") to instance nodes is stored in the partition table and is
maintained as part of the cluster state through the gossip protocol. The
partition table is only updated by the ``leader`` node. Currently the only possible
partition points are *routed* actors.

Routed actors can have an instance count greater than one. The instance count is
also referred to as the ``N-value``. If the ``N-value`` is greater than one then
a set of instance nodes will be given in the partition table.

Note that in the first implementation there may be a restriction such that only
top-level partitions are possible (the highest possible partition points are
used and sub-partitioning is not allowed). This is still to be explored in more
detail.

The cluster ``leader`` determines the current instance count for a partition based
on two axes: fault-tolerance and scaling.

Fault-tolerance determines a minimum number of instances for a routed actor
(allowing ``N-1`` nodes to crash while still maintaining at least one running actor
instance). The user can specify a function from the current number of nodes to the
number of acceptable node failures: ``n: Int => f: Int`` where ``f < n``.
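
For example, a function tolerating the failure of roughly a third of the nodes
could look like the following sketch (the name ``acceptableFailures`` is
hypothetical)::

  // Hypothetical: acceptable node failures as a function of cluster size,
  // here tolerating about a third of the nodes, and always fewer than n.
  val acceptableFailures: Int => Int = n => math.min(n - 1, n / 3)

  // Minimum instance count so that f failures leave at least one instance.
  def minInstances(n: Int): Int = acceptableFailures(n) + 1

  // minInstances(10) == 4: three instance nodes may fail and one survives.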

Scaling reflects the number of instances needed to maintain good throughput and
is influenced by metrics from the system, particularly a history of mailbox
size, CPU load, and GC percentages. It may also be possible to accept scaling
hints from the user that indicate expected load.

The balancing of partitions can be determined in a very simple way in the first
implementation, where the overlap of partitions is minimized. Partitions are
spread over the cluster ring in a circular fashion, with each instance node placed
in the first available space. For example, given a cluster with ten nodes and three
partitions, A, B, and C, having N-values of 4, 3, and 5: partition A would have
instances on nodes 1-4; partition B would have instances on nodes 5-7; and partition
C would have instances on nodes 8-10 and 1-2. The only overlap is on nodes 1 and 2.
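
A minimal sketch of this circular placement (hypothetical helper, using 1-based
node ordinals)::

  // Hypothetical: place each partition's instances on consecutive ring
  // positions, wrapping around; nodes are numbered 1..nodeCount.
  def assign(nodeCount: Int, nValues: Seq[(String, Int)]): Map[String, Seq[Int]] = {
    var next = 0
    nValues.map { case (partition, n) =>
      val instances = (0 until n).map(i => (next + i) % nodeCount + 1)
      next += n
      partition -> instances
    }.toMap
  }

  // assign(10, Seq("A" -> 4, "B" -> 3, "C" -> 5)) gives
  //   A -> 1,2,3,4   B -> 5,6,7   C -> 8,9,10,1,2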

The distribution of partitions is not, however, limited to having instances on
adjacent nodes in the sorted ring order. Each instance can be assigned to any
node, and the more advanced load balancing algorithms will make use of this. The
partition table contains a mapping from path to instance nodes. The partitioning
for the above example would be::

  A -> { 1, 2, 3, 4 }
  B -> { 5, 6, 7 }
  C -> { 8, 9, 10, 1, 2 }

If 5 new nodes join the cluster, and in sorted order these nodes appear after the
current nodes 2, 4, 5, 7, and 8, then the partition table could be updated to
the following, with all instances remaining on the same physical nodes as before::

  A -> { 1, 2, 4, 5 }
  B -> { 7, 9, 10 }
  C -> { 12, 14, 15, 1, 2 }

When rebalancing is required the ``leader`` will schedule handoffs, gossiping a set
of pending changes, and when each change is complete the ``leader`` will update the
partition table.
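
A pending change gossiped by the ``leader`` could be represented roughly like
this (hypothetical names)::

  // Hypothetical: one scheduled handoff, gossiped as part of cluster state.
  final case class PendingHandoff(
      partitionPath: String,  // e.g. "A"
      fromNode: Int,          // current host (ordinal position on the ring)
      toNode: Int,            // new host
      complete: Boolean)      // set when the handoff has finished

  // The leader applies completed changes to the partition table and
  // removes them from the pending set.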


Additional Leader Responsibilities
----------------------------------

After moving a member from ``joining`` to ``up``, the leader can start assigning
partitions :ref:`[*] <niy>` to the new node, and when a node is ``leaving`` the
``leader`` will reassign partitions :ref:`[*] <niy>` across the cluster (it is
possible for a leaving node to itself be the ``leader``). When all partition
handoff :ref:`[*] <niy>` has completed, the node will change to the ``exiting``
state.

On convergence the leader can schedule rebalancing across the cluster,
but it may also be possible for the user to explicitly rebalance the
cluster by specifying migrations :ref:`[*] <niy>`, or to rebalance :ref:`[*] <niy>`
the cluster automatically based on metrics from member nodes. Metrics may be spread
using the gossip protocol or, possibly more efficiently, using a *random chord*
method, where the ``leader`` contacts several random nodes around the cluster ring
and each contacted node gathers information from its immediate neighbours, giving
a random sampling of load information.


Handoff
-------

Handoff for an actor-based system is different from that for a data-based system.
The most important point is that message ordering (from a given node to a given
actor instance) may need to be maintained. If an actor is a singleton actor
(only one instance possible throughout the cluster) then the cluster may also
need to ensure that there is only one such actor active at any one time. Both of
these situations can be handled by forwarding and buffering messages during
transitions.

A *graceful handoff* (one where the previous host node is up and running during
the handoff), given a previous host node ``N1``, a new host node ``N2``, and an
actor partition ``A`` to be migrated from ``N1`` to ``N2``, has this general
structure:

1. the ``leader`` sets a pending change for ``N1`` to hand off ``A`` to ``N2``

2. ``N1`` notices the pending change and sends an initialization message to ``N2``

3. in response ``N2`` creates ``A`` and sends back a ready message

4. after receiving the ready message ``N1`` marks the change as
   complete and shuts down ``A``

5. the ``leader`` sees the migration is complete and updates the partition table

6. all nodes eventually see the new partitioning and use ``N2``

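The messages exchanged in steps 2-4 could be modelled roughly as follows
(hypothetical names; this protocol was never implemented)::

  // Hypothetical handoff protocol messages for the steps above.
  sealed trait HandoffMessage

  // step 2: N1 -> N2, asks the new host to create partition A
  final case class Initialize(partitionPath: String) extends HandoffMessage

  // step 3: N2 -> N1, A has been created and can receive messages
  final case class Ready(partitionPath: String) extends HandoffMessage

  // steps 4-5: N1 marks the pending change complete; the leader picks
  // this up from gossip and updates the partition table
  final case class ChangeComplete(partitionPath: String) extends HandoffMessage
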

Transitions
^^^^^^^^^^^

There are transition times in the handoff process where different approaches can
be used to give different guarantees.


Migration Transition
~~~~~~~~~~~~~~~~~~~~

The first transition starts when ``N1`` initiates the moving of ``A`` and ends
when ``N1`` receives the ready message, and is referred to as the *migration
transition*.

The first question is: during the migration transition, should

- ``N1`` continue to process messages for ``A``?

- or is it important that no messages for ``A`` are processed on
  ``N1`` once migration begins?

If it is okay for the previous host node ``N1`` to process messages during
migration then there is nothing that needs to be done at this point.

If no messages are to be processed on the previous host node during migration
then there are two possibilities: the messages are forwarded to the new host and
buffered until the actor is ready, or the messages are simply dropped by
terminating the actor and allowing the normal dead letter process to be used. The
forwarding alternative is sketched below.
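
A minimal sketch of the forwarding alternative, with a plain actor standing in
for ``N1``'s side of the migration (hypothetical; only the plain Akka actor API
is assumed)::

  import akka.actor.{ Actor, ActorRef }

  // Hypothetical: once migration begins, N1 stops processing messages
  // for A locally and forwards them to the new host, which buffers
  // them until A is ready there.
  class MigratingPartition(newHost: ActorRef) extends Actor {
    def receive = {
      // forward keeps the original sender, so replies still work
      case msg => newHost forward msg
    }
  }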


Update Transition
~~~~~~~~~~~~~~~~~

The second transition begins when the migration is marked as complete and ends
when all nodes have the updated partition table (when all nodes will use ``N2``
as the host for ``A``, i.e. we have convergence) and is referred to as the
*update transition*.

Once the update transition begins ``N1`` can forward any messages it receives
for ``A`` to the new host ``N2``. The question is whether or not message
ordering needs to be preserved. If messages sent to the previous host node
``N1`` are being forwarded, then it is possible that a message sent to ``N1``
could be forwarded after a direct message to the new host ``N2``, breaking
message ordering from a client to actor ``A``.

In this situation ``N2`` can keep a buffer of messages per sending node. Each
buffer is flushed and removed when an acknowledgement (``ack``) message has been
received from that node. When each node in the cluster sees the partition update
it first sends an ``ack`` message to the previous host node ``N1`` before
beginning to use ``N2`` as the new host for ``A``. Any messages sent from that
node directly to ``N2`` will be buffered. ``N1`` can count down the number of
acks to determine when no more forwarding is needed.

The ``ack`` message from any node will always follow any other messages it has
sent to ``N1``. When ``N1`` receives the ``ack`` message it also forwards it to
``N2``, and again this ``ack`` message will follow any other messages already
forwarded for ``A``. When ``N2`` receives an ``ack`` message, the buffer for the
sending node can be flushed and removed, and any subsequent messages from this
sending node can be queued normally. Once all nodes in the cluster have
acknowledged the partition change and ``N2`` has cleared all buffers, the handoff
is complete and message ordering has been preserved. In practice the buffers
should remain small, as it is only those messages sent directly to ``N2`` before
the acknowledgement has been forwarded that will be buffered.
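
The per-sender buffering on ``N2`` could be sketched like this (hypothetical
class, not an Akka API)::

  import akka.actor.Address

  // Hypothetical: N2's per-sending-node buffers during the update transition.
  final class SenderBuffers {
    private var buffers = Map.empty[Address, Vector[Any]]

    // buffer a message from `node` until that node's ack arrives via N1
    def buffer(node: Address, msg: Any): Unit =
      buffers = buffers.updated(node, buffers.getOrElse(node, Vector.empty) :+ msg)

    // ack received (forwarded by N1): flush and remove the node's buffer;
    // the returned messages are delivered to A in their original order
    def ack(node: Address): Vector[Any] = {
      val pending = buffers.getOrElse(node, Vector.empty)
      buffers -= node
      pending
    }

    // handoff is complete once every buffer has been flushed
    def allFlushed: Boolean = buffers.isEmpty
  }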


Graceful Handoff
^^^^^^^^^^^^^^^^

A more complete process for graceful handoff would be (the acknowledgement
counting of step 6 is sketched after this list):

1. the ``leader`` sets a pending change for ``N1`` to hand off ``A`` to ``N2``

2. ``N1`` notices the pending change and sends an initialization message to
   ``N2``. Options:

   a. keep ``A`` on ``N1`` active and continue processing messages as normal

   b. ``N1`` forwards all messages for ``A`` to ``N2``

   c. ``N1`` drops all messages for ``A`` (terminate ``A`` with messages
      becoming dead letters)

3. in response ``N2`` creates ``A`` and sends back a ready message. Options:

   a. ``N2`` simply processes messages for ``A`` as normal

   b. ``N2`` creates a buffer per sending node for ``A``. Each buffer is
      opened (flushed and removed) when an acknowledgement for the sending
      node has been received (via ``N1``)

4. after receiving the ready message ``N1`` marks the change as complete. Options:

   a. ``N1`` forwards all messages for ``A`` to ``N2`` during the update transition

   b. ``N1`` drops all messages for ``A`` (terminate ``A`` with messages
      becoming dead letters)

5. the ``leader`` sees the migration is complete and updates the partition table

6. all nodes eventually see the new partitioning and use ``N2``

   i. each node sends an acknowledgement message to ``N1``

   ii. when ``N1`` receives the acknowledgement it can count down the pending
       acknowledgements and remove forwarding when complete

   iii. when ``N2`` receives the acknowledgement it can open the buffer for the
        sending node (if buffers are used)

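A small sketch of the acknowledgement countdown in step 6.ii (hypothetical)::

  // Hypothetical: N1 counts down acks, one per node in the cluster,
  // and stops forwarding to N2 once every node has acknowledged.
  final class AckCountdown(nodesInCluster: Int) {
    private var remaining = nodesInCluster
    def ackReceived(): Unit = if (remaining > 0) remaining -= 1
    def forwardingNeeded: Boolean = remaining > 0
  }
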
The default approach is to take options 2a, 3a, and 4a - allowing ``A`` on
``N1`` to continue processing messages during migration and then forwarding any
messages during the update transition. This assumes stateless actors that do not
have a dependency on message ordering from any given source.

- If an actor has persistent (durable) state then nothing needs to be done,
  other than migrating the actor.

- If message ordering needs to be maintained during the update transition then
  option 3b can be used, creating buffers per sending node.

- If the actors are robust to message send failures then the message-dropping
  approach can be used (with no forwarding or buffering needed).

- If an actor is a singleton (only one instance possible throughout the cluster)
  and state is transferred during the migration initialization, then options 2b
  and 3b would be required.


Stateful Actor Replication :ref:`[*] <niy>`
===========================================

.. note:: Stateful actor replication is not implemented yet.

.. _niy:

[*] Not Implemented Yet
=======================

* Actor partitioning
* Actor handoff
* Actor rebalancing
* Stateful actor replication