Get rst docs building again and add some adjustments to the new cluster documentation

This commit is contained in:
Peter Vlugter 2011-11-09 10:40:22 +01:00
parent 294c71d9cb
commit d04ad32588
7 changed files with 77 additions and 47 deletions


@@ -5,6 +5,8 @@
Cluster
#########
+*This document describes the new clustering coming in Akka 2.1*
Intro
=====
@@ -41,16 +43,16 @@ These terms are used throughout the documentation.
**partition path**
  Also referred to as the actor address. Has the format `actor1/actor2/actor3`

-**base node**
-  The first node (with nodes in sorted order) that contains a particular partition.

**instance count**
  The number of instances of a partition in the cluster. Also referred to as the
  ``N-value`` of the partition.

+**instance node**
+  A node that an actor instance is assigned to.

**partition table**
-  A mapping from partition path to base node and its ``N-value`` (instance
-  count).
+  A mapping from partition path to a set of instance nodes (where the nodes are
+  referred to by their ordinal position, given the nodes in sorted order).
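
As an illustration only (not part of the specification), such a partition table
can be thought of as a plain map from partition path to node ordinals; a minimal
Scala sketch, with hypothetical names::

   // Hypothetical sketch: partition path -> instance nodes, where nodes are
   // identified by their ordinal position in the sorted node ring.
   case class PartitionTable(entries: Map[String, Vector[Int]]) {

     // The instance nodes assigned to a partition (empty if not a partition point)
     def instanceNodes(path: String): Vector[Int] =
       entries.getOrElse(path, Vector.empty)

     // The instance count, or N-value, of a partition
     def nValue(path: String): Int = instanceNodes(path).size
   }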
Membership
@@ -154,7 +156,11 @@ The role of the leader is to shift members in and out of the cluster, changing
leader actions are only triggered by receiving a new cluster state with gossip
convergence but it may also be possible for the user to explicitly rebalance the
cluster by specifying migrations, or to rebalance the cluster automatically
-based on metrics gossiped by the member nodes.
+based on metrics from member nodes. Metrics may be spread using the gossip
+protocol or possibly more efficiently using a *random chord* method, where the
+leader contacts several random nodes around the cluster ring and each contacted
+node gathers information from its immediate neighbours, giving a random
+sampling of load information.
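
A rough sketch of how such a random chord sample might be gathered; the
``Node`` and ``LoadMetrics`` types and the ``gather`` function are hypothetical,
purely for illustration::

   import scala.util.Random

   case class Node(ordinal: Int)
   case class LoadMetrics(node: Node, mailboxSize: Int, cpuLoad: Double)

   // The leader contacts a few random nodes on the ring; each contacted node
   // reports metrics for itself and its immediate neighbours, which yields a
   // random sampling of load information across the cluster.
   def randomChordSample(ring: Vector[Node], contacts: Int,
                         gather: Node => Seq[LoadMetrics]): Seq[LoadMetrics] =
     Random.shuffle(ring).take(contacts).flatMap(gather).distinct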
The leader also has the power, if so configured, to "auto-down" a node that
according to the Failure Detector is considered unreachable. This means setting
@@ -351,17 +357,16 @@ Partitioning
============
Each partition (an actor or actor subtree) in the actor system is assigned to a
-base node. The actor at the head of the partition is referred to as the
-partition point. The mapping from partition path (actor address of the format
-"a/b/c") to base node is stored in the partition table and is maintained as part
-of the cluster state through the gossip protocol. The partition table is only
-updated by the leader node. Currently the only possible partition points are
-*routed* actors.
+set of nodes in the cluster. The actor at the head of the partition is referred
+to as the partition point. The mapping from partition path (actor address of the
+format "a/b/c") to instance nodes is stored in the partition table and is
+maintained as part of the cluster state through the gossip protocol. The
+partition table is only updated by the leader node. Currently the only possible
+partition points are *routed* actors.
Routed actors can have an instance count greater than one. The instance count is
also referred to as the ``N-value``. If the ``N-value`` is greater than one then
-the first instance will be found on the base node, and the other instances on
-the next N-1 nodes, given the nodes in sorted order.
+a set of instance nodes will be given in the partition table.
Note that in the first implementation there may be a restriction such that only
top-level partitions are possible (the highest possible partition points are
@@ -380,16 +385,32 @@ is influenced by metrics from the system, particularly a history of mailbox
size, CPU load, and GC percentages. It may also be possible to accept scaling
hints from the user that indicate expected load.
-The balancing of partitions is determined in a simple way (at least for the
-first implementation) where the overlap of partitions is minimized. Partitions
-are spread over the cluster ring in a circular fashion, with each base node
-in the first available space.
+The balancing of partitions can be determined in a very simple way in the first
+implementation, where the overlap of partitions is minimized. Partitions are
+spread over the cluster ring in a circular fashion, with each instance node in
+the first available space. For example, given a cluster with ten nodes and three
+partitions, A, B, and C, having N-values of 4, 3, and 5; partition A would have
+instances on nodes 1-4; partition B would have instances on nodes 5-7; partition
+C would have instances on nodes 8-10 and 1-2. The only overlap is on nodes 1 and
+2.
-For example, given a cluster with ten nodes and three partitions having N-values
-of 4, 3, and 5; partition 1 would have base node 1 and instances on nodes
-1-4; partition 2 would have base node 5 and instances on nodes 5-7; partition 3
-would have base node 8 and instances on nodes 8-10 and 1-2. The only overlap is
-on nodes 1 and 2.
+The distribution of partitions is not limited, however, to having instances on
+adjacent nodes in the sorted ring order. Each instance can be assigned to any
+node and the more advanced load balancing algorithms will make use of this. The
+partition table contains a mapping from path to instance nodes. The partitioning
+for the above example would be::
+
+   A -> { 1, 2, 3, 4 }
+   B -> { 5, 6, 7 }
+   C -> { 8, 9, 10, 1, 2 }
+
+If 5 new nodes join the cluster and in sorted order these nodes appear after the
+current nodes 2, 4, 5, 7, and 8, then the partition table could be updated to
+the following, with all instances on the same physical nodes as before::
+
+   A -> { 1, 2, 4, 5 }
+   B -> { 7, 9, 10 }
+   C -> { 12, 14, 15, 1, 2 }
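
The circular "first available space" placement in the first example can be
sketched as follows; this is an illustrative helper, not the leader's actual
algorithm, and with ten nodes and N-values of 4, 3, and 5 it reproduces the
first partition table above::

   // Spread partitions over the cluster ring in a circular fashion, each
   // instance taking the next position on the ring, wrapping around at the
   // end. Node ordinals are 1-based, matching the examples above.
   def allocate(nodeCount: Int, nValues: Seq[(String, Int)]): Map[String, Vector[Int]] = {
     var next = 0 // next free position on the ring, 0-based
     nValues.map { case (path, n) =>
       val nodes = Vector.tabulate(n)(i => (next + i) % nodeCount + 1)
       next = (next + n) % nodeCount
       path -> nodes
     }.toMap
   }

   allocate(10, Seq("A" -> 4, "B" -> 3, "C" -> 5))
   // Map(A -> Vector(1, 2, 3, 4), B -> Vector(5, 6, 7), C -> Vector(8, 9, 10, 1, 2))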
When rebalancing is required the leader will schedule handoffs, gossiping a set
of pending changes, and when each change is complete the leader will update the
@@ -558,6 +579,7 @@ have a dependency on message ordering from any given source.
and state is transferred during the migration initialization, then options 2b
and 3b would be required.
Stateful Actor Replication
==========================
@@ -568,34 +590,50 @@ datastore as well. The stateful actor clustering should be layered on top of the
distributed datastore. See the next section for a rough outline on how the
distributed datastore could be implemented.
Implementing a Dynamo-style distributed database on top of Akka Cluster
-----------------------------------------------------------------------
The missing pieces to implement a full Dynamo-style eventually consistent data
storage on top of the Akka Cluster as described in this document are:
-- Configuration of ``READ`` and ``WRITE`` consistency levels according to the ``N/R/W`` numbers
-  defined in the Dynamo paper.
+- Configuration of ``READ`` and ``WRITE`` consistency levels according to the
+  ``N/R/W`` numbers defined in the Dynamo paper.
  - R = read replica count
  - W = write replica count
  - N = replication factor
  - Q = QUORUM = N / 2 + 1
  - W + R > N = full consistency
-- Define a versioned data message wrapper: ``Versioned[T](hash: Long, version: VectorClock, data: T)``.
+- Define a versioned data message wrapper (see the sketch after this list)::
+
+    Versioned[T](hash: Long, version: VectorClock, data: T)
- Define a single system data broker actor on each node that uses a ``Consistent
  Hashing Router`` and that has instances on all other nodes in the node ring.
- For ``WRITE``:
  1. Wrap data in a ``Versioned Message``
-  2. Send a ``Versioned Message`` with the data is sent to a number of nodes matching the ``W-value``.
+  2. Send the ``Versioned Message`` with the data to a number of nodes
+     matching the ``W-value``.
- For ``READ``:
-  1. Read in the ``Versioned Message`` with the data from as many replicas as you need for the consistency level required by the ``R-value``.
+  1. Read in the ``Versioned Message`` with the data from as many replicas as
+     you need for the consistency level required by the ``R-value``.
  2. Do comparison on the versions (using `Vector Clocks`_)
-  3. If the versions differ then do `Read Repair`_ to update the inconsistent nodes.
+  3. If the versions differ then do `Read Repair`_ to update the inconsistent
+     nodes.
  4. Return the latest versioned data.
.. _Read Repair: http://wiki.apache.org/cassandra/ReadRepair
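
Taken together, a minimal sketch of the versioned wrapper and the ``N/R/W``
bookkeeping could look like the following; the ``VectorClock`` here is a
stand-in (a real one supports partial ordering of versions), and the actual
replication transport is elided::

   // Stand-in vector clock: node name -> logical counter
   case class VectorClock(versions: Map[String, Long] = Map.empty)

   // The versioned data message wrapper from the list above
   case class Versioned[T](hash: Long, version: VectorClock, data: T)

   // N/R/W consistency levels as defined in the Dynamo paper
   case class ConsistencyConfig(n: Int, r: Int, w: Int) {
     def quorum: Int = n / 2 + 1              // Q = QUORUM = N / 2 + 1
     def fullyConsistent: Boolean = r + w > n // W + R > N = full consistency
   }

   // Example: N = 3 with quorum reads and writes gives full consistency
   val cfg = ConsistencyConfig(n = 3, r = 2, w = 2)
   assert(cfg.quorum == 2 && cfg.fullyConsistent)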


@@ -4,4 +4,5 @@ Cluster
.. toctree::
   :maxdepth: 2

   cluster
   durable-mailbox


@@ -1,13 +1,13 @@
.. _guaranteed-delivery:
-#########
+#####################
Guaranteed Delivery
-#########
+#####################

Guaranteed Delivery
-=====
+===================
Akka does *not* support guaranteed delivery.


@@ -10,4 +10,3 @@ General
   slf4j
   supervision
   guaranteed-delivery


@@ -605,7 +605,6 @@ Message Serialization
All messages that are sent to remote actors need to be serialized to a binary format to be able to travel over the wire to the remote node. This is done by letting your messages extend one of the traits in the 'akka.serialization.Serializable' object. If the messages don't implement any specific serialization trait then the runtime will try to use standard Java serialization.
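
For example, a message type that mixes in no specific serialization trait only
has to be serializable by standard Java serialization, along these lines (an
illustrative sketch, not from the Akka documentation)::

   // No Akka serialization trait is mixed in, so the runtime falls back to
   // standard Java serialization for this message.
   @SerialVersionUID(1L)
   case class GetUser(id: Long) extends java.io.Serializable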
Here is one example, but full documentation can be found in the :ref:`serialization-java`.
Protobuf
^^^^^^^^


@@ -60,12 +60,6 @@ Creating Actors
.. includecode:: code/ActorDocSpec.scala#creating-actorOf
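
The included sample is not shown in this diff; as a rough sketch, creating an
actor with the Akka 2.0-style ``ActorSystem`` API might look like this (the
names are illustrative)::

   import akka.actor.{ Actor, ActorSystem, Props }

   class MyActor extends Actor {
     def receive = {
       case message => println("received: " + message)
     }
   }

   val system = ActorSystem("MySystem")
   // actorOf returns an ActorRef, an immutable handle to the underlying actor
   val myActor = system.actorOf(Props[MyActor], "myactor")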
-Normally you would want to import the ``actorOf`` method like this:
-
-.. includecode:: code/ActorDocSpec.scala#creating-imported
-
-to avoid prefixing it with ``Actor`` every time you use it.
The call to ``actorOf`` returns an instance of ``ActorRef``. This is a handle to
the ``Actor`` instance which you can use to interact with the ``Actor``. The
``ActorRef`` is immutable and has a one-to-one relationship with the Actor it


@@ -666,7 +666,6 @@ Message Serialization
All messages that are sent to remote actors need to be serialized to a binary format to be able to travel over the wire to the remote node. This is done by letting your messages extend one of the traits in the 'akka.serialization.Serializable' object. If the messages don't implement any specific serialization trait then the runtime will try to use standard Java serialization.
Here are some examples, but full documentation can be found in the :ref:`serialization-scala`.
Scala JSON
^^^^^^^^^^