Merge pull request #22213 from akka/wip-17963-sharding-ddata-default-patriknw

Use ddata mode as the default for Cluster Sharding, #17963
This commit is contained in:
Patrik Nordwall 2017-01-25 12:33:22 +01:00 committed by GitHub
commit a94cf75e2e
7 changed files with 145 additions and 70 deletions

View file

@ -170,9 +170,8 @@ must be to begin the rebalancing. This strategy can be replaced by an applicatio
implementation.
The state of shard locations in the ``ShardCoordinator`` is persistent (durable) with
:ref:`persistence-scala` to survive failures. Since it is running in a cluster :ref:`persistence-scala`
must be configured with a distributed journal. When a crashed or unreachable coordinator
node has been removed (via down) from the cluster a new ``ShardCoordinator`` singleton
:ref:`distributed_data_scala` or :ref:`persistence-scala` to survive failures. When a crashed or
unreachable coordinator node has been removed (via down) from the cluster a new ``ShardCoordinator`` singleton
actor will take over and the state is recovered. During such a failure period shards
with known location are still available, while messages for new (unknown) shards
are buffered until the new ``ShardCoordinator`` becomes available.
@ -188,21 +187,39 @@ unused shards due to the round-trip to the coordinator. Rebalancing of shards ma
also add latency. This should be considered when designing the application specific
shard resolution, e.g. to avoid too fine grained shards.
.. _cluster_sharding_ddata_scala:
.. _cluster_sharding_mode_scala:
Distributed Data vs. Persistence Mode
-------------------------------------
The state of the coordinator and the state of :ref:`cluster_sharding_remembering_scala` of the shards
are persistent (durable) to survive failures. :ref:`distributed_data_scala` or :ref:`persistence-scala`
can be used for the storage. Distributed Data is used by default.
The functionality when using the two modes is the same. If your sharded entities are not using Akka Persistence
themselves it is more convenient to use the Distributed Data mode, since then you don't have to
setup and operate a separate data store (e.g. Cassandra) for persistence. Aside from that, there are
no major reasons for using one mode over the the other.
It's important to use the same mode on all nodes in the cluster, i.e. it's not possible to perform
a rolling upgrade to change this setting.
Distributed Data Mode
---------------------
^^^^^^^^^^^^^^^^^^^^^
Instead of using :ref:`persistence-scala` it is possible to use the :ref:`distributed_data_scala` module
as storage for the state of the sharding coordinator. In such case the state of the
``ShardCoordinator`` will be replicated inside a cluster by the :ref:`distributed_data_scala` module with
``WriteMajority``/``ReadMajority`` consistency.
This mode is enabled with configuration (enabled by default)::
This mode can be enabled by setting configuration property::
akka.cluster.sharding.state-store-mode = ddata
akka.cluster.sharding.state-store-mode = ddata
The state of the ``ShardCoordinator`` will be replicated inside a cluster by the
:ref:`distributed_data_scala` module with ``WriteMajority``/``ReadMajority`` consistency.
The state of the coordinator is not durable, it's not stored to disk. When all nodes in
the cluster have been stopped the state is lost and not needed any more.
It is using its own Distributed Data ``Replicator`` per node role. In this way you can use a subset of
The state of :ref:`cluster_sharding_remembering_scala` is also durable, i.e. it is stored to
disk. The stored entities are started also after a complete cluster restart.
Cluster Sharding is using its own Distributed Data ``Replicator`` per node role. In this way you can use a subset of
all nodes for some entity types and another subset for other entity types. Each such replicator has a name
that contains the node role and therefore the role configuration must be the same on all nodes in the
cluster, i.e. you can't change the roles when performing a rolling upgrade.
@ -211,14 +228,18 @@ The settings for Distributed Data is configured in the the section
``akka.cluster.sharding.distributed-data``. It's not possible to have different
``distributed-data`` settings for different sharding entity types.
You must explicitly add the ``akka-distributed-data`` dependency to your build if
you use this mode. It is possible to remove ``akka-persistence`` dependency from a project if it
is not used in user code.
Persistence Mode
^^^^^^^^^^^^^^^^
.. warning::
This mode is enabled with configuration::
akka.cluster.sharding.state-store-mode = persistence
Since it is running in a cluster :ref:`persistence-scala` must be configured with a distributed journal.
You must explicitly add the ``akka-persistence`` dependency to your build if
you use this mode.
The ``ddata`` mode is considered as **“experimental”** as of its introduction in Akka 2.4.0, since
it depends on the experimental Distributed Data module.
Startup after minimum number of members
---------------------------------------
@ -253,6 +274,8 @@ then supposed to stop itself. Incoming messages will be buffered by the ``Shard`
between reception of ``Passivate`` and termination of the entity. Such buffered messages
are thereafter delivered to a new incarnation of the entity.
.. _cluster_sharding_remembering_scala:
Remembering Entities
--------------------
@ -265,7 +288,7 @@ a ``Passivate`` message must be sent to the parent of the entity actor, otherwis
entity will be automatically restarted after the entity restart backoff specified in
the configuration.
When :ref:`cluster_sharding_ddata_scala` is used the identifiers of the entities are
When :ref:`Distributed Data mode <cluster_sharding_mode_scala>` is used the identifiers of the entities are
stored in :ref:`ddata_durable_scala` of Distributed Data. You may want to change the
configuration of the akka.cluster.sharding.distributed-data.durable.lmdb.dir`, since
the default directory contains the remote port of the actor system. If using a dynamically