
# Classic Distributed Data

@@include[includes.md](includes.md) { #actor-api }

For the full documentation of this feature and for new projects see @ref:[Distributed Data](typed/distributed-data.md).

## Dependency

To use Akka Distributed Data, you must add the following dependency in your project:

@@dependency[sbt,Maven,Gradle] { group="com.typesafe.akka" artifact="akka-distributed-data_$scala.binary.version$" version="$akka.version$" }

## Introduction

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Introduction](typed/distributed-data.md#introduction).

## Using the Replicator

The `akka.cluster.ddata.Replicator` actor provides the API for interacting with the data. The `Replicator` actor must be started on each node in the cluster, or on the group of nodes tagged with a specific role. It communicates with other `Replicator` instances with the same path (without address) that are running on other nodes. For convenience it can be used with the `akka.cluster.ddata.DistributedData` extension, but it can also be started as an ordinary actor using `Replicator.props`. If it is started as an ordinary actor it is important that it is given the same name, started on the same path, on all nodes.
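
For illustration, here is a minimal sketch of the two ways to obtain a `Replicator` (the system and actor names are assumptions of this sketch):

```scala
import akka.actor.{ ActorRef, ActorSystem }
import akka.cluster.ddata.{ DistributedData, Replicator, ReplicatorSettings }

val system = ActorSystem("ClusterSystem")

// Convenient way: the extension starts (or returns) the system-wide replicator.
val replicator: ActorRef = DistributedData(system).replicator

// Alternative: start it as an ordinary actor. The name, and therefore the path,
// must be the same on all nodes.
val settings = ReplicatorSettings(system)
val myReplicator: ActorRef =
  system.actorOf(Replicator.props(settings), "myReplicator")
```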

Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md) will participate in Distributed Data. This means that the data will be replicated to the `WeaklyUp` nodes with the background gossip protocol. Note that `WeaklyUp` nodes do not participate in any actions where the consistency mode is to read/write from all nodes or from a majority of nodes. A `WeaklyUp` node is not counted as part of the cluster, so 3 nodes + 5 `WeaklyUp` nodes is essentially a 3-node cluster as far as consistent actions are concerned.

Below is an example of an actor that schedules tick messages to itself and for each tick adds or removes elements from an `ORSet` (observed-remove set). It also subscribes to changes of this key.

Scala
@@snip DistributedDataDocSpec.scala { #data-bot }
Java
@@snip DataBot.java { #data-bot }

### Update

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Update](typed/distributed-data.md#update).

To modify and replicate a data value you send a `Replicator.Update` message to the local `Replicator`.

The current data value for the `key` of the `Update` is passed as a parameter to the `modify` function of the `Update`. The function is supposed to return the new value of the data, which will then be replicated according to the given consistency level.

The `modify` function is called by the `Replicator` actor and must therefore be a pure function that only uses the data parameter and stable fields from the enclosing scope. It must for example not access the sender (@scala[`sender()`]@java[`getSender()`]) reference of an enclosing actor.

The `Update` is intended to only be sent from an actor running in the same local `ActorSystem` as the `Replicator`, because the `modify` function is typically not serializable.

Scala
@@snip DistributedDataDocSpec.scala { #update }
Java
@@snip DistributedDataDocTest.java { #update }
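
For illustration, a minimal sketch of sending an `Update` that increments a replicated `GCounter` with `WriteLocal` (the key name is assumed; `system` and `replicator` as in the sketch above):

```scala
import akka.cluster.ddata.{ DistributedData, GCounter, GCounterKey, SelfUniqueAddress }
import akka.cluster.ddata.Replicator.{ Update, WriteLocal }

implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress
val CounterKey = GCounterKey("my-counter")

// The modify function receives the current local value and returns the new value,
// which is then replicated according to the requested write consistency.
replicator ! Update(CounterKey, GCounter.empty, WriteLocal)(_ :+ 1)
```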

As a reply to the `Update`, a `Replicator.UpdateSuccess` is sent to the sender of the `Update` if the value was successfully replicated according to the supplied @ref:[write consistency level](typed/distributed-data.md#consistency) within the supplied timeout. Otherwise a `Replicator.UpdateFailure` subclass is sent back. Note that a `Replicator.UpdateTimeout` reply does not mean that the update completely failed or was rolled back. It may still have been replicated to some nodes, and will eventually be replicated to all nodes with the gossip protocol.

Scala
@@snip DistributedDataDocSpec.scala { #update-response1 }
Java
@@snip DistributedDataDocTest.java { #update-response1 }
Scala
@@snip DistributedDataDocSpec.scala { #update-response2 }
Java
@@snip DistributedDataDocTest.java { #update-response2 }
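
A minimal sketch of matching these replies inside the sending actor (`CounterKey` as in the sketch above):

```scala
import akka.actor.Actor
import akka.cluster.ddata.Replicator.{ ModifyFailure, UpdateSuccess, UpdateTimeout }

def receive: Actor.Receive = {
  case UpdateSuccess(CounterKey, _) =>
  // the write satisfied the requested consistency level
  case UpdateTimeout(CounterKey, _) =>
  // not confirmed in time; the update may still be partially replicated
  // and will eventually converge via gossip
  case ModifyFailure(CounterKey, _, _, _) =>
  // the modify function threw an exception before the update was applied
}
```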

You will always see your own writes. For example, if you send two `Update` messages changing the value of the same `key`, the `modify` function of the second message will see the change that was performed by the first `Update` message.

It is possible to abort the `Update` when inspecting the state parameter that is passed in to the `modify` function, by throwing an exception. That happens before the update is performed and a `Replicator.ModifyFailure` is sent back as the reply.

In the `Update` message you can pass an optional request context, which the `Replicator` does not care about but which is included in the reply messages. This is a convenient way to pass contextual information (e.g. the original sender) without having to use `ask` or maintain local correlation data structures.

Scala
@@snip DistributedDataDocSpec.scala { #update-request-context }
Java
@@snip DistributedDataDocTest.java { #update-request-context }
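
A minimal sketch of that pattern, continuing the counter example above (the acknowledgement message is an assumption of this sketch):

```scala
import akka.actor.ActorRef
import akka.cluster.ddata.GCounter
import akka.cluster.ddata.Replicator.{ Update, UpdateSuccess, WriteLocal }

// the request context travels with the Update and comes back in the reply
replicator ! Update(CounterKey, GCounter.empty, WriteLocal, request = Some(sender()))(_ :+ 1)

def receive: Receive = {
  case UpdateSuccess(CounterKey, Some(replyTo: ActorRef)) =>
    replyTo ! "incremented" // hypothetical acknowledgement message
}
```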

### Get

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Get](typed/distributed-data.md#get).

To retrieve the current value of a key you send a `Replicator.Get` message to the `Replicator`. You supply a consistency level which has the following meaning:

Scala
@@snip DistributedDataDocSpec.scala { #get }
Java
@@snip DistributedDataDocTest.java { #get }

As a reply to the `Get`, a `Replicator.GetSuccess` is sent to the sender of the `Get` if the value was successfully retrieved according to the supplied @ref:[read consistency level](typed/distributed-data.md#consistency) within the supplied timeout. Otherwise a `Replicator.GetFailure` is sent. If the key does not exist the reply will be `Replicator.NotFound`.

Scala
@@snip DistributedDataDocSpec.scala { #get-response1 }
Java
@@snip DistributedDataDocTest.java { #get-response1 }
Scala
@@snip DistributedDataDocSpec.scala { #get-response2 }
Java
@@snip DistributedDataDocTest.java { #get-response2 }
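
For illustration, a minimal sketch of reading with `ReadLocal` and handling each possible reply (`CounterKey` as above, inside an actor):

```scala
import akka.cluster.ddata.Replicator.{ Get, GetFailure, GetSuccess, NotFound, ReadLocal }

replicator ! Get(CounterKey, ReadLocal)

def receive: Receive = {
  case g @ GetSuccess(CounterKey, _) =>
    val value = g.get(CounterKey).value // current counter value as a BigInt
  case NotFound(CounterKey, _) =>
  // the key has never been written
  case GetFailure(CounterKey, _) =>
  // the read consistency level could not be satisfied within the timeout
}
```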

In the `Get` message you can pass an optional request context in the same way as for the `Update` message, described above. For example, the original sender can be passed and replied to after receiving and transforming `GetSuccess`.

Scala
@@snip DistributedDataDocSpec.scala { #get-request-context }
Java
@@snip DistributedDataDocTest.java { #get-request-context }

### Subscribe

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Subscribe](typed/distributed-data.md#subscribe).

You may also register interest in change notifications by sending a `Replicator.Subscribe` message to the `Replicator`. It will send `Replicator.Changed` messages to the registered subscriber when the data for the subscribed key is updated. Subscribers will be notified periodically with the configured `notify-subscribers-interval`, and it is also possible to send an explicit `Replicator.FlushChanges` message to the `Replicator` to notify the subscribers immediately.

The subscriber is automatically removed if it is terminated. A subscriber can also be deregistered with the `Replicator.Unsubscribe` message.

Scala
@@snip DistributedDataDocSpec.scala { #subscribe }
Java
@@snip DistributedDataDocTest.java { #subscribe }
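
A minimal sketch of subscribing and unsubscribing inside an actor (`CounterKey` as above):

```scala
import akka.cluster.ddata.Replicator.{ Changed, Subscribe, Unsubscribe }

// register this actor as subscriber; it will receive Changed messages
replicator ! Subscribe(CounterKey, self)

def receive: Receive = {
  case c @ Changed(CounterKey) =>
    val current = c.get(CounterKey).value // the latest replicated value
}

// when no longer interested:
replicator ! Unsubscribe(CounterKey, self)
```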

### Consistency

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Consistency](typed/distributed-data.md#consistency).

Here is an example of using `WriteMajority` and `ReadMajority`:

Scala
@@snip ShoppingCart.scala { #read-write-majority }
Java
@@snip ShoppingCart.java { #read-write-majority }
Scala
@@snip ShoppingCart.scala { #get-cart }
Java
@@snip ShoppingCart.java { #get-cart }
Scala
@@snip ShoppingCart.scala { #add-item }
Java
@@snip ShoppingCart.java { #add-item }

In some rare cases, when performing an `Update` it is necessary to first try to fetch the latest data from other nodes. That can be done by first sending a `Get` with `ReadMajority` and then continuing with the `Update` when the `GetSuccess`, `GetFailure` or `NotFound` reply is received. This might be needed when you need to base a decision on the latest information, or when removing entries from an `ORSet` or `ORMap`. If an entry is added to an `ORSet` or `ORMap` from one node and removed from another node, the entry will only be removed if the added entry is visible on the node where the removal is performed (hence the name observed-remove set).

The following example illustrates how to do that:

Scala
@@snip ShoppingCart.scala { #remove-item }
Java
@@snip ShoppingCart.java { #remove-item }
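
A minimal sketch of that pattern (the `RemoveItem` command, key name and timeouts are assumptions of this sketch):

```scala
import scala.concurrent.duration._
import akka.cluster.ddata.{ DistributedData, ORSet, ORSetKey, SelfUniqueAddress }
import akka.cluster.ddata.Replicator.{ Get, GetSuccess, ReadMajority, Update, WriteMajority }

final case class RemoveItem(productId: String) // hypothetical command

implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress
val DataKey = ORSetKey[String]("cart")
val readMajority = ReadMajority(timeout = 3.seconds)
val writeMajority = WriteMajority(timeout = 3.seconds)

def receive: Receive = {
  case RemoveItem(productId) =>
    // fetch the latest value first, carrying the product id as request context
    replicator ! Get(DataKey, readMajority, request = Some(productId))
  case GetSuccess(DataKey, Some(productId: String)) =>
    // entries added on other nodes have now been observed, so the remove wins
    replicator ! Update(DataKey, ORSet.empty[String], writeMajority)(_.remove(productId))
}
```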

@@@ warning

Caveat: Even if you use `WriteMajority` and `ReadMajority` there is a small risk that you may read stale data if the cluster membership has changed between the `Update` and the `Get`. For example, in a cluster of 5 nodes, an `Update` may be written to a majority of 3 nodes: n1, n2, n3. If 2 more nodes are then added, a subsequent `Get` request reads from a majority of 4 nodes, which may happen to be n4, n5, n6, n7, i.e. the value on n1, n2, n3 is not seen in the response of the `Get` request.

@@@

### Delete

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Delete](typed/distributed-data.md#delete).

Scala
@@snip DistributedDataDocSpec.scala { #delete }
Java
@@snip DistributedDataDocTest.java { #delete }
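
For illustration, a minimal sketch of deleting a key and handling the replies (`CounterKey` as above, inside an actor):

```scala
import akka.cluster.ddata.Replicator.{ Delete, DeleteSuccess, ReplicationDeleteFailure, WriteLocal }

// a deleted key can never be used again; see the warning below
replicator ! Delete(CounterKey, WriteLocal)

def receive: Receive = {
  case DeleteSuccess(CounterKey, _) =>
  // deleted according to the given consistency level
  case ReplicationDeleteFailure(CounterKey, _) =>
  // not confirmed by enough nodes in time; the delete still propagates
}
```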

@@@ warning

As deleted keys continue to be included in the stored data on each node as well as in gossip messages, a continuous series of updates and deletes of top-level entities will result in growing memory usage until an `ActorSystem` runs out of memory. To use Akka Distributed Data where frequent adds and removes are required, you should use a fixed number of top-level data types that support both updates and removals, for example `ORMap` or `ORSet`.

@@@

## Replicated data types

Akka contains a set of useful replicated data types, and it is fully possible to implement custom replicated data types. For the full documentation of this feature and for new projects see @ref:[Distributed Data - Replicated data types](typed/distributed-data.md#replicated-data-types).
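
For a flavour of the API, a minimal sketch using two of the built-in types (the values are arbitrary):

```scala
import akka.cluster.ddata.{ DistributedData, GCounter, ORSet, SelfUniqueAddress }

implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress

val counter = GCounter.empty :+ 3 // grow-only counter

val s1 = ORSet.empty[String] :+ "a" :+ "b"
val s2 = s1.remove("a")
// merge is commutative and convergent: the removal of "a" observed the
// earlier add, so the merged set contains only "b"
val merged = s1.merge(s2)
```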

### Delta-CRDT

For the full documentation of this feature and for new projects see @ref:[Distributed Data - Delta-CRDT](typed/distributed-data.md#delta-crdt).

### Custom Data Type

You can implement your own data types. For the full documentation of this feature and for new projects see @ref:[Distributed Data - Custom data type](typed/distributed-data.md#custom-data-type).

## Durable Storage

For the full documentation of this feature and for new projects see @ref:[Durable Storage](typed/distributed-data.md#durable-storage).

## Limitations

For the full documentation of this feature and for new projects see @ref:[Limitations](typed/distributed-data.md#limitations).

## Learn More about CRDTs

## Configuration

The `DistributedData` extension can be configured with the following properties:

@@snip reference.conf { #distributed-data }
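
For illustration, a minimal sketch of overriding a few of these settings when creating the `ActorSystem` (the chosen values are arbitrary):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString("""
  akka.cluster.distributed-data {
    # replicate only on nodes with this role ("" means all nodes)
    role = ""
    # how often the background gossip runs
    gossip-interval = 2 s
    # how often subscribers are notified of changes
    notify-subscribers-interval = 500 ms
  }
  """).withFallback(ConfigFactory.load())

val system = ActorSystem("ClusterSystem", config)
```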