
# Classic Distributed Data

@@include[includes.md] { #actor-api }

For the full documentation of this feature and for new projects see @ref:Distributed Data.

## Dependency

To use Pekko Distributed Data, you must add the following dependency in your project:

@@dependency[sbt,Maven,Gradle] { bomGroup=org.apache.pekko bomArtifact=pekko-bom_$scala.binary.version$ bomVersionSymbols=PekkoVersion symbol1=PekkoVersion value1="$pekko.version$" group="org.apache.pekko" artifact="pekko-distributed-data_$scala.binary.version$" version=PekkoVersion }

## Introduction

For the full documentation of this feature and for new projects see @ref:Distributed Data - Introduction.

## Using the Replicator

The @apidoc[cluster.ddata.Replicator] actor provides the API for interacting with the data. The Replicator actor must be started on each node in the cluster, or group of nodes tagged with a specific role. It communicates with other Replicator instances with the same path (without address) that are running on other nodes. For convenience it can be used with the @apidoc[cluster.ddata.DistributedData] extension, but it can also be started as an ordinary actor using @apidoc[Replicator.props] {scala="#props(settings:org.apache.pekko.cluster.ddata.ReplicatorSettings):org.apache.pekko.actor.Props" java="#props(org.apache.pekko.cluster.ddata.ReplicatorSettings)"}. If it is started as an ordinary actor it is important that it is given the same name, and thereby started on the same path, on all nodes.
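As a quick orientation, here is a minimal sketch (not taken from the snippets referenced below) of obtaining the node-local replicator through the extension; the helper name is illustrative:

```scala
import org.apache.pekko.actor.{ ActorRef, ActorSystem }
import org.apache.pekko.cluster.ddata.{ DistributedData, SelfUniqueAddress }

// Hypothetical helper: look up the node-local Replicator via the extension.
def replicatorFor(system: ActorSystem): (ActorRef, SelfUniqueAddress) = {
  val replicator: ActorRef = DistributedData(system).replicator
  // Implicitly required by many modify operations, e.g. ORSet add/remove.
  val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress
  (replicator, node)
}
```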

Cluster members with status @ref:WeaklyUp will participate in Distributed Data. This means that the data will be replicated to the WeaklyUp nodes via the background gossip protocol. Note that WeaklyUp nodes will not participate in any actions where the consistency mode is to read/write from all nodes or a majority of nodes. A WeaklyUp node is not counted as part of the cluster, so 3 nodes + 5 WeaklyUp nodes is essentially a 3-node cluster as far as consistent actions are concerned.

Below is an example of an actor that schedules tick messages to itself and for each tick adds or removes elements from an @apidoc[ORSet] (observed-remove set). It also subscribes to changes of this key.

Scala
@@snip DistributedDataDocSpec.scala { #data-bot }
Java
@@snip DataBot.java { #data-bot }
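For readers without the snippet sources at hand, here is a rough, hand-written sketch of the same pattern; the actor name, key name and timing are illustrative and this is not the actual DataBot code:

```scala
import scala.concurrent.duration._
import scala.util.Random

import org.apache.pekko.actor.{ Actor, ActorLogging }
import org.apache.pekko.cluster.ddata.{ DistributedData, ORSet, ORSetKey, SelfUniqueAddress }
import org.apache.pekko.cluster.ddata.Replicator._

class TickBot extends Actor with ActorLogging {
  import context.dispatcher

  private val replicator = DistributedData(context.system).replicator
  private implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress

  private val DataKey = ORSetKey[String]("key")
  private case object Tick

  // Schedule periodic ticks and subscribe to changes of the key.
  private val tickTask =
    context.system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds, self, Tick)
  replicator ! Subscribe(DataKey, self)

  def receive: Receive = {
    case Tick =>
      val element = ('a' + Random.nextInt(26)).toChar.toString
      if (Random.nextBoolean())
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ :+ element) // add
      else
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_.remove(element)) // remove
    case _: UpdateResponse[_] => // ignore acknowledgements in this sketch
    case c @ Changed(DataKey) =>
      log.info("Current elements: {}", c.get(DataKey).elements)
  }

  override def postStop(): Unit = {
    tickTask.cancel()
  }
}
```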

### Update

For the full documentation of this feature and for new projects see @ref:Distributed Data - Update.

To modify and replicate a data value you send a @apidoc[Replicator.Update] message to the local @apidoc[cluster.ddata.Replicator].

The current data value for the key of the Update is passed as a parameter to the @scala[`modify`]@java[`modify()`] function of the Update. The function is supposed to return the new value of the data, which will then be replicated according to the given consistency level.

The modify function is called by the Replicator actor and must therefore be a pure function that only uses the data parameter and stable fields from the enclosing scope. It must for example not access the sender (@scala[`sender()`]@java[`getSender()`]) reference of an enclosing actor.

Update is intended to only be sent from an actor running in the same local @apidoc[actor.ActorSystem] as the Replicator, because the modify function is typically not serializable.

Scala
@@snip DistributedDataDocSpec.scala { #update }
Java
@@snip DistributedDataDocTest.java { #update }
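If the referenced snippets are not available, a minimal sketch of sending Update messages might look like this; the key names and the helper `sendUpdates` are illustrative:

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.ActorRef
import org.apache.pekko.cluster.ddata.{ GCounter, GCounterKey, ORSet, ORSetKey, SelfUniqueAddress }
import org.apache.pekko.cluster.ddata.Replicator._

def sendUpdates(replicator: ActorRef)(implicit node: SelfUniqueAddress): Unit = {
  val Counter1Key = GCounterKey("counter1")
  val Set1Key = ORSetKey[String]("set1")

  // Increment a grow-only counter; WriteLocal writes to the local replica only
  // and relies on gossip for dissemination.
  replicator ! Update(Counter1Key, GCounter.empty, WriteLocal)(_ :+ 1)

  // Add an element to an ORSet and wait for at least 3 replicas to confirm.
  replicator ! Update(Set1Key, ORSet.empty[String], WriteTo(3, 3.seconds))(_ :+ "hello")
}
```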

As a reply to the Update, a @apidoc[Replicator.UpdateSuccess] is sent to the sender of the Update if the value was successfully replicated according to the supplied @ref:write consistency level within the supplied timeout. Otherwise a @apidoc[Replicator.UpdateFailure] subclass is sent back. Note that a @apidoc[Replicator.UpdateTimeout] reply does not mean that the update completely failed or was rolled back. It may still have been replicated to some nodes, and will eventually be replicated to all nodes with the gossip protocol.

Scala
@@snip DistributedDataDocSpec.scala { #update-response1 }
Java
@@snip DistributedDataDocTest.java { #update-response1 }
Scala
@@snip DistributedDataDocSpec.scala { #update-response2 }
Java
@@snip DistributedDataDocTest.java { #update-response2 }
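A sketch of handling these replies inside the sending actor could look like the following; the actor and key names are illustrative:

```scala
import org.apache.pekko.actor.Actor
import org.apache.pekko.cluster.ddata.GCounterKey
import org.apache.pekko.cluster.ddata.Replicator._

class UpdateReplyHandler extends Actor {
  private val Counter1Key = GCounterKey("counter1")

  def receive: Receive = {
    case UpdateSuccess(Counter1Key, _) =>
      // replicated according to the requested write consistency
    case UpdateTimeout(Counter1Key, _) =>
      // not confirmed within the timeout, but the write may still spread via gossip
    case e: ModifyFailure[_] =>
      // the modify function threw an exception; the update was not performed
      context.system.log.error(e.cause, "Update failed: {}", e.errorMessage)
  }
}
```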

You will always see your own writes. For example if you send two @apidoc[cluster.ddata.Replicator.Update] messages changing the value of the same key, the modify function of the second message will see the change that was performed by the first Update message.

It is possible to abort the Update when inspecting the state parameter that is passed in to the modify function by throwing an exception. That happens before the update is performed and a @apidoc[Replicator.ModifyFailure] is sent back as reply.

In the Update message you can pass an optional request context, which the @apidoc[cluster.ddata.Replicator] does not care about, but is included in the reply messages. This is a convenient way to pass contextual information (e.g. original sender) without having to use ask or maintain local correlation data structures.

Scala
@@snip DistributedDataDocSpec.scala { #update-request-context }
Java
@@snip DistributedDataDocTest.java { #update-request-context }
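A hand-written sketch of the request-context idea; the actor, message and key names are illustrative:

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.{ Actor, ActorRef }
import org.apache.pekko.cluster.ddata.{ DistributedData, GCounter, GCounterKey, SelfUniqueAddress }
import org.apache.pekko.cluster.ddata.Replicator._

class CounterService extends Actor {
  private val replicator = DistributedData(context.system).replicator
  private implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress

  private val Counter1Key = GCounterKey("counter1")
  private val writeMajority = WriteMajority(3.seconds)

  def receive: Receive = {
    case "increment" =>
      // Carry the original sender along as the request context.
      replicator ! Update(Counter1Key, GCounter.empty, writeMajority, request = Some(sender()))(_ :+ 1)

    case UpdateSuccess(Counter1Key, Some(replyTo: ActorRef)) =>
      replyTo ! "ack"

    case UpdateTimeout(Counter1Key, Some(replyTo: ActorRef)) =>
      replyTo ! "nack"
  }
}
```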

### Get

For the full documentation of this feature and for new projects see @ref:Distributed Data - Get.

To retrieve the current value of a data entry you send a @apidoc[Replicator.Get] message to the Replicator. You supply a consistency level which has the following meaning:

Scala
@@snip DistributedDataDocSpec.scala { #get }
Java
@@snip DistributedDataDocTest.java { #get }
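A minimal sketch of the read consistency levels in use; the key names and the helper `sendGets` are illustrative:

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.ActorRef
import org.apache.pekko.cluster.ddata.{ GCounterKey, ORSetKey }
import org.apache.pekko.cluster.ddata.Replicator._

def sendGets(replicator: ActorRef): Unit = {
  val Counter1Key = GCounterKey("counter1")
  val Set1Key = ORSetKey[String]("set1")

  // Read from the local replica only.
  replicator ! Get(Counter1Key, ReadLocal)
  // Read from at least 3 replicas (including the local one).
  replicator ! Get(Set1Key, ReadFrom(3, 1.second))
  // Read from a majority of replicas, with a timeout.
  replicator ! Get(Set1Key, ReadMajority(3.seconds))
}
```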

As a reply to the Get, a @apidoc[Replicator.GetSuccess] is sent to the sender of the Get if the value was successfully retrieved according to the supplied @ref:read consistency level within the supplied timeout. Otherwise a @apidoc[Replicator.GetFailure] is sent. If the key does not exist the reply will be @apidoc[Replicator.NotFound].

Scala
@@snip DistributedDataDocSpec.scala { #get-response1 }
Java
@@snip DistributedDataDocTest.java { #get-response1 }
Scala
@@snip DistributedDataDocSpec.scala { #get-response2 }
Java
@@snip DistributedDataDocTest.java { #get-response2 }
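A sketch of handling these replies; the actor and key names are illustrative:

```scala
import org.apache.pekko.actor.Actor
import org.apache.pekko.cluster.ddata.GCounterKey
import org.apache.pekko.cluster.ddata.Replicator._

class GetReplyHandler extends Actor {
  private val Counter1Key = GCounterKey("counter1")

  def receive: Receive = {
    case g @ GetSuccess(Counter1Key, _) =>
      val value = g.get(Counter1Key).value // current counter value
      context.system.log.info("Counter is {}", value)
    case NotFound(Counter1Key, _) =>
      // the key has never been written
    case GetFailure(Counter1Key, _) =>
      // the read consistency could not be reached within the timeout
  }
}
```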

In the @apidoc[cluster.ddata.Replicator.Get] message you can pass an optional request context in the same way as for the @apidoc[cluster.ddata.Replicator.Update] message, described above. For example the original sender can be passed and replied to after receiving and transforming @apidoc[cluster.ddata.Replicator.GetSuccess].

Scala
@@snip DistributedDataDocSpec.scala { #get-request-context }
Java
@@snip DistributedDataDocTest.java { #get-request-context }

### Subscribe

For the full documentation of this feature and for new projects see @ref:Distributed Data - Subscribe.

You may also register interest in change notifications by sending a @apidoc[Replicator.Subscribe] message to the Replicator. It will send @apidoc[Replicator.Changed] messages to the registered subscriber when the data for the subscribed key is updated. Subscribers will be notified periodically with the configured `notify-subscribers-interval`, and it is also possible to send an explicit `Replicator.FlushChanges` message to the Replicator to notify the subscribers immediately.

The subscriber is automatically removed if the subscriber is terminated. A subscriber can also be deregistered with the @apidoc[Replicator.Unsubscribe] message.

Scala
@@snip DistributedDataDocSpec.scala { #subscribe }
Java
@@snip DistributedDataDocTest.java { #subscribe }
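A sketch of a subscriber actor; the actor and key names are illustrative:

```scala
import org.apache.pekko.actor.{ Actor, ActorLogging }
import org.apache.pekko.cluster.ddata.{ DistributedData, GCounterKey }
import org.apache.pekko.cluster.ddata.Replicator._

class CounterSubscriber extends Actor with ActorLogging {
  private val replicator = DistributedData(context.system).replicator
  private val Counter1Key = GCounterKey("counter1")

  // Register for change notifications; they are delivered on the configured
  // notify-subscribers-interval (or after an explicit FlushChanges).
  replicator ! Subscribe(Counter1Key, self)

  private var current: BigInt = 0

  def receive: Receive = {
    case c @ Changed(Counter1Key) =>
      current = c.get(Counter1Key).value
      log.info("Counter is now {}", current)
  }

  override def postStop(): Unit =
    replicator ! Unsubscribe(Counter1Key, self)
}
```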

### Consistency

For the full documentation of this feature and for new projects see @ref:Distributed Data Consistency.

Here is an example of using @apidoc[cluster.ddata.Replicator.WriteMajority] and @apidoc[cluster.ddata.Replicator.ReadMajority]:

Scala
@@snip ShoppingCart.scala { #read-write-majority }
Java
@@snip ShoppingCart.java { #read-write-majority }
Scala
@@snip ShoppingCart.scala { #get-cart }
Java
@@snip ShoppingCart.java { #get-cart }
Scala
@@snip ShoppingCart.scala { #add-item }
Java
@@snip ShoppingCart.java { #add-item }
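If the snippet sources are not available, the consistency values used in such a cart might be defined like this; the object name and the timeout value are illustrative:

```scala
import scala.concurrent.duration._

import org.apache.pekko.cluster.ddata.Replicator.{ ReadMajority, WriteMajority }

object CartConsistency {
  // A majority read/write that is not reached within the timeout is reported
  // as a failure reply; it is not rolled back.
  val timeout = 3.seconds
  val readMajority = ReadMajority(timeout)
  val writeMajority = WriteMajority(timeout)
}
```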

In some rare cases, when performing an @apidoc[cluster.ddata.Replicator.Update] it is necessary to first try to fetch the latest data from other nodes. That can be done by first sending a @apidoc[cluster.ddata.Replicator.Get] with @apidoc[cluster.ddata.Replicator.ReadMajority] and then continuing with the @apidoc[cluster.ddata.Replicator.Update] when the @apidoc[cluster.ddata.Replicator.GetSuccess], @apidoc[cluster.ddata.Replicator.GetFailure] or @apidoc[cluster.ddata.Replicator.NotFound] reply is received. This might be needed when you need to base a decision on the latest information or when removing entries from an @apidoc[cluster.ddata.ORSet] or @apidoc[cluster.ddata.ORMap]. If an entry is added to an ORSet or ORMap from one node and removed from another node, the entry will only be removed if the added entry is visible on the node where the removal is performed (hence the name observed-removed set).

The following example illustrates how to do that:

Scala
@@snip ShoppingCart.scala { #remove-item }
Java
@@snip ShoppingCart.java { #remove-item }
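A hand-written sketch of this read-then-update flow (not the actual ShoppingCart code; the Cart actor, RemoveItem message and key name are illustrative):

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.Actor
import org.apache.pekko.cluster.ddata.{ DistributedData, ORSet, ORSetKey, SelfUniqueAddress }
import org.apache.pekko.cluster.ddata.Replicator._

final case class RemoveItem(productId: String)

class Cart(userId: String) extends Actor {
  private val replicator = DistributedData(context.system).replicator
  private implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress

  private val DataKey = ORSetKey[String]("cart-" + userId)
  private val timeout = 3.seconds

  def receive: Receive = {
    case cmd @ RemoveItem(_) =>
      // 1. Try to observe the latest additions from a majority of nodes first.
      replicator ! Get(DataKey, ReadMajority(timeout), request = Some(cmd))

    case GetSuccess(DataKey, Some(RemoveItem(productId))) =>
      // 2. The addition (if any) has been observed locally, so the removal will win.
      replicator ! Update(DataKey, ORSet.empty[String], WriteMajority(timeout))(_.remove(productId))

    case GetFailure(DataKey, Some(RemoveItem(productId))) =>
      // Majority read timed out; fall back to removing what is seen locally.
      replicator ! Update(DataKey, ORSet.empty[String], WriteMajority(timeout))(_.remove(productId))

    case NotFound(DataKey, Some(RemoveItem(_))) =>
      // nothing to remove

    case _: UpdateResponse[_] => // write acknowledgements are ignored in this sketch
  }
}
```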

@@@ warning

Caveat: Even if you use WriteMajority and ReadMajority there is a small risk that you may read stale data if the cluster membership has changed between the Update and the Get. For example, in a cluster of 5 nodes, when you Update, that change is written to 3 nodes: n1, n2, n3. Then 2 more nodes are added and a Get request is reading from 4 nodes, which happen to be n4, n5, n6, n7, i.e. the value on n1, n2, n3 is not seen in the response of the Get request.

@@@

### Delete

For the full documentation of this feature and for new projects see @ref:Distributed Data - Delete.

Scala
@@snip DistributedDataDocSpec.scala { #delete }
Java
@@snip DistributedDataDocTest.java { #delete }
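A sketch of sending a Delete and handling the success reply; the actor and key names are illustrative and the reply handling is deliberately coarse:

```scala
import scala.concurrent.duration._

import org.apache.pekko.actor.Actor
import org.apache.pekko.cluster.ddata.{ DistributedData, GCounterKey }
import org.apache.pekko.cluster.ddata.Replicator._

class CounterRemover extends Actor {
  private val replicator = DistributedData(context.system).replicator
  private val Counter1Key = GCounterKey("counter1")

  def receive: Receive = {
    case "delete" =>
      replicator ! Delete(Counter1Key, WriteMajority(3.seconds))
    case DeleteSuccess(Counter1Key, _) =>
      // the delete was acknowledged according to the write consistency
    case other =>
      // delete not confirmed within the timeout, or the key was already deleted
  }
}
```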

@@@ warning

As deleted keys continue to be included in the stored data on each node as well as in gossip messages, a continuous series of updates and deletes of top-level entities will result in growing memory usage until an ActorSystem runs out of memory. To use Pekko Distributed Data where frequent adds and removes are required, you should use a fixed number of top-level data types that support both updates and removals, for example @apidoc[cluster.ddata.ORMap] or @apidoc[cluster.ddata.ORSet].

@@@

## Replicated data types

Pekko contains a set of useful replicated data types and it is fully possible to implement custom replicated data types. For the full documentation of this feature and for new projects see @ref:Distributed Data Replicated data types.

### Delta-CRDT

For the full documentation of this feature and for new projects see @ref:Distributed Data Delta CRDT.

### Custom Data Type

You can implement your own data types. For the full documentation of this feature and for new projects see @ref:Distributed Data custom data type.

## Durable Storage

For the full documentation of this feature and for new projects see @ref:Durable Storage.

## Limitations

For the full documentation of this feature and for new projects see @ref:Limitations.

## Learn More about CRDTs

## Configuration

The @apidoc[cluster.ddata.DistributedData] extension can be configured with the following properties:

@@snip reference.conf { #distributed-data }