Chapter 50. Sandboxes in Cluster
There are three sandbox types: shared sandboxes, plus partitioned and local sandboxes (both introduced in 3.0), which are vital for parallel data processing.
Local Sandbox
This sandbox type is intended for data that is directly accessible only from a particular Cluster node, such as massive input/output files. Any Cluster node may access the content of this type of sandbox, but only one node has local (fast) access, and that node must be up and running to provide the data. A graph may use resources from multiple sandboxes that are physically stored on different nodes, because Cluster nodes create network streams transparently, as if the resources were local files. For details, see Using a Sandbox Resource as a Component Data Source.
Do not use a local sandbox for common project data (graphs, metadata, connections, lookups, properties files, etc.), as it can cause odd behavior. Use shared sandboxes instead.
Figure 50.2. Dialog form for creating a new local sandbox
The sandbox location path is pre-filled with the sandboxes.home.local placeholder which, by default, points to [user.data.home]/CloverDX/sandboxes-local.
The placeholder can be configured as any other CloverDX configuration property.
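The following minimal sketch illustrates how such a placeholder might expand to its default path, and how an explicitly configured value would take precedence. It is plain Python for illustration only and is not part of CloverDX: the function resolve_sandbox_home, the overrides dictionary, and the sample directories /home/clover and /data/sandboxes-local are hypothetical.

```python
# Default taken from the text above; everything else here is hypothetical.
DEFAULTS = {
    "sandboxes.home.local": "[user.data.home]/CloverDX/sandboxes-local",
}


def resolve_sandbox_home(key, overrides, user_data_home):
    """Return the configured value for key if present, otherwise the
    default, with the [user.data.home] token expanded."""
    value = overrides.get(key, DEFAULTS[key])
    return value.replace("[user.data.home]", user_data_home)


# No override: the default location under the (assumed) user data home.
default_path = resolve_sandbox_home("sandboxes.home.local", {}, "/home/clover")

# Override: the property is set explicitly, so the default is ignored.
custom_path = resolve_sandbox_home(
    "sandboxes.home.local",
    {"sandboxes.home.local": "/data/sandboxes-local"},
    "/home/clover",
)

print(default_path)
print(custom_path)
```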
Partitioned Sandbox
This type of sandbox is an abstract wrapper for physical locations that typically exist on different Cluster nodes. However, there may be multiple locations on the same node. A partitioned sandbox has two purposes related to parallel data processing:
node allocation specification
Locations of a partitioned sandbox define the workers which will run the graph or its parts. Each physical location causes a single worker to run, without the need to store any data at that location. In other words, it tells the CloverDX Server to execute this part of the graph in parallel on these nodes.
storage for part of the data
During parallel data processing, each physical location contains only part of the data. Typically, input data is split into several input files, each file is placed in a different location, and each worker processes its own file.
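The two purposes above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not CloverDX code: the location names are hypothetical, round-robin distribution is an assumption chosen for simplicity, and a thread pool stands in for the workers that the server would start, one per physical location.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, locations):
    """Distribute records over locations in turn, so that each
    location holds only part of the data."""
    parts = {loc: [] for loc in locations}
    for i, rec in enumerate(records):
        parts[locations[i % len(locations)]].append(rec)
    return parts


def worker(location, part):
    """One worker per location; it processes only its own part."""
    return location, sum(part)


# Hypothetical physical locations of one partitioned sandbox.
locations = ["node01:/data/part", "node02:/data/part", "node03:/data/part"]
records = list(range(1, 11))

parts = partition_round_robin(records, locations)

# The number of locations determines the number of parallel workers.
with ThreadPoolExecutor(max_workers=len(locations)) as pool:
    results = dict(pool.map(lambda item: worker(*item), parts.items()))

total = sum(results.values())
print(results, total)
```

Adding a location to the list increases the number of workers, and a location with an empty part still gets a worker, which mirrors the point that a location allocates a worker even when it stores no data.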
Figure 50.3. Dialog form for creating a new partitioned sandbox
As the screenshot above shows, you can specify one or more physical locations on different Cluster nodes for a partitioned sandbox.
The sandbox location path is pre-filled with the sandboxes.home.partitioned placeholder which, by default, points to [user.data.home]/CloverDX/sandboxes-partitioned.
The sandboxes.home.partitioned config property can be configured like any other CloverDX Server configuration property.
Note that the directory must be readable and writable by the user running the JVM process.
Do not use a partitioned sandbox for common project data (graphs, metadata, connections, lookups, properties files, etc.), as it can cause odd behavior. Use shared sandboxes instead.