52. Sandboxes in Cluster
There are three sandbox types in total - shared sandboxes, and partitioned and local sandboxes (introduced in 3.0) which are vital for parallel data processing.
Shared Sandbox
This type of sandbox must be used for all data which is supposed to be accessible on all Cluster nodes. This includes all graphs, jobflows, metadata, connections, classes and input/output data for graphs which should support high availability (HA). All shared sandboxes reside in the directory, which must be properly shared among all Cluster nodes. You can use a suitable sharing/replicating tool according to the operating system and filesystem.
As you can see in the screenshot above, you can specify the root path on the filesystem and you can use placeholders or absolute path.
Placeholders available are environment variables, system properties or CloverDX Server configuration property intended for this use: sandboxes.home
.
Default path is set as [user.data.home]/CloverDX/sandboxes/[sandboxID]
where the sandboxID
is an ID specified by the user.
The user.data.home
placeholder refers to the home directory of the user running the JVM process (/home
subdirectory on Unix-like OS); it is determined as the first writable directory selected from the following values:
-
USERPROFILE
environment variable on Windows OS -
user.home
system property (user home directory) -
user.dir
system property (JVM process working directory) -
java.io.tmpdir
system property (JVM process temporary directory)
Note that the path must be valid on all Cluster nodes; not just nodes currently connected to the Cluster, but also on nodes that may be connected later. Thus when the placeholders are resolved on a node, the path must exist on the node and it must be readable/writable for the JVM process.
Local Sandbox
This sandbox type is intended for data, which is accessible only by certain Cluster nodes. It may include massive input/output files. The purpose being, that any Cluster node may access content of this type of sandbox, but only one has local (fast) access and this node must be up and running to provide data. The graph may use resources from multiple sandboxes which are physically stored on different nodes since Cluster nodes can create network streams transparently as if the resources were a local file. For details, see Using a Sandbox Resource as a Component Data Source.
Do not use a local sandbox for common project data (graphs, metadata, connections, lookups, properties files, etc.), as it can cause odd behavior. Use shared sandboxes instead.
The sandbox location path is pre-filled with the sandboxes.home.local
placeholder which, by default, points to [user.data.home]/CloverDX/sandboxes-local
.
The placeholder can be configured as any other CloverDX configuration property.
Partitioned Sandbox
This type of sandbox is an abstract wrapper for physical locations existing typically on different Cluster nodes. However, there may be multiple locations on the same node. A partitioned sandbox has two purposes related to parallel data processing:
-
node allocation specification
Locations of a partitioned sandbox define the workers which will run the graph or its parts. Each physical location causes a single worker to run without the need to store any data on its location. In other words, it tells the CloverDX Server: to execute this part of the graph in parallel on these nodes.
-
storage for part of the data
During parallel data processing, each physical location contains only part of the data. Typically, input data is split in more input files, so each file is put into a different location and each worker processes its own file.
As you can see on the screenshot above, for a partitioned sandbox, you can specify one or more physical locations on different Cluster nodes.
The sandbox location path is pre-filled with the sandboxes.home.partitioned
placeholder which, by default, points to [user.data.home]/CloverDX/sandboxes-paritioned
.
The sandboxes.home.partitioned
config property may be configured as any other CloverDX Server configuration property.
Note that the directory must be readable/writable for the user running JVM process.
Do not use a partitioned sandbox for common project data (graphs, metadata, connections, lookups, properties files, etc.), as it can cause odd behavior. Use shared sandboxes instead.