Component Allocation
This attribute is taken into account only in a CloverDX Cluster environment.
The Allocation attribute is common to all components. Cluster graph processing uses it to plan how many instances of a component will be executed and on which Cluster nodes. Allocation is the basic concept for parallelizing data processing and for routing data between Cluster nodes.
Allocation can be specified in three different ways:
- Based on the number of workers: the component is executed in the requested number of instances on Cluster nodes preferred by CloverDX Cluster.
- Based on a reference to a partitioned sandbox: the component is executed on all Cluster nodes where the partitioned sandbox has a location. This allocation type is used transparently as the default for most data readers and data writers that refer to a file in a partitioned sandbox.
- Defined by a list of Cluster node identifiers: a single Cluster node can be listed more than once.
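The three allocation types can be pictured with a small Python model. This is only an illustrative sketch, not CloverDX API: the class names, the `cardinality` function and the `sandbox_locations` mapping are hypothetical, and the sketch assumes one component instance per sandbox location and per listed node identifier.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class WorkersAllocation:
    workers: int  # requested number of instances; the Cluster picks the nodes


@dataclass(frozen=True)
class SandboxAllocation:
    sandbox: str  # identifier of a partitioned sandbox (hypothetical)


@dataclass(frozen=True)
class NodesAllocation:
    nodes: Tuple[str, ...]  # explicit node identifiers; an identifier may repeat


Allocation = Union[WorkersAllocation, SandboxAllocation, NodesAllocation]


def cardinality(allocation: Allocation,
                sandbox_locations: Dict[str, Tuple[str, ...]]) -> int:
    """Return the number of instances implied by an allocation.

    sandbox_locations maps a sandbox identifier to the nodes that hold
    one of its locations (an assumption made for this sketch).
    """
    if isinstance(allocation, WorkersAllocation):
        return allocation.workers
    if isinstance(allocation, SandboxAllocation):
        return len(sandbox_locations[allocation.sandbox])
    if isinstance(allocation, NodesAllocation):
        return len(allocation.nodes)
    raise TypeError(f"unsupported allocation: {allocation!r}")


locations = {"data": ("node1", "node2")}
by_workers = cardinality(WorkersAllocation(3), locations)
by_sandbox = cardinality(SandboxAllocation("data"), locations)
by_nodes = cardinality(NodesAllocation(("node1", "node1", "node2")), locations)
```

In this model, listing `node1` twice yields two instances on that node, which mirrors the statement that a Cluster node can be used more than once.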
Allocation is automatically inherited from neighboring components. Therefore, a continuous graph may contain only a single component with an explicit allocation, and all other components use that allocation as well. All components of Clustered graphs are decorated with the number of instances (for example, x3) in which the component will finally be executed, the so-called allocation cardinality. These annotations are updated when the graph is saved. Allocation cardinality derived from neighbors is shown in a gray italic font, and cardinality derived from an allocation defined directly on the component is shown in a solid font.
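Inheritance from neighbors can be sketched as a breadth-first walk over the graph. The following Python example is a simplified model under stated assumptions, not the actual CloverDX planner: the function `inherit_cardinality`, the component names and the `"explicit"`/`"inherited"` labels are hypothetical, and the sketch covers only a continuous part of a graph without components that change the level of parallelism.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def inherit_cardinality(edges: Iterable[Tuple[str, str]],
                        explicit: Dict[str, int]) -> Dict[str, Tuple[int, str]]:
    """Spread explicitly defined cardinalities to neighboring components.

    Returns a mapping component -> (cardinality, origin), where origin is
    "explicit" or "inherited" (the latter would be shown in gray italic).
    """
    # Edges are treated as undirected: allocation is inherited both ways.
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    resolved = {name: (count, "explicit") for name, count in explicit.items()}
    queue = deque(explicit)
    while queue:
        current = queue.popleft()
        count = resolved[current][0]
        for other in neighbors.get(current, ()):
            if other not in resolved:
                resolved[other] = (count, "inherited")
                queue.append(other)
    return resolved


# Hypothetical three-component graph; only the middle component is allocated.
result = inherit_cardinality(
    edges=[("Reader", "Reformat"), ("Reformat", "Writer")],
    explicit={"Reformat": 3},
)
```

Here `Reader` and `Writer` both end up with cardinality 3 marked as inherited, while `Reformat` keeps its explicit value.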
Two interconnected components must have compatible allocations, that is, the number of specified workers must be equal. The only exceptions to this rule are Cluster components, which are dedicated solely to changing the level of parallelism. Parallel Partitioners change a single-worker allocation to a multi-worker allocation. Conversely, Parallel Gatherers change a multi-worker allocation to a single-worker allocation.
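The compatibility rule can be expressed as a short check. This Python sketch is a strict reading of the rule above and does not reflect how CloverDX validates graphs internally: the function name `allocations_compatible` and the `via` parameter (the kind of Cluster component placed between the two allocations, if any) are assumptions made for illustration.

```python
from typing import Optional


def allocations_compatible(upstream_workers: int,
                           downstream_workers: int,
                           via: Optional[str] = None) -> bool:
    """Check whether two connected allocations are compatible.

    via is None for a direct connection, or "partitioner" / "gatherer"
    when a Cluster component sits between the two allocations.
    """
    if via is None:
        # Ordinary components: worker counts must match exactly.
        return upstream_workers == downstream_workers
    if via == "partitioner":
        # Single-worker allocation fans out to a multi-worker one.
        return upstream_workers == 1 and downstream_workers > 1
    if via == "gatherer":
        # Multi-worker allocation is collected into a single-worker one.
        return upstream_workers > 1 and downstream_workers == 1
    raise ValueError(f"unknown Cluster component kind: {via!r}")


direct_ok = allocations_compatible(3, 3)
direct_bad = allocations_compatible(1, 3)
fan_out = allocations_compatible(1, 3, via="partitioner")
fan_in = allocations_compatible(3, 1, via="gatherer")
```

A direct connection between a single-worker and a three-worker component is rejected, whereas the same pair becomes valid once a partitioner is placed between them.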
For more details about Clustered graph processing, see Data Partitioning in Cluster.