Chapter 61. Data Partitioning

Common Properties of Data Partitioning Components

Components in this category are primarily dedicated to data flow management when using Data Partitioning or in the CloverDX Cluster environment, which enables massive parallelization of data transformation processing. Each component in a transformation graph running with data partitioning enabled, or in a Cluster environment, can be executed in multiple instances; this is called component allocation. Component allocation specifies how many instances will be executed and where (on which Cluster nodes) they will run. See the documentation for Data Partitioning or CloverDX Cluster for more details.

In general, data partitioning components can be divided into two sub-categories: partitioners and gatherers.

Parallel partitioners distribute data records from a single worker among various Cluster workers. They are used to change a single-worker allocation to a multiple-worker allocation.

Parallel gatherers collect data records from various Cluster workers into a single worker. They are used to change a multiple-worker allocation to a single-worker allocation.
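The two directions can be illustrated with a minimal conceptual sketch in Java. It models each worker as an in-memory list of records. The `partition` and `gather` methods are hypothetical helpers written for illustration only, not part of any CloverDX API, and the round-robin distribution is just one assumed partitioning strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: hypothetical helpers, not the CloverDX API.
// Each worker is modeled as an in-memory list of records.
public class PartitionSketch {

    // Partitioner: distribute records from a single worker among n workers
    // (single-worker allocation -> multiple-worker allocation), round-robin.
    static <T> List<List<T>> partition(List<T> records, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            out.get(i % workers).add(records.get(i)); // record i goes to worker i mod n
        }
        return out;
    }

    // Gatherer: collect records from all workers into a single worker
    // (multiple-worker allocation -> single-worker allocation) by concatenation.
    static <T> List<T> gather(List<List<T>> perWorker) {
        List<T> out = new ArrayList<>();
        for (List<T> part : perWorker) {
            out.addAll(part);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> records = List.of(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> parts = partition(records, 3);
        System.out.println(parts);         // [[1, 4, 7], [2, 5], [3, 6]]
        System.out.println(gather(parts)); // [1, 4, 7, 2, 5, 3, 6]
    }
}
```

Plain concatenation does not restore the original record order. A gatherer that must preserve a particular order would need an ordering rule, for example merging by a key. The sketch also ignores where each worker runs, which component allocation determines in a real graph.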

The ParallelRepartition component belongs to neither of these two basic groups.

See also
Chapter 30, Components
Common Properties of Components
Specific Attribute Types