    DataSetReader

    Short description

    DataSetReader reads data from data sets stored in the Data Manager.

    This component must run on CloverDX Server. It only connects to the Data Manager instance deployed in the same CloverDX Server instance where it is running.

    Data source: Any data set in Data Manager
    Input ports: 0
    Output ports: 1
    Each to all outputs:
    Different to different outputs:
    Transformation:
    Transf. req.:
    Java:
    CTL:
    Auto-propagated metadata: yes

    Ports

    Port type: Output
    Number: 0
    Required:
    Description: For the data read from the selected data set.
    Metadata: Auto-propagated based on the layout of the selected data set. Custom metadata and Output mapping can also be used.

    Metadata

    The DataSetReader component propagates metadata on the output port. Metadata is created based on the layout of the selected data set.

    If different metadata is used on the output port, the mapping from the data set's metadata to the output port's metadata can be defined with the Output mapping component attribute.
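
    As a rough illustration of what a name-based mapping between two layouts does, the Python sketch below copies values only for fields whose name and type match on both sides. It is a conceptual model, not CloverDX code: the function map_by_name, the dictionary-based layouts and the choice to leave unmatched output fields empty are all assumptions made for this example. In a real graph the mapping is configured in the Output mapping attribute.

```python
def map_by_name(source_record, source_layout, target_layout):
    """Copy values for fields whose name and type match in both layouts.

    Assumption of this sketch: output fields without a matching source
    field are left as None.
    """
    target_record = {}
    for name, target_type in target_layout.items():
        if source_layout.get(name) == target_type:
            target_record[name] = source_record.get(name)
        else:
            target_record[name] = None
    return target_record


# Hypothetical layouts: field name -> field type.
data_set_layout = {"id": "integer", "name": "string", "price": "decimal"}
output_layout = {"id": "integer", "name": "string", "note": "string"}

record = {"id": 7, "name": "Widget", "price": "9.90"}
mapped = map_by_name(record, data_set_layout, output_layout)
print(mapped)
```

    When the output metadata is auto-propagated, both layouts are identical and every field is copied.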

    DataSetReader attributes

    Attribute Req Description Possible values

    Basic

    Data Set

    The data set to read from. Clicking the select button shows all data sets available on the Server.
    A data set is identified by its code. The code is assigned when the data set is created and does not change when the data set is renamed.

    Record status

    Allows you to filter the records based on their status. The default value is Approved.

    Possible values: All (any status), New, Edited, Approved (default), Committed

    Include deleted

    Configures whether records marked as deleted are included when reading from the data set.
    The default value is false: deleted records are not included in the data returned by the component.

    Possible values: false (default), true

    Complete batches only

    Configures whether to read only from batches in which all records are in Approved or Committed status.
    If this is disabled, records are read from batches regardless of the overall batch status.
    If this is enabled, records from a batch are read only if all records in that batch are in Approved or Committed status.
    This setting cannot be used on data sets that do not have batching enabled; the component fails in that case. The default is false (disabled).

    Possible values: false (default), true

    Output mapping

    Allows you to map data read from the data set to the output port. By default, this is set to Map by name, so fields with matching names and types are mapped automatically. This matches the common usage where the metadata on output port 0 is auto-propagated and corresponds to the data set exactly.

    Advanced

    Max number of records

    Configures the maximum number of records to read from the data set. If the data set contains fewer matching records than specified, the component finishes once it has read all of them; it does not wait for the data set to grow. If this attribute is left empty, all matching records from the data set are read.
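
    To make the interaction of the attributes above easier to follow, here is a small Python model of the selection logic. It is only a sketch of the documented behavior, not the component's implementation: the function read_data_set, the record fields (id, batch, status, deleted) and the order in which the filters are applied are assumptions. In particular, the sketch counts deleted records when deciding whether a batch is complete, which the description above does not specify.

```python
from collections import defaultdict

COMPLETE_STATUSES = {"Approved", "Committed"}


def read_data_set(records, record_status="Approved", include_deleted=False,
                  complete_batches_only=False, max_records=None):
    """Return the records a reader with these settings would emit (model only)."""
    candidates = list(records)

    if complete_batches_only:
        # A batch qualifies only if every record in it is Approved or Committed.
        statuses_by_batch = defaultdict(set)
        for rec in candidates:
            statuses_by_batch[rec["batch"]].add(rec["status"])
        complete = {batch for batch, statuses in statuses_by_batch.items()
                    if statuses <= COMPLETE_STATUSES}
        candidates = [rec for rec in candidates if rec["batch"] in complete]

    selected = []
    for rec in candidates:
        # Stop as soon as the limit is reached; None means no limit.
        if max_records is not None and len(selected) >= max_records:
            break
        if record_status != "All" and rec["status"] != record_status:
            continue
        if rec["deleted"] and not include_deleted:
            continue
        selected.append(rec)
    return selected


# Hypothetical sample data.
records = [
    {"id": 1, "batch": "A", "status": "Approved", "deleted": False},
    {"id": 2, "batch": "A", "status": "Approved", "deleted": True},
    {"id": 3, "batch": "B", "status": "Approved", "deleted": False},
    {"id": 4, "batch": "B", "status": "New", "deleted": False},
    {"id": 5, "batch": "C", "status": "Committed", "deleted": False},
]

default_ids = [r["id"] for r in read_data_set(records)]
complete_ids = [r["id"] for r in read_data_set(records, complete_batches_only=True)]
limited_ids = [r["id"] for r in read_data_set(
    records, record_status="All", include_deleted=True, max_records=2)]
print(default_ids, complete_ids, limited_ids)
```

    With the defaults, only Approved records that are not deleted are returned. Enabling complete_batches_only additionally drops batch B, because one of its records is still New.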

    Details

    DataSetReader connects to the Data Manager instance running on the same Server as the component and returns data from the selected data set. The component is intended for use in "post-processing" jobs, which read data from the Data Manager and load it into a target system.

    Unloading data from Data Manager

    The basic pattern for reading data from Data Manager is a DataSetReader component, followed by the components that implement the processing logic for the data, followed by a DataSetCommit component that marks the records as fully processed.

    As an example, a simple job pulling data from the Data Manager may look like this:

    [Image: DataSetReader basic usage graph]
    Figure 454. A simple job that reads data from Data Manager, loads the records to the data warehouse and then informs the Data Manager that those records have been fully processed.

    DataSetReader must be used together with the DataSetCommit component, which marks the records processed by the reader as Committed. If this is not done, the records stay in Approved status and are never purged from the Data Manager.

    The job above first reads data from the specified data set with the DataSetReader, then loads the records to the warehouse and finally notifies the Data Manager that the records were successfully processed by setting their status to Committed with the DataSetCommit.

    Note that DataSetCommit is in phase 5, while DataSetReader and WriteListingToDWH are both in phase 0. This two-phase approach is necessary because records read from the data set may not reach their destination: for example, they may be rejected by an API, or the target system may be unavailable when the job runs.

    In such cases, the records will not be marked as Committed in the data set and will be picked up again next time the job runs.
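
    The read, load, then commit ordering can be modeled outside of CloverDX as follows. This Python sketch is an analogy only: the in-memory store and the functions run_post_processing_job, read_approved, commit, failing_load and working_load are hypothetical stand-ins for the graph components, and phases are modeled as sequential steps where a failure in the first step prevents the commit step from running.

```python
def run_post_processing_job(read_approved, load_to_target, commit):
    """Read and load first; commit only if the load succeeded."""
    # First phase: read the records and load them to the target system.
    records = read_approved()
    try:
        load_to_target(records)
    except Exception as error:
        # The load failed, so the commit step is skipped. The records keep
        # their Approved status and are picked up again on the next run.
        print(f"load failed, nothing committed: {error}")
        return False

    # Later phase: mark the loaded records as Committed.
    commit(records)
    return True


# Hypothetical in-memory stand-in for a data set: record id -> status.
store = {1: "Approved", 2: "Approved", 3: "New"}


def read_approved():
    return [rid for rid, status in store.items() if status == "Approved"]


def commit(record_ids):
    for rid in record_ids:
        store[rid] = "Committed"


def failing_load(record_ids):
    raise ConnectionError("target system unavailable")


def working_load(record_ids):
    pass  # pretend the records were written to the warehouse


first_run = run_post_processing_job(read_approved, failing_load, commit)
after_failure = dict(store)
second_run = run_post_processing_job(read_approved, working_load, commit)
print(first_run, after_failure, second_run, store)
```

    The first run fails during loading and leaves all statuses untouched. The second run reads the same two Approved records again, loads them and only then commits them.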

    Compatibility

    Version    Compatibility Notice
    6.5        DataSetReader introduced as an Incubation component in 6.5.0.
    6.6        DataSetReader with expanded functionality, still an Incubation component in 6.6.0.