DataSetReader

Short description

DataSetReader reads data from data sets stored in the Data Manager.

This component must run on CloverDX Server. It connects only to the Data Manager instance deployed on the same CloverDX Server instance on which the component runs.

Data source	Input ports	Output ports	Each to all outputs	Different to different outputs	Transformation	Transf. req.	Java	CTL	Auto-propagated metadata
Any data set in Data Manager	0	1	⨯	⨯	✓	⨯	⨯	✓	✓

Ports

Port type	Number	Required	Description	Metadata
Output	0	✓	For the data read from the selected data set.	Auto-propagated based on the layout of the selected data set. Custom metadata and Output mapping can also be used.

Metadata

The DataSetReader component propagates metadata on the output port. Metadata is created based on the layout of the selected data set.

If the output port uses different metadata, use the Output mapping component attribute to map the data set’s metadata to the metadata on the output port.
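
The default Map by name behavior can be pictured with a small Python sketch. This is only an illustration of the idea, not CloverDX code: the layouts, the `map_by_name` function and the choice to leave unmatched output fields empty are all assumptions made for the example.

```python
from typing import Any, Dict

# Hypothetical field layouts (field name -> type name); not taken from a real data set.
DATA_SET_LAYOUT = {"id": "integer", "name": "string", "amount": "decimal"}
OUTPUT_METADATA = {"id": "integer", "name": "string", "amount": "string", "note": "string"}


def map_by_name(record: Dict[str, Any],
                source_layout: Dict[str, str],
                target_layout: Dict[str, str]) -> Dict[str, Any]:
    """Copy only the fields whose name AND type match in both layouts.

    Assumption for this sketch: output fields without a match stay empty (None).
    """
    out = {field: None for field in target_layout}
    for field, field_type in source_layout.items():
        if target_layout.get(field) == field_type:
            out[field] = record.get(field)
    return out


mapped = map_by_name({"id": 7, "name": "Alice", "amount": 12.5},
                     DATA_SET_LAYOUT, OUTPUT_METADATA)
print(mapped)
```

In this sketch `amount` is not copied because its type differs between the two layouts, and `note` has no counterpart in the data set. Fields like these are the ones that would need an explicit Output mapping.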

DataSetReader attributes

Attribute	Req	Description	Possible values
Basic
Data Set	✓	Data set to read from. Clicking the select button shows all data sets available on the Server. A data set is identified by its code, which is assigned when the data set is created and does not change when the data set is renamed.
Record status		Filters the records based on their status. The default value is Approved.	All (any status), New, Edited, Approved (default), Committed
Include deleted		Whether to include records marked as deleted when reading from the data set. The default value is `false`: deleted records are not included in the data returned by the component.	`false` (default), `true`
Complete batches only		Whether to read only from batches in which all records are in Approved or Committed status. If this is disabled, records are read from batches regardless of the overall batch status. If this is enabled, records from a batch are read only if all records in that batch are in Approved or Committed status. This setting cannot be used on data sets that do not have batching enabled; the component fails in that case. The default value is `false` (disabled).	`false` (default), `true`
Output mapping		Allows you to map data read from the data set to the output port. By default, this is set to Map by name and fields with matching names and types will be mapped automatically. This is consistent with the common usage where the metadata on output port 0 is auto-propagated and will match the data set exactly.
Advanced
Max number of records		Configure how many records to read from the data set. If the data set contains fewer matching records than specified, the component will finish once it reads all of them – it will not wait for the data set to grow. If this attribute is left empty, all records from the data set will be read.
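
To make the interaction of these attributes easier to follow, here is a minimal Python sketch of the filtering they describe. It is a model of the documented behavior, not the component's implementation: the `Record` class, the `select_records` function, the `batching_enabled` flag and the order in which the filters are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    id: int
    status: str                  # "New", "Edited", "Approved" or "Committed"
    deleted: bool = False
    batch: Optional[str] = None  # batch label; only meaningful with batching enabled


def select_records(records: Iterable[Record], *, batching_enabled: bool,
                   record_status: str = "Approved",
                   include_deleted: bool = False,
                   complete_batches_only: bool = False,
                   max_records: Optional[int] = None) -> List[Record]:
    """Model of the reader attributes; the filter order is an assumption."""
    records = list(records)
    if complete_batches_only:
        if not batching_enabled:
            # The documentation says the component fails in this case.
            raise ValueError("Complete batches only requires batching to be enabled")
        # A batch is incomplete if any of its records is not Approved or Committed.
        incomplete = {r.batch for r in records
                      if r.status not in ("Approved", "Committed")}
        records = [r for r in records if r.batch not in incomplete]
    selected = [r for r in records
                if (record_status == "All" or r.status == record_status)
                and (include_deleted or not r.deleted)]
    # An empty Max number of records (None here) means "read everything".
    return selected if max_records is None else selected[:max_records]


rows = [
    Record(1, "Approved", batch="A"),
    Record(2, "Approved", batch="A"),
    Record(3, "Approved", batch="B"),
    Record(4, "New", batch="B"),
    Record(5, "Approved", deleted=True, batch="A"),
    Record(6, "Committed", batch="A"),
]

defaults = [r.id for r in select_records(rows, batching_enabled=True)]
complete = [r.id for r in select_records(rows, batching_enabled=True,
                                         complete_batches_only=True)]
print(defaults, complete)
```

With the defaults, the sketch returns the Approved, non-deleted records. Enabling Complete batches only also drops record 3, because batch B still contains a record in New status.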

Details

DataSetReader connects to the Data Manager instance running on the same Server as the component and returns data from the selected data set. The component is intended for use in “post-processing” jobs, which read data from the Data Manager and load it into a target system.

Unloading data from Data Manager

The basic pattern for reading data from Data Manager is to use a DataSetReader component, followed by any components that implement the processing logic for the data, and finally a DataSetCommit component that marks the records as fully processed.

As an example, a simple job pulling data from the Data Manager may look like this:

Figure 454. A simple job that reads data from Data Manager, loads the records to the data warehouse and then informs the Data Manager that those records have been fully processed.

DataSetReader must be used together with the DataSetCommit component, which marks the records processed by the reader as Committed. If this is not done, the records will stay in Approved status and will never be purged from the Data Manager.

The job above first reads data from the specified data set with the DataSetReader, then loads the records to the warehouse and finally notifies the Data Manager that the records were successfully processed by setting their status to Committed with the DataSetCommit.

Note how the DataSetCommit is in phase 5 while the DataSetReader and WriteListingToDWH are both in phase 0. This two-phase approach is necessary because records read from the data set may not reach their destination: for example, they may be rejected by an API, or the target system may be unavailable when the job runs.

In such cases, the records will not be marked as Committed in the data set and will be picked up again next time the job runs.
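
The read, load, then commit ordering can be sketched in plain Python. This is an in-memory simulation under stated assumptions, not CloverDX code: `data_set`, `run_unload_job` and the load functions are hypothetical names, and an exception during loading stands in for a failure in phase 0.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a data set: record id -> status.
data_set: Dict[int, str] = {1: "Approved", 2: "Approved", 3: "New"}
warehouse: List[int] = []


def read_approved() -> List[int]:
    """Plays the role of DataSetReader with Record status = Approved."""
    return [rid for rid, status in data_set.items() if status == "Approved"]


def commit(ids: List[int]) -> None:
    """Plays the role of DataSetCommit: marks the records as Committed."""
    for rid in ids:
        data_set[rid] = "Committed"


def run_unload_job(load: Callable[[List[int]], None]) -> int:
    # "Phase 0": read the records and load them into the target.
    ids = read_approved()
    load(ids)  # an exception here means the commit below is never reached
    # "Phase 5": runs only after the load finished without error.
    commit(ids)
    return len(ids)


def failing_load(ids: List[int]) -> None:
    raise ConnectionError("target system unavailable")


def working_load(ids: List[int]) -> None:
    warehouse.extend(ids)


try:
    run_unload_job(failing_load)
except ConnectionError:
    pass
after_failure = dict(data_set)        # nothing was committed

processed = run_unload_job(working_load)  # the same records are picked up again
print(after_failure, data_set, warehouse, processed)
```

After the failed run, both Approved records keep their status, so the second run reads and loads them again and only then marks them as Committed.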

Compatibility

Version	Compatibility Notice
6.5	DataSetReader introduced as an Incubation component in 6.5.0.
6.6	DataSetReader with expanded functionality; still an Incubation component in 6.6.0.
