Data Manager introduction
Overview
Data Manager lets domain experts work directly with data that is processed by CloverDX. They can view and edit the data and approve changes as it flows through the system.
The basic use cases for Data Manager revolve around data quality. It can, for example, be used to store rows that fail data validation and require manual correction by a user. Data Manager offers an interface where users can see the data and the associated validation messages and fix any issues by editing the data manually. The rows can then be sent for approval and, once approved, are picked up by a CloverDX graph that sends them further downstream.
The diagram above shows the basic flow of data in the CloverDX platform when Data Manager is used. Data starts its journey in a CloverDX job (e.g., a graph) which reads the data from the source and applies any transformations needed. This process is typically automated: data is picked up from the source on a schedule, or the load is triggered by an event such as a file arrival or an API call.
During this initial processing, data is validated, and any validation issues are collected and sent along with the data to the Data Manager.
Data Manager shows the data to users who can review and modify the data to ensure that any issues are fixed. Once the data is clean, it can be approved in Data Manager for further processing.
All approved rows are picked up from the Data Manager’s storage by another CloverDX job. As before, this step can be automated and run on a schedule or in response to a variety of triggers.
And finally, the records are uploaded to their destination. This can be any system – whether it is a file, API, cloud app or a database. At the same time, the Data Manager is notified that each row has been processed and will show this information to its users.
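The flow described above can be modeled in a few lines of plain Python. This is only an illustrative sketch and not CloverDX code: the `Row` class, the status names (`NEEDS_REVIEW`, `APPROVED`, `PROCESSED`), the `validate` rule, and the choice to approve a row only when it validates cleanly are all hypothetical stand-ins for what the real jobs and Data Manager do.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    data: dict
    messages: list = field(default_factory=list)  # validation messages
    status: str = "NEEDS_REVIEW"                  # hypothetical status name


def validate(data):
    """Hypothetical rule: the 'email' value must contain '@'."""
    if "@" not in data.get("email", ""):
        return ["email is not valid"]
    return []


def load(source_rows):
    """Step 1: the loading job validates rows and stores them with their messages."""
    return [Row(data=dict(r), messages=validate(r)) for r in source_rows]


def fix_and_approve(row, **changes):
    """Step 2: a user edits the row; in this sketch it is approved only if it is clean."""
    row.data.update(changes)
    row.messages = validate(row.data)
    if not row.messages:
        row.status = "APPROVED"


def deliver(data_set, destination):
    """Step 3: a second job uploads approved rows and marks them as processed."""
    for row in data_set:
        if row.status == "APPROVED":
            destination.append(row.data)
            row.status = "PROCESSED"


data_set = load([
    {"email": "ann@example.com"},
    {"email": "bob.example.com"},
    {"email": "carol.example.com"},
])
fix_and_approve(data_set[0])                           # already clean
fix_and_approve(data_set[1], email="bob@example.com")  # fixed by a user
destination = []
deliver(data_set, destination)                         # third row still waits for review
```

After this runs, two rows have reached `destination` and are marked `PROCESSED`, while the third row keeps its validation message and stays in `NEEDS_REVIEW` until someone corrects it.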
Basic concepts
Data set
Data Manager stores its data in data sets. A data set is a collection of rows (records), all of which have the same data layout. Any number of data sets, each with a different layout, can be defined in a single instance of Data Manager.
Data sets are shown on the Data Sets screen, which provides basic data set management functionality.
The screen shows basic information about each data set you have permissions for. If you are an Admin, you can also see disabled data sets and create new data sets.
The following information is shown for each data set:
- Name: the name of the data set. The name can contain spaces and special characters.
- Batching: whether batching is enabled and, for data sets that have batching configured, the number of batches.
- Last load: the date when rows were last loaded into the data set and how many rows were loaded at that time. This can help you see which data sets have been updated recently and may require your attention.
- In process: an overview of the rows in the data set. Rows can be in various statuses, and this column shows the number of records in each status. This can help you quickly see how much work remains on the data in the data set. See Row Statuses for more information about the row lifecycle in the data set.
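The per-status counts shown in the In process column amount to a simple tally. The sketch below illustrates the idea in plain Python; the status names and the row structure are hypothetical and are not taken from Data Manager.

```python
from collections import Counter

# Hypothetical rows; only the status matters for the summary.
rows = [
    {"id": 1, "status": "NEEDS_REVIEW"},
    {"id": 2, "status": "APPROVED"},
    {"id": 3, "status": "NEEDS_REVIEW"},
    {"id": 4, "status": "PROCESSED"},
]

# Count how many rows are in each status.
in_process = Counter(row["status"] for row in rows)
```

Here `in_process["NEEDS_REVIEW"]` is 2, which is the kind of number a reviewer would use to judge how much work is left.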
Data sets can be either “flat” or can have batching enabled. Batching allows you to essentially partition the data set based on a specific value of a column – for example billing country, arrival date, source file name, etc. To learn more about batching, please see the Data batching section.
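Conceptually, batching groups rows by the value of one column. The following sketch shows that idea in plain Python; the function name `partition_into_batches` and the sample column `billing_country` are hypothetical, and the code says nothing about how Data Manager implements batching internally.

```python
from collections import defaultdict


def partition_into_batches(rows, batch_column):
    """Group rows by the value found in batch_column."""
    batches = defaultdict(list)
    for row in rows:
        batches[row[batch_column]].append(row)
    return dict(batches)


rows = [
    {"invoice": 1, "billing_country": "US"},
    {"invoice": 2, "billing_country": "DE"},
    {"invoice": 3, "billing_country": "US"},
]
batches = partition_into_batches(rows, "billing_country")
for country, batch_rows in batches.items():
    print(country, len(batch_rows))
# prints:
# US 2
# DE 1
```

A "flat" data set corresponds to skipping this step and treating all rows as one group.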
Rows are loaded into the data set by CloverDX jobs (graphs, subgraphs, etc.) using the DataSetWriter component. Similarly, CloverDX jobs use the DataSetReader component to read data from a data set for further processing.
Data set rows
Each data set contains any number of rows with each row having the same data layout (columns). Rows can be in different statuses depending on what work was done with each row.
The structure of each row is described by its data layout. The data layout defines columns (fields) and their data types as well as additional column properties (for example, whether the column is editable, etc.).
The columns can be strings (representing text), numbers (integers as well as decimal numbers), dates, or booleans (representing true/false). For more information about data layout and column data types, please see the Data layout section.
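As a rough mental model, a data layout is a list of column definitions that every row must match. The sketch below expresses this in plain Python. The `Column` class, the sample column names, and the mapping of column types to Python types are assumptions made for illustration; the actual type names and properties are described in the Data layout section.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Column:
    name: str
    types: tuple           # Python types accepted for this column (assumed mapping)
    editable: bool = True  # example of an additional column property


# Hypothetical layout with one column of each kind mentioned above.
LAYOUT = [
    Column("customer", (str,)),
    Column("amount", (int, Decimal)),
    Column("due_date", (date,)),
    Column("paid", (bool,), editable=False),
]


def layout_errors(row, layout):
    """Return a list of problems found when checking a row against a layout."""
    errors = []
    for col in layout:
        if col.name not in row:
            errors.append(f"{col.name}: missing")
            continue
        value = row[col.name]
        # bool is a subclass of int in Python, so reject it for non-boolean columns.
        stray_bool = isinstance(value, bool) and bool not in col.types
        if stray_bool or not isinstance(value, col.types):
            errors.append(f"{col.name}: unexpected type {type(value).__name__}")
    return errors


good = {"customer": "Acme", "amount": Decimal("19.90"),
        "due_date": date(2024, 5, 1), "paid": False}
bad = dict(good, amount="19.90")
```

Calling `layout_errors(good, LAYOUT)` returns an empty list, while `layout_errors(bad, LAYOUT)` reports that `amount` has an unexpected type.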
Since each row in the data set has the same layout, the rows can be nicely displayed as a table with pre-defined structure. This is available on the Data editor screen which allows you to see and edit the data in the data set. The data in the data editor is shown in the data grid (or just grid).
To learn more about how to work with data in the data set – how to view or approve changes, edit the data and more – see the Working with data section.