Managing data sets
In this section you’ll learn more about how you can wrk with whole data sets rather than working with the data directly. You’ll learn how to create, modify or delete the data sets and more.
Creating data sets
You can create any number of data sets in Data Manager as long as you have Create/Edit/Delete Data Sets in Data Manager permission configured in CloverDX Server. See Data Manager permissions in CloverDX Server to learn more about how to configure permissions in CloverDX Server.
To create a data set, click on the New button in the top-right corner of the Data Sets page in the Data Manager.
When creating a data set, you will have to configure its basic properties, data layout, batching settings and permissions. These are all configured on separate pages in the New Data Set wizard.
Basic data set properties
The first screen of the New Data Set wizard allows you to configure basic settings for the data set like its name, description, and more.
Following settings can be configured on the Basic settings page:
-
Data set name: unique name of the data set. The name can contain special characters (like spaces) and should be descriptive enough to allow you to find your data set when working in Data Manager or in Designer. Internally, the Data Manager will create a data set code that will be used to identify the data set. Data set code does not change even if you rename the data set which allows you to continue using the data set in your CloverDX jobs even if the name has changed.
-
Description: an optional more detailed description of the data set’s purpose.
-
Data retention: specifies the number of days how long are the rows retained in the Data Manager’s storage after they have been committed. Rows that have been committed for longer than the number of days specified in this property will be automatically deleted from the data set. When rows are deleted, their audit records are deleted as well.
Data layout
Data layout specifies the structure of each row in the data set – the column names, types, and other properties.
All rows in the data set have the same layout. The layout can contain any number of user-defined columns and will also contain a set of system-defined columns that are always present and cannot be removed (but can be moved around to a different place within the row).
The layout of the data set is shown in a table which lists all the columns. The first column (at the top of the table) is the first column of the record. Columns can be reordered freely by using the drag handle on the left side of each column’s row in the layout table.
Columns in the data set can have one of the following data types:
Type | Description | Examples | Metadata type |
---|---|---|---|
boolean |
|
|
|
date |
Date and time with millisecond precision. |
|
|
decimal |
Numbers with fractions with up to 22 digits before decimal point and 10 after the decimal point. The numbers are fixed-point - i.e., they are suitable to represent currency values and other exact quantities without various precision issues that often manifest with floating-point numbers. |
|
|
integer |
Whole numbers. |
|
|
string |
Text data of any length with full support for Unicode to allow for characters of any alphabet. |
|
|
When you create a new column, following column settings are shown in a Create a new column side bar:
Following settings can be configured for the new column:
-
Column name: name of the column. The name can contain special characters like spaces or other symbols. Data Manager will generate a Technical name as column identifier automatically based on the human-readable name.
-
Description: optional longer description of the column’s purpose.
-
Data type: one of the data types mentioned in the table above – boolean, date, decimal, integer, string.
-
Visible: defines whether the column is visible to data set editors. Following values are allowed:
-
Visible : the column is visible, and users cannot hide it in the data editor. This is the default setting. Use this setting for most columns that are important for the users to see or to work with.
-
Hidden : the column is not visible in data editor by default, but users can show it via Column chooser menu. Use this setting for columns that may not be as important but may be useful in certain situations.
-
Always hidden : the column is not visible in the data editor and users cannot show it via Column chooser menu. Use this setting for columns you need in your data set, but which should never be touched by users. For example, unique entity id, various technical fields that users should not manipulate etc.
-
-
Editable: configures whether the column value can be changed by data editors (when checked) or whether the value is read only (unchecked).
-
Restrict to lookup: if configured, column values will only be allowed to have one of the values as defined in the lookup that is managed by CloverDX Server. The dropdown will show you a list of all lookups that are registered on the Server. If you need to create a new lookup, please review the Shared lookup tables in CloverDX Server chapter.
-
Technical name: defines a unique identifier for the column in the data set. Technical name will be generated for you automatically based on the column name. In most cases you can leave the technical name in its default value, however, you can change the value if needed. Technical names:
-
Are case-sensitive
-
Can only contain letters, numbers, and underscore
-
Cannot start with a number
-
Batching
Batching allows you to split your data set into partitions called batches. Each batch is a subset of records in the data set. All records with the batch have the same value of the batch key column. Batch key column can be any column in the data set as long as it is an integer or string data type. Rows are assigned their batches in real-time based on the data and there is no need for you to do anything else except for selecting the batch key column.
Using batching with a data set can help you organize the data better to match the expected usage by users or to match the way the data is represented when it arrives.
Few examples:
-
You can batch based on a file name for data that arrives in files. It is often important to process each file as a whole and using batching in this case allows you yo easily work with data in a single file but without having to create new data set for every new file that arrives.
-
You can batch orders based on the origin country. This can help you align your processes better since laws in various countries can place different requirements on how orders must be processed.
-
You can batch invoice data by their date (e.g., month). This gives you quick overview of your invoices and again allows you yo align process withing Data Manager with usual accounting practice of always processing the full month once the month is over.
Any number of batches can be created. The membership of the row in a batch is evaluated in real-time as needed. I.e., it is possible to change the batch for a row simply by changing the value of the row’s batch key column.
Following settings are available on the Batching page:
-
Enable batching: enables of disables batching on the data set. By default, this is turned off (i.e., the data set is not batched). If this is enabled, you will have to select the batch key column.
-
Batch key column: allows you to select which column from the data set is the batch key. Only one column can be selected. The column must be either integer or string.
Note that enabling or disabling batching on the data set does not change the data at all. It simply changes the presentation of the data in the Data Manager. If batching is enabled, the Batches screen will be available for the data set.
User roles
Each data set has its own set of permissions – list of users and their roles with regards to the operations they can perform on the data set.
Roles are configured on a User roles page when creating a new data set:
Clicking on a dropdown for each role will give you a list of all users on the Server and you can select any number of users in each role.
The roles form a simple hierarchy - a higher role has more permissions than a lower role. This means that Admin can do everything that Data Approver can, and Data Approver can do everything that Data Editor can. This means that for each user it is enough to place them into the role with the highest permissions they will need for their work. See more details in Data Set Permissions section.
Editing data sets
Data sets can be edited even after they have been created and data has been loaded to them. To edit a data set, select the *Configure option from the data set’s context menu which is shown on Data Sets screen when you click on the three dots menu next to each data set.
When editing a data set, additional options are available on the Basic settings page:
You can Disable data set to prevent anyone from using it without deleting any data. Disabled data set cannot be used in CloverDX jobs and data in disabled data set cannot be edited in the Data Manager’s editor.
You can also Delete data set to remove all data associated with the data set. Deleting the data set removes all the data as well as audit logs and other related metadata from the Server. Note that deleted data set cannot be recovered.
Editing data set is always a single operation even if you changed multiple settings of the data set. You should make all the changes you need and then Save all the changes at once. The changes are only applied when you save rather than right away when you change something to ensure you do not damage your data in any way.
Changing data set layout
When changing the layout, following operations are allowed:
-
Reordering the columns.
-
Adding new columns.
-
Deleting existing columns.
-
Changing column visibility.
-
Changing column editability.
-
Renaming columns in the data set (both human-readable as well as technical column names can be changed).
-
Changing column descriptions.
-
Changing column lookup settings.
Note that it is not possible to change the column type in a data set that contains data. However, it is possible to change type of a column in an empty data set.
Also note that if you delete a column, the data will be removed and cannot be recovered (the delete is hard delete without ability to undo).
As with other changes, the layout changes are only applied after you click on the Save button.
Changing batching settings
You can enable or disable batching at any time as well as change batch key column to any column within the data set. Batching settings changes do not make any changes to the data – batching only affects how the data is presented in the Data Manager and how it can be read by the DataSetReader component.
Changing permissions
You can change permissions freely – add or remove users in various roles as needed. Note that you cannot remove yourself from the Admin list. This ensures that you cannot create a data set that cannot be changes because all permissions to it have been removed.
Data set permissions
Data set permissions are configured for each data set separately. Each data set has an owner who is also and admin of that data set. The ownership of the data set cannot be changed.
Permissions are configured in terms of user roles. Roles define what users who have these roles can do with the data set. There are three roles – Admin (the most powerful role), Data Approvers, and Data Editor (the least powerful role).
The operations permitted on a data set for each role are shown on the following diagram:
You can assign any number of users into each role. Note that CloverDX Server implements additional permissions that guide access to Data Manager. See User management and access control in the Server Administration documentation.
Data batching
Data batching allows you to split your data into batches based on a value of the batch key column. With batching, data sets gain one additional level: Data set → Batches → Data instead of Data set → Data.
This allows you to effectively split data based on any property while still keeping the data together as part of a single business process.
Batches are created on the fly in real-time and depend solely on the unique values of the batch key column. As such, there can be any number of batches in your data set.
When data set is batched, you will see this directly on the Data Sets screen:
The Batching column provides information about how many batches there are in the data set. The In process column still counts rows and works in the same way as for data sets without batching.
To see the batches, click on the data set and you’ll be brought to the Batches screen instead of the data editor:
Each batch can have its own status that depends on the status of the rows within the batch. Following are the batch statuses:
Batch status | Description |
---|---|
All rows in the batch are in the New status. This batch has not been worked on yet. |
|
Rows in the batch are in various statuses – some may have been Edited, some may be Approved or even Committed. This means that the batch needs additional work before it can be approved. |
|
All rows in the batch are in the Approved status. This status means that the work on the batch has finished, and the batch is waiting for further processing by a CloverDX job. |
|
All rows in the batch are Committed. No further changes can be made to any rows in this batch – all rows have been written to the target system. |