Common properties of AI components
| For customers looking to explore the capabilities of AI components, the setup can be accelerated by using the official Docker image for AI workloads. This image includes a pre-configured CloverDX Server as well as the drivers and additional software needed to take advantage of hardware acceleration of AI workloads. The image must be deployed on a machine with an NVIDIA GPU to use GPU acceleration. |
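As a rough sketch, starting such an image with GPU access might look like the command below. The image name and port mapping are placeholders, not the official values (see the image documentation for the actual ones), and the --gpus flag assumes the NVIDIA Container Toolkit is installed on the host.

```
# Illustrative only: the image name and port are placeholders.
# --gpus all exposes the host's NVIDIA GPUs to the container
# (requires the NVIDIA Container Toolkit on the host).
docker run --gpus all -p 8080:8080 cloverdx/cloverdx-ai:latest
```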
Model
AI components which use locally hosted models require you to specify the model you want to use. The choice of model determines the capabilities of the components.
Models are not distributed as part of the core product and need to be downloaded and deployed to CloverDX Server by the admin before use. You can provide your own model (some configuration required - for experienced users) or choose from a curated set of free, ready-to-use models available in our online CloverDX Marketplace (recommended).
Models from the CloverDX Marketplace are provided as libraries you can install on your CloverDX Server. Once installed, the model is available to the components via the Server model property; no further configuration is required.
- Server model: (Recommended) In a Server project, select pre-configured models available from libraries installed on the Server. Go to the CloverDX Marketplace to download the models you need and install them on the Server.
- Classification model directory: (for experienced users) The model can also be specified as the URI of its directory. For details, see Machine Learning Models.
Models downloaded from the CloverDX Marketplace and selected via the Server model property automatically configure the following model properties.
Model name is a read-only property which shows the name from the model configuration files.
Device determines whether the model runs on the processor (CPU) or the graphics card (GPU). Processing on GPU is much faster, but you need specialized hardware to use it.
Model arguments, Tokenizer arguments and Translator arguments allow you to modify model behavior. They are model-dependent.
Input/output parameters
Fields to classify specifies the fields to be analyzed.
Token/text classes and thresholds allow you to define the classes whose scores will be computed. The threshold specifies the minimum score at which the class is included in the output.
Classification output field sets the output field which will store the analysis results. It must be of variant type. If the field already contains some analysis, the analyses are merged, so you can chain several AI components and use their combined output.
Batch size sets the number of records processed by the model together.
Error handling
Token overflow policy determines what happens when an input field value cannot be encoded because it exceeds the model-specific maximum length. The strict policy causes the component to fail, while the lenient policy just logs a warning and truncates the input.
Advanced
Transform allows you to control which units are used to generate output records. A separate record can be created for each input record, for each sequence-class pair, or both.
Cache
To change the default DJL cache directories, use DJL_CACHE_DIR and ENGINE_CACHE_DIR as system properties or environment variables.
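For example, on a Linux installation the directories could be overridden with environment variables before starting the Server; the paths below are purely illustrative.

```
# Illustrative paths only - point the DJL caches to a custom location
export DJL_CACHE_DIR=/opt/cloverdx/djl-cache
export ENGINE_CACHE_DIR=/opt/cloverdx/engine-cache

# Alternatively, pass the same names as Java system properties, e.g.:
#   -DDJL_CACHE_DIR=/opt/cloverdx/djl-cache -DENGINE_CACHE_DIR=/opt/cloverdx/engine-cache
```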
AI component execution and model caching
This feature is designed to prevent native memory leaks that may occur when AI models are repeatedly loaded and unloaded. By caching models for the lifetime of the Java process and controlling prediction execution via a dedicated thread pool, the system ensures stable and predictable memory usage during inference.
Model caching
By default, once a model is loaded and used by an AI component, it is cached in memory and remains there for the lifetime of the Java process (i.e., it is never unloaded). Model caching is controlled by the cloverdx.ai.caching configuration property.
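As a sketch, the property would be set in the CloverDX Server configuration. Assuming it accepts a boolean value (an assumption, not confirmed here), disabling the cache might look like this:

```
# Assumption: cloverdx.ai.caching takes a boolean value; false would disable model caching
cloverdx.ai.caching=false
```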
Prediction thread pool
Inference tasks are executed in a fixed-size thread pool. This pool isolates prediction work to dedicated threads.
The default pool size is 4, which limits the number of AI components that can run concurrently. If more AI components attempt to run in parallel, the components that do not fit into the pool will wait for their slot in the pool (i.e., for one of the components that occupy the pool to finish processing).
The pool itself is controlled by the cloverdx.ai.pool property.
You can modify the pool size by using the cloverdx.ai.pool.size configuration property.
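For example, increasing the pool size in the CloverDX Server configuration (the exact file location depends on your installation) could look like the entry below; the value 8 is only illustrative, the default is 4.

```
# Illustrative: allow up to 8 AI components to run predictions concurrently (default is 4)
cloverdx.ai.pool.size=8
```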
| This component is currently in the incubation phase. Although it is available for use, it is under active development and may be subject to changes. We welcome feedback and encourage users to explore its capabilities. |
Short description
The AIAnonymizer lets you run a token classification model (preferably a PII detection model such as Piiranha - see CloverDX Marketplace) and mask identified tokens in the output.
This component behaves just like AITokenClassifier, but with the added functionality of masking (anonymizing) the tokens identified by the model above the configured thresholds.
| Same input metadata | Sorted inputs | Inputs | Outputs | Each to all outputs | Java | CTL | Auto-propagated metadata |
|---|---|---|---|---|---|---|---|
| - | ⨯ | 1 | 1 | ⨯ | ⨯ | ⨯ | ✓ |
Ports
| Port type | Number | Required | Description | Metadata |
|---|---|---|---|---|
| Input | 1 | ✓ | The text(s) to classify | At least one |
| Output | 1 | ✓ | Copy of the input data with anonymized texts + token classification result | Any |
Metadata
AIAnonymizer propagates input metadata to output.
AIAnonymizer attributes
| Attribute | Req | Description | Possible values |
|---|---|---|---|
| Server model |  | Recommended: Use a model installed as a library on the CloverDX Server. Check the CloverDX Marketplace for available ready-to-use models. This is a more convenient alternative to Classification model directory. |  |
| Classification model directory |  | Path to the machine learning model directory. Required unless Server model is defined. |  |
| Model name | no | A read-only field displaying the name defined in the model configuration files (if available). |  |
| Device | yes | The device to run the model on – either the processor (CPU) or the graphics card (GPU). You must set the device the model is designed for. GPU models are much faster, but you need specialized hardware to use them. | CPU (default) \| GPU |
| Model arguments | no | Configuration arguments for the model. See the documentation of your particular model. |  |
| Tokenizer arguments | no | Configuration arguments for the tokenizer. See the documentation of your particular model. |  |
| Translator arguments | no | Configuration arguments for the translator. See the documentation of your particular model. |  |
| Input / output parameters |  |  |  |
| Fields to anonymize | yes | List of fields to be anonymized. |  |
| Anonymize classes and thresholds |  | List of token classes that shall be anonymized. The classes are model-dependent; you can use only some of them, but you cannot add classes unknown to the model. The thresholds define the minimum score at which a particular token is anonymized – it is masked if at least one class reaches its threshold. |  |
| Masking type | yes | Determines which anonymization method is applied: character-level masking using the mask character, or full-string redaction using the redact string. |  |
| Mask character | no | The character used for masking the characters of the anonymized tokens. |  |
| Redact string | no | The static replacement value that substitutes the entire anonymized string. |  |
| Anonymization information | no | An output field which will store the analysis results. It must be of variant type. If the field already contains some analysis, the analyses are merged, so you can chain several AI components and use their combined output. |  |
| Batch size | no | Number of records processed by the model together. | an integer number |
Compatibility
| Version | Compatibility notice |
|---|---|
| 7.1.0 | AIAnonymizer is available since CloverDX version 7.1. |