AIZeroShotClassifier

Short description

The AIZeroShotClassifier component performs probabilistic multi-label classification against user-defined classes: you define your own classes (terms, entities, topics, …) and the component scores the input text against each of them.

For example, you can define the classes "weather", "mood", and "cat". For an input text such as "The weather is really nice today", you then get a very high score for "weather", a moderate score for "mood", and a negligible score for "cat".
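To illustrate what "multi-label" means for the scores, here is a minimal Python sketch. It is not the component's implementation, which this page does not describe. It assumes the approach commonly used with NLI-based zero-shot models such as BART: each class is scored on its own by comparing an "entailment" logit against a "contradiction" logit. The function name and all logit values are made up for illustration.

```python
import math


def entailment_score(contradiction_logit: float, entailment_logit: float) -> float:
    """Two-way softmax over (contradiction, entailment); returns P(entailment)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(contradiction_logit, entailment_logit)
    e_contra = math.exp(contradiction_logit - m)
    e_entail = math.exp(entailment_logit - m)
    return e_entail / (e_contra + e_entail)


# Hypothetical (contradiction, entailment) logits for the text
# "The weather is really nice today". These numbers are invented.
hypothetical_logits = {
    "weather": (-3.0, 4.0),
    "mood": (-0.5, 0.8),
    "cat": (3.5, -3.0),
}

# Each class is scored independently of the others.
scores = {
    label: entailment_score(contra, entail)
    for label, (contra, entail) in hypothetical_logits.items()
}

for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```

Because every class is scored independently, the scores do not need to sum to 1, and several classes can score high for the same text.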

Model warning: This component currently works only with Facebook’s BART Zero-Shot Text Classification model. Visit the CloverDX Marketplace to download a ready-to-use package with the model.

Performance warning: Zero-shot classification is generally a very expensive operation. If your classes are fixed in advance, consider using a model trained for those pre-defined classes, which is more efficient; in that case, see AITextClassifier.

Same input metadata	Sorted inputs	Inputs	Outputs	Each to all outputs	Java	CTL	Auto-propagated metadata
-	⨯	1	1	⨯	⨯	✓	✓

Ports

Port type	Number	Required	Description	Metadata
Input	1	✓	The text(s) to classify	At least one `string` field
Output	1	⨯	Copy of the input data + text classification result	Any

Metadata

AIZeroShotClassifier propagates input metadata to output.

AIZeroShotClassifier attributes

Attribute	Req	Description	Possible values
Model
Server model		This component currently works only with Facebook’s BART Zero-Shot Text Classification model. Visit the CloverDX Marketplace to download a ready-to-use package with the model.
Classification model directory		Path to the machine learning model directory. Required unless Server model is defined.
Model name	no	A read-only field displaying the name defined in the model configuration files (if available).
Device	yes	The device on which the model runs: either the processor (CPU) or the graphics card (GPU). You must select the device the model is designed for. GPU models are much faster, but they require specialized hardware.	CPU (default) | GPU
Model arguments	no	Configuration arguments for the model. See the documentation of your particular model.
Tokenizer arguments	no	Configuration arguments for the tokenizer. See the documentation of your particular model.
Translator arguments	no	Configuration arguments for the translator. See the documentation of your particular model.
Input / output parameters
Fields to classify	yes	List of `string` fields to be classified.
Classes and thresholds	yes	List of user-defined text classes to score. Class names are unrestricted: a class can be a single word (“medicine”) or a whole phrase or sentence (“medicine, a science or practice of caring for patients, managing the diagnosis, prognosis, prevention and treatment”). An optional threshold sets the minimum score a class must reach to be added to the output.
Classification output field	no	The output field that stores the analysis results. It must be of the `variant` type. If the field already contains an analysis, the analyses are merged, so you can chain several AI components and use their combined output.
Batch size	no	Number of records the model processes together in one batch.	an integer
Error handling
Token overflow policy	no	Specifies the behavior when an input text cannot be encoded because it exceeds the model-specific maximum length. The strict policy causes the component to fail, while the lenient policy only logs a warning and truncates the input.	strict (default) | lenient
Advanced
Transform	no	Set of CTL methods to control what units are used to generate output records. A separate record can be created for each input record, each text–class pair, or both. For example, you can find the class with the greatest score and only generate output for this class.
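
The effect of Classes and thresholds, and the "class with the greatest score" use of Transform, can be sketched in plain Python. This is an illustration only: it is not CTL and not the component's API. The function names, the example scores, and the inclusive `>=` comparison are assumptions made for the sketch.

```python
from typing import Dict, Optional, Tuple


def apply_thresholds(scores: Dict[str, float],
                     thresholds: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Keep only the classes whose score reaches their threshold."""
    kept = {}
    for label, score in scores.items():
        threshold = thresholds.get(label)
        # Assumption: a class without a threshold is always reported,
        # and a score equal to the threshold is kept.
        if threshold is None or score >= threshold:
            kept[label] = score
    return kept


def top_class(scores: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (class, score) pair with the greatest score, or None if empty."""
    if not scores:
        return None
    return max(scores.items(), key=lambda item: item[1])


# Invented scores for one input text, with one optional threshold left unset.
example_scores = {"weather": 0.97, "mood": 0.55, "cat": 0.01}
example_thresholds = {"weather": 0.5, "mood": 0.6, "cat": None}

kept_scores = apply_thresholds(example_scores, example_thresholds)
best = top_class(kept_scores)

print(kept_scores)
print(best)
```

In this sketch "mood" is dropped because 0.55 is below its threshold of 0.6, while "cat" is kept despite its low score because no threshold is set for it. Selecting only the top class afterwards corresponds to generating output for a single class per input text.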

Compatibility

Version	Compatibility notice
7.1.0	AIZeroShotClassifier is available since CloverDX version 7.1.
