Aspell Lookup Table
All data records stored in this lookup table are kept in memory, so enough memory must be available to hold all of them. If data records are loaded into the Aspell lookup table from a data file, the available memory should be at least about 7 times the size of that file. The exact multiplier varies with the type of data records stored in the data file.
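The sizing rule above can be turned into a quick back-of-the-envelope check. The following is a minimal sketch: the function name `estimate_required_memory` is hypothetical, and the multiplier of 7 is only the approximate figure quoted above, so treat the result as a rough lower bound rather than a guarantee.

```python
def estimate_required_memory(data_file_bytes: int, multiplier: float = 7.0) -> int:
    """Return a rough estimate, in bytes, of the memory needed to load the data file.

    The multiplier of 7 is the approximate figure from the text; the real
    value depends on the type of data records in the file.
    """
    if data_file_bytes < 0:
        raise ValueError("file size cannot be negative")
    return int(data_file_bytes * multiplier)


# Example: a 200 MiB data file.
required = estimate_required_memory(200 * 1024 * 1024)
print(f"Plan for at least {required / 1024 ** 3:.2f} GiB of available memory")
```

Passing a different `multiplier` lets you adjust the estimate once you have measured the actual memory use for your own record types.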
If you are working with data records that are similar but not fully identical, you should use this type of lookup table. For example, you can use Aspell lookup table for addresses.
Aspell lookup table allows you to have multiple records with the same key value.
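Conceptually, a table that allows duplicate keys behaves like a mapping from each key to a list of records. The sketch below only illustrates that idea in Python; the field names `street` and `city` are made up for the example, and it says nothing about how the lookup table is implemented internally.

```python
from collections import defaultdict

# Hypothetical address records; "street" plays the role of the lookup key field.
records = [
    {"street": "Main Street", "city": "Springfield"},
    {"street": "Main Street", "city": "Shelbyville"},
    {"street": "Oak Avenue", "city": "Springfield"},
]

# Each key maps to a list, so records sharing a key are all kept.
table = defaultdict(list)
for record in records:
    table[record["street"]].append(record)

matches = table["Main Street"]
print(len(matches), "records share the key 'Main Street'")
```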
Creating Aspell Lookup Table
In the Aspell lookup table wizard, you set up the required properties. You must give the lookup table a Name, select the corresponding Metadata, and select the Lookup key field that should be used to look up data records in the table (this field must be of the string data type).
You can also specify the Data file URL where the data records of the lookup table will be stored and the charset of data file (Data file charset). The default charset is
You can set the threshold that should be used by the lookup table (Spelling threshold). It must be higher than 0.
The higher the threshold, the more tolerant the component is of spelling errors.
Its default value is
It is the edit_distance value from the query to the results. Words for which this value is higher than the specified limit are not included in the results.
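The effect of the threshold can be illustrated with a small filter over candidates whose edit distances are already known. This is a sketch under assumptions: the function name `filter_by_threshold`, the sample words, and their distances are invented for the example, and sorting the results by distance is a presentation choice, not documented behavior.

```python
def filter_by_threshold(scored_candidates, threshold):
    """Keep candidates whose edit distance does not exceed the threshold.

    scored_candidates: iterable of (word, edit_distance) pairs.
    threshold: the spelling threshold; it must be higher than 0.
    """
    if threshold <= 0:
        raise ValueError("the threshold must be higher than 0")
    kept = [(word, dist) for word, dist in scored_candidates if dist <= threshold]
    # Sorting by distance is an assumption made for readability of the output.
    return sorted(kept, key=lambda pair: pair[1])


# Invented distances from the query "Main Stret" to three stored keys.
candidates = [("Main Street", 3), ("Maine Street", 6), ("Oak Avenue", 40)]
print(filter_by_threshold(candidates, threshold=10))
```

Raising the threshold lets more distant words through, which is why a higher value makes the lookup more tolerant.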
You can also change the default costs of individual operations (Edit costs):
Used when the case of one character is changed.
Used when one character is transposed with another in the string.
Used when one character is deleted from the string.
Used when one character is inserted into the string.
Used when one character is replaced by another one.
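To show how per-operation costs shape the resulting distance, here is a minimal sketch of a weighted edit distance with the five operations listed above. It uses a standard dynamic-programming scheme (optimal string alignment) and is not claimed to be the algorithm the lookup table actually uses. The names `EditCosts` and `weighted_edit_distance` are hypothetical, and the cost values in the example are arbitrary placeholders, not the product defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditCosts:
    case: int       # one character changes case
    transpose: int  # two adjacent characters are swapped
    delete: int     # one character is deleted
    insert: int     # one character is inserted
    replace: int    # one character is replaced by another


def weighted_edit_distance(source: str, target: str, costs: EditCosts) -> int:
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * costs.delete   # delete every source character
    for j in range(1, cols):
        dist[0][j] = j * costs.insert   # insert every target character
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            if a == b:
                substitution = 0
            elif a.lower() == b.lower():
                substitution = costs.case
            else:
                substitution = costs.replace
            best = min(
                dist[i - 1][j] + costs.delete,
                dist[i][j - 1] + costs.insert,
                dist[i - 1][j - 1] + substitution,
            )
            # Adjacent transposition: "ab" in the source matches "ba" in the target.
            if i > 1 and j > 1 and a == target[j - 2] and source[i - 2] == b:
                best = min(best, dist[i - 2][j - 2] + costs.transpose)
            dist[i][j] = best
    return dist[-1][-1]


# Placeholder costs chosen only for illustration.
costs = EditCosts(case=1, transpose=2, delete=3, insert=3, replace=4)
print(weighted_edit_distance("Main", "main", costs))  # a single case change
print(weighted_edit_distance("ab", "ba", costs))      # a single transposition
```

Lowering one cost relative to the others makes the corresponding kind of typo cheaper, so words differing only by that kind of error are more likely to fall under the threshold.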
You need to decide whether letters with diacritic marks are considered identical to those without these marks.
To do that, you need to set the value of the Remove diacritic marks attribute.
If you want diacritic marks to be removed before computing the edit_distance value, you need to set this value to
This way, letters with diacritic marks are considered equal to their Latin equivalents.
(Default value is
By default, letters with diacritic marks are considered different from those without.)
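One common way to strip diacritic marks before comparing strings is Unicode decomposition. The sketch below uses Python's standard `unicodedata` module; the helper name `remove_diacritics` is hypothetical, and this illustrates the general technique rather than the lookup table's own implementation.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Return text with combining diacritic marks removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Průmyslová"))
print(remove_diacritics("Žižkov") == "Zizkov")
```

Note that this approach only affects letters that decompose into a base letter plus a mark; letters such as "ł" have no such decomposition and pass through unchanged.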
If you want best guesses to be included in the results, set Include best guesses to
The default value is
Best guesses are words whose edit_distance value is higher than the Spelling threshold but for which there is no better counterpart.
Then click OK and Finish.
If you want to know the edit distance between the values in the lookup table and the values coming from the edge, you must add another field of a numeric type to the lookup table metadata.
Set this field to Autofilling (
Select this field in the Edit distance field combo.
When you use the Aspell lookup table in LookupJoin, you can map this lookup table field to the corresponding field on output port 0.
This way, the values stored in the specified Edit distance field of the lookup table are sent to the specified output field.