Database connections
If you’re interested in learning more about this subject, we offer the Databases course in our CloverDX Academy. |
A database connection lets you access database data sources. With a database connection, you can read data from database tables, perform SQL queries or insert records into database tables. These actions are taken by the components using a database connection.
There are two ways of accessing a database:
-
Using a client on your computer that connects to a database located on some server by means of some client utility. This approach is used in bulkloaders.
-
Using a JDBC driver.
Each database connection requires a JDBC driver. JDBC drivers for commonly used databases are included in CloverDX Designer.
When using database connections in a CloverDX Server project, all database connectivity is performed server-side. One of the benefits is that database servers accessible from CloverDX Server can be also used from within CloverDX Designer. |
Database connections can be internal or external (shared). Internal database connection can be converted to external and vice versa.
Database connection properties dialog is described in Database connection properties.
Access password can be encrypted. See Encryption of access password.
Database connection can serve as a resource for creating metadata. See Browsing database and extracting metadata from database tables.
Remember that you can also create a database table directly from metadata. See Create database table from metadata.
Internal database connections
Internal database connections are part of a graph, they are contained in it and can be seen in its source tab.
Creating internal database connections
To create an internal database connection, right-click Connections in the Outline pane and select .
You can also open the dialog by selecting some DB connection item in the Outline pane and pressing Enter.
For detailed information about how a database connection should be created and configured, see Database connection properties.
When all attributes of the connection have been set, you can validate your connection by clicking the Validate connection button.
To create the internal database connection, click Finish. The new internal database connection is a part of the graph.
Externalizing internal database connections
Each internal database connection can be externalized, i.e. converted to the external one. Thus, you would be able to use the same database connection for more graphs, i.e. more graphs would share the connection.
You can externalize an internal connection by right-clicking the internal connection item in the Outline pane and selecting Externalize connection from the context menu.
A new window will open.
The dialog offers the conn
folder of your project as the location for this new external (shared) connection configuration file.
A configuration file name is created from the database connection name; however, it can be changed. Click OK to finish externalization.
The internal connection item disappears from the Outline pane Connections group; but at the same location, there appears already linked the newly created external (shared) connection configuration file.
The same configuration file appears in the conn
subfolder of the project.
The file can be seen in the Project Explorer pane.
Externalizing multiple connections
You can externalize multiple internal connection items at once.
To do this, select them in the Outline pane.
Right-click and select Externalize connection from the context menu.
A new window will open in which the conn
folder of your project will be offered as the location for the first of the selected internal connection items.
Finally, click OK.
The same window will open for each of the selected connection items until they are all externalized. The name of any configuration file can be changed. If you want or if the file with the same name already exists, you can change the offered name of any connection configuration file.
You can choose adjacent connection items when you press Shift and move the Down Cursor or the Up Cursor key. If you want to choose non-adjacent items, use Ctrl+Click at each of the desired connection items instead.
The same is valid for both the database and JMS connections.
After that, the internal file disappears from the Outline pane connections folder; but at the same location, a newly created configuration file appears.
The same configuration file appears in the conn
subfolder in the Project Explorer pane.
Exporting internal database connections
Exporting is similar to externalizing an internal database connection. You create a connection configuration file that is outside the graph in the same way as an externalized connection, but such a file is no longer linked to the original graph. Subsequently, you can use such a file in other graphs as an external (shared) connection configuration file as mentioned in the previous sections.
You can export an internal database connection into an external (shared) one by right-clicking one of the internal database connection items in the Outline pane and selecting Export connection from the context menu.
The conn
folder of the corresponding project will be offered for the newly created external file.
The name of the file can be changed.
Click Finish to create the file.
After that, the Outline pane connection folder remains the same, but in the conn
folder in the Project Explorer pane the newly created connection configuration file appears.
You can even export more selected internal database connections in a similar way as it is described in the previous section about externalizing.
External (shared) database connections
External (shared) database connections are connections stored outside graphs, so they can be used in multiple graphs.
Only the connection configuration is shared. If two graphs use the same external (shared) connection, they use the same connection configuration, but each of them has its independent database connection.
Best practice is to place an external (shared) connections into the conn
directory.
Creating external (shared) database connections
To create external database connections, see Externalizing internal database connections
Linking external (shared) database connections
External (shared) database connections can be linked to each graph in which they should be used.
Right-click either the Connections group or any of its items.
Select from the context menu.
A File URL dialog displaying the project content will open.
Expand the conn
folder and select the desired connection configuration file from all the files contained.
Linking multiple connections
You can even link multiple external (shared) connection configuration files at once.
Right-click either the Connections group or any of its items and select from the context menu.
Then, a File URL dialog displaying the project content will open.
Expand the conn
folder in this dialog and select the desired connection configuration files from all the files contained.
You can select adjacent file items when you press Shift and move the Down Cursor or the Up Cursor key. If you want to select non-adjacent items, use Ctrl+Click at each of the desired file items instead.
The same is valid for both the database and JMS connections.
Internalizing external (shared) database connections
Any linked external (shared) connection can be converted into an internal connection. The connection becomes a part of the graph, but the external configuration file does not disappear. In such a case, you would see the connection structure in the graph itself.
You can convert any external (shared) connection configuration file into an internal connection by right-clicking the linked external (shared) connection item in the Outline pane and using Internalize connection from the context menu.
Internalizing multiple connections
You can even internalize multiple linked external (shared) connection configuration files at once.
To do this, select the desired linked external (shared) connection items in the Outline pane. You can select adjacent items when you press Shift and move the Down Cursor or the Up Cursor key. If you want to select non-adjacent items, use Ctrl+Click at each of the desired items instead.
After that, the selected linked external (shared) connection items disappear from the Outline pane Connections group; but at the same location, newly created internal connection items appear.
However, the original external (shared) connection configuration files will remain in the conn
subfolder as can be seen in the Project Explorer pane.
The same is valid for both the database and JMS connections.
Database connection properties
The Database connection properties dialog lets you configure particular properties of a database connection. In the dialog, you can choose a JDBC driver or an existing JNDI resource, set up credentials, transaction isolation level, etc.
You can create a database connection that uses a user-defined JDBC Driver or a JNDI Resource.
JDBC driver
The dialog consists of two tabs: Basic properties and Advanced properties.
Basic properties of JDBC driver
In the Basic properties tab of JDBC driver in the Database connection dialog, you specify the name of the connection, type your User name, your access Password and URL of the database connection (hostname, database name or other properties).
The password can be encrypted using Secure graph parameters.
JDBC specific
The default JDBC specific can be used, but the specific one might suit your purpose better. By setting JDBC specific, you can slightly change the behavior of the connection such as different data type conversion, getting auto-generated keys, etc.
Database connection is optimized according to this attribute. JDBC specific adjusts the connection for the best cooperation with the given type of database.
You can also choose from the following connections built in CloverDX: Microsoft SQL Server (for Microsoft SQL Server 2008 or newer or Microsoft SQL Server 2000-2005 specific), MySQL, Oracle, PostgreSQL, Sybase and SQLite.
After selecting one of them, you can see in the connection code one of the following expressions: database="MSSQLSERVER"
, database="MYSQL"
, database="ORACLE"
, database="POSTGRE"
, database="SYBASE"
or database="SQLITE"
, respectively.
Adding JDBC drivers and jars
If you want to use some other driver, you can use one of the Available drivers. If the desired JDBC driver is not in the list, you can add it by clicking the sign located on the right side of the dialog (Load driver from JAR). Then you can locate the driver and confirm its selection.
If necessary, you can also add another JAR to the driver classpath (Add JAR to driver classpath). For example, some databases may need their license to be added as well as the driver.
JDBC drivers
CloverDX has built-in JDBC drivers for: Microsoft SQL Server 2008 or newer, MySQL, Oracle, PostgreSQL, SQLite and Sybase databases. You can choose any JDBC driver from the list of available drivers.
Since version 5.10.0, Microsoft SQL Server uses Microsoft JDBC Driver library. The former driver (using jTDS library) has been renamed to Microsoft SQL Server Legacy. |
By clicking any driver, a connection string hint appears in the URL text area. You only need to modify the connection.
You can also specify JNDI.
Remember that CloverDX supports JDBC 3 drivers and higher. |
Once you have selected the driver from the list, you only need to type your username and password for connecting to the database.
You also need to change the hostname to its correct name.
Type the right database name instead of the database
filler word.
Some other drivers provide different URLs that must be changed in a different way.
You can also load an existing connection from one of the existing configuration files.
You can set up the JDBC specific property, or use the default one; however, it may not do all that you want. By setting JDBC specific you can slightly change the selected connection behavior such as different data type conversion, getting auto-generated keys, etc.
Database connections are optimized based on this attribute. JDBC specific adjusts the connection for the best cooperation with the given type of database.
Note that you can also remove a driver from the list in the Basic tab (Remove selected) and custom properties in the Advanced tab (Remove parameter(s)) by clicking theMinus sign on the respective tab.
Microsoft Access
This driver internally uses the UCanAccess driver. CloverDX uses version 5.0.1. See the offical UcanAccess webpage.
-
In DatabaseWriter,
long
type cannot be used in input metadata. Consider using Map in your graph to convertlong
fields to other metadata types. -
boolean
fields that arenull
will be actually written asfalse
(null value is not supported). -
You cannot write
null
intobinary
fields, either. -
By default, CloverDX uses the
singleconnection=true
property of the UCanAccess driver to minimize usage of system resources. This default can be overridden in connection URL or in Custom JDBC properties. For more details on driver properties, see the driver’s documentation.
MS Access data type NUMBER with field size INTEGER corresponds to SQL_SMALLINT, which can only hold values between approximately -32,000 and +32,000. If you store any value that would overflow this data type field, UCanAccess driver does not report the overflow and saves overflown (incorrect) value. Check the value before insertion. |
Introduced in version 4.0.7. |
Advanced properties of JDBC driver
In addition to the Basic properties tab described above, the Database connection dialog also offers the Advanced properties tab.
If you switch to this tab, you can specify some other properties of the selected connection:
Thread-safe connection
By default, it is enabled. In this default setting, each thread gets its own connection to prevent problems when more components communicate with the same database simultaneously (through the same connection object which is not thread safe).
Transaction isolation
Allows to specify certain transaction isolation level. More details can be found here: Interface Connection. Possible values of this attribute are:
-
0
(TRANSACTION_NONE
).A constant indicating that transactions are not supported.
-
1
(TRANSACTION_READ_UNCOMMITTED
).A constant indicating that dirty reads, non-repeatable reads and phantom reads can occur. This level allows a row changed by one transaction to be read by another transaction before any changes in that row have been committed (a "dirty read"). If any of the changes are rolled back, the second transaction will have retrieved an invalid row.
This is the default value for DB2, EXASolution, Informix, MySQL, MS SQL Server 2008 or newer, MS SQL Server 2000-2005, PostgreSQL and SQLite specifics.
This value is also used as default when JDBC specific called Generic is used.
-
2
(TRANSACTION_READ_COMMITTED
).A constant indicating that dirty reads are prevented; non-repeatable reads and phantom reads can occur. This level only prohibits a transaction from reading a row with uncommitted changes in it.
This is the default value for Oracle, Sybase and Vertica specifics.
-
4
(TRANSACTION_REPEATABLE_READ
).A constant indicating that dirty reads and non-repeatable reads are prevented; phantom reads can occur. This level prohibits a transaction from reading a row with uncommitted changes in it, and it also prohibits a situation where one transaction reads a row, a second transaction alters the row and the first transaction rereads the row, getting different values the second time (a "non-repeatable read").
-
8
(TRANSACTION_SERIALIZABLE
).A constant indicating that dirty reads, non-repeatable reads and phantom reads are prevented. This level includes the prohibitions in
TRANSACTION_REPEATABLE_READ
and further prohibits a situation where one transaction reads all rows that satisfy awhere
condition, a second transaction inserts a row that satisfies thewhere
condition and the first transaction rereads for the same condition, retrieving the additional "phantom" row in the second read.
Holdability
Allows to specify holdability of ResultSet
objects created using the Connection
.
More details can be found here: Interface ResultSet.
Possible options are the following:
-
1
(HOLD_CURSORS_OVER_COMMIT
).The constant indicating that
ResultSet
objects should not be closed when the methodConnection.commit
is called.This is the default value for Informix and MS SQL Server 2008 or newer specifics.
-
2
(CLOSE_CURSORS_AT_COMMIT
).The constant indicating that
ResultSet
objects should be closed when the methodConnection.commit
is called.This is the default value for DB2, MS SQL Server 2000-2005, MySQL, Oracle, PostgreSQL, SQLite, Sybase and Vertica specifics.
This value is also used as default when JDBC specific called Generic is used.
JNDI resource
In the JNDI resource tab, you can create a connection from an existing JNDI resource.
JNDI is a configuration used to obtain a connection from a JNDI-bound data source. A data source generates database connections from a connection pool which is defined on the level of an application server.
Basic properties
In Basic tab of JNDI resource, you can name the database connection and choose a corresponding database JNDI resource from the tree-view.
Connection name is a user-defined name of the database connection.
JDBC specific is described in JDBC driver.
JNDI name is a name of a JNDI data source. If you choose a particular JNDI data source from the tree-view below, JNDI name is filled in.
Root context allows you to choose root context of JNDI resource. You can choose one of the pre-filled values: java:comp, java:comp/env and <empty>, or you can add your own value: click the combo, type the value, and press Enter.
To create a database connection, specify the Connection name and choose the database JNDI resource from the tree-view. If there are too many resources, Filter might help you.
If you have configured JNDI resource but you cannot see it in the list, there can be a problem with configuration (e.g. typo or bad password) or the database might refuse the resource to connect due to limit on connections. You can enter the JNDI Name and use the connection to see the cause. |
Advanced properties
The advanced tab of JNDI resource has the same configurable items as the advanced tab of JDBC driver. See Advanced Properties of JDBC driver.
Encryption of access password
CloverDX supports encryption of passwords.
Why encryption?
If you do not encrypt your access password, it remains stored and visible in the configuration file (shared connection) or in the graph itself (internal connection).
Unless you are the only person with an access to your graph and computer, we recommend to encrypt it, since the password allows access to the database in question.
In case you give someone any of your graphs, you do not have to give them the access password to the whole database. This is why it is possible to encrypt your access password. Without this option, you would be at great risk of an intrusion into your database or of some other damage caused by unauthorized access.
Encrypting the database passwords
To encrypt a database password use Secure parameters. (See Secure Graph Parameters). Store the password in the parameter and use the parameter in the connection dialog instead of the password.
If you would like to return to your access password, you can do it by typing the password into the Database connection dialog and clicking OK.
Browsing database and extracting metadata from database tables
As you can see in Externalizing internal database connections and Internalizing external (shared) database connections), in both cases, the context menu contains two interesting items: Browse database and New metadata. These give you an opportunity to browse a database (if your connection is valid) and/or extract metadata from some selected database table. Such metadata will be internal only, but you can later externalize and/or export them.
Remember that you can also create a database table directly from metadata. See Create database table from metadata. |
Windows authentication on Microsoft SQL Server
Windows authentication means creating a database connection to Microsoft SQL Server using your Windows account, instead of User and Password fields (see figure below). The Microsoft SQL Server JDBC driver depends on native libraries you have to install. To enable this all, follow the steps described in this section.
CloverDX comes with a bundled Microsoft SQL Server JDBC driver.
However, it does not contain native libraries required for Windows authentication on Microsoft SQL Server.
Thus, it is necessary to download the native dll
(mssql-jdbc_auth-12.4.2.x64.dll
) and perform some additional settings.
Getting the native library
CloverDX bundles Microsoft SQL Server JDBC driver v. 12.4.2.. To download the driver follow these instructions:
-
Get the dist package.
-
Extract the contents and go to folder
sqljdbc_12.4\enu\auth\x64\
. -
Copy the
mssql-jdbc_auth-12.4.2.x64.dll
file to a folder, e.g.C:\mssql_dll
.
Installation
Now there are two ways how to make the dll
work.
The first one involves changing Windows PATH
variables.
If you do not want to do that, go for the second option.
-
Add the absolute path to the
dll
file (C:\mssql_dll
) to the WindowsPATH
variable. Alternatively, you can put thedll
file to some folder which is already included inPATH
, e.gC:\WINDOWS\system32
. -
Modify the
java.library.path
property for all members of the CloverDX Family of products:-
Designer
Modify VM Parameters in the graph’s Runtime configuration screen (see figure below). Add this line to VM parameters:
-Djava.library.path=C:\mssql_dll
The runtime configuration is valid for all graph within the same workspace.
Figure 253. Adding path to the nativedll
to VM parameters. -
CloverDX Server
In the script that starts Tomcat, add the
-Djava.library.path=C:\mssql_dll
option toJAVA_OPTS
. For example, add the following line at the beginning ofcatalina.bat
:set JAVA_OPTS=%JAVA_OPTS% -Djava.library.path=C:\mssql_dll
-
-
MS SQL Server - make sure you have:
-
TCP/IP
Enabled
in -
TCP Port set to 1433 in TCP/IP
-
-
Enable Windows authentication in the URL
Add the integratedSecurity=true parameter to the JDBC URL, e.g.
jdbc:sqlserver://hostname:1433;databaseName=databaseName;integratedSecurity=true;
Snowflake connection
You can connect to Snowflake in the same way as to any other SQL database using a DB Connection with the bundled Snowflake JDBC driver.
This section contains information about working with date-time data types that is common to DatabaseReader, DatabaseWriter and SnowflakeBulkWriter.
Date/Time data types in Snowflake
Working with dates and time zones can be challenging. As a quick rule before going into details, always specify the Format of date data fields in your metadata, see Date and Time Format. Also setting the Session Time Zone on the connection to match your local time zone may help.
There are five different Snowflake data types and two time zones (session TZ and local TZ) that affect the results.
Snowflake type | Description | Time zone |
---|---|---|
DATE |
A date without a time component. |
local |
TIME |
Wallclock time without a date. |
|
TIMESTAMP_NTZ |
Date and time without a time zone. Saved as the wallclock date-time in the session time zone. The time zone itself is not stored. |
session |
TIMESTAMP_TZ |
Date and time with a time zone. The session time zone is stored as a part of the value and used for reading. |
|
TIMESTAMP_LTZ |
Written in the local time zone, but the time zone is not stored. The current time zone is always used for reading. |
local |
All of these Snowflake types map to just one data type in CloverDX metadata: date.
An important property of the date data type is its Format. It provides a hint on how to interpret the data while reading or writing them to the database.
If the Format contains only a date part, it maps to the DATE type. If it contains only a time part, we consider it a TIME. And if it contains both, we treat it as a TIMESTAMP.
For example, "yyyy-MM-dd" or "dd/MM/yyyy" are DATE formats, "HH:mm:ss" is a TIME format, and "dd.MM.yyyy HH:mm:ss" is a TIMESTAMP.
See Date and Time Format for more details.
Note that regardless of the Format, CloverDX records store the full date and time value. The Format controls how that value is interpreted when reading and writing data.
Session time zone
This is the time zone of the connection, as configured in Snowflake. It is used for writing TIMESTAMPs.
See the TIMEZONE parameter in Snowflake documentation.
You can override the session time zone on the connection in Advanced Properties of JDBC driver, e.g., TIMEZONE=Asia/Singapore
.
Local time zone
This is the client time zone. It is used for reading and writing DATE and TIME columns. It is set to your system time zone by default. But you can override it in your metadata, even for individual fields. See Date and Time Format and Time Zone for details.
Using Snowflake in Server core
If you want to use Snowflake connection in a graph that runs in Server Core, the server needs to be configured to allow reflective access. See reflective access.
Hive connection
A connection to the Apache Hive can be created exactly the same way as any other DB Connection. Here we make just a few Hive specific remarks you may find useful.
Hive JDBC driver
The JDBC driver is a part of the Hive release.
But the library and its dependencies are scattered among other Hive libraries.
Moreover, the driver depends on one more library from the Hadoop distribution: hadoop-core-*.jar
or hadoop-common-*.jar
, depending on version of your Hadoop, there’s always only one of them.
For Hive version 0.8.1, here is a minimal list of libraries you need for the Hive DB connection JDBC driver:
hadoop-core-0.20.205.jar hive-exec-0.8.1.jar hive-jdbc-0.8.1.jar hive-metastore-0.8.1.jar hive-service-0.8.1.jar libfb303-0.7.0.jar slf4j-api-1.6.1.jar slf4j-log4j12-1.6.1.jar
You can put all of the Hive distribution libraries + the one Hadoop lib on the JDBC driver classpath.
But some of the Hive distribution libraries may already be included in CloverDX which may result in class loading conflicts.
Typically, no commons-logging*
and log4j*
libraries should be included, otherwise (harmless) warnings will appear in a graph run log.
Using Hive in CloverDX transformation graphs
Remember that Hive is not an ordinary SQL relational database - it has its own SQL-like query language, called QL. Great resource about the Hive QL and Hive in general is the Apache Hive Wiki.
One of the consequences is that it makes no sense to use the DatabaseWriter component, because the INSERT INTO
statement can insert only results of the SELECT
query.
Even though it’s still possible to work around this, each CloverDX data record inserted using such INSERT
statement will results in a heavy-weigh MapReduce job which renders the component extremely slow.
Use LOAD DATA
Hive QL statement instead.
In the DBExecute component, always set the Transaction set attribute to One statement. The reason is that the Hive JDBC driver doesn’t support transactions, and attempt to use them would result in an error saying that the AutoCommit mode cannot be disabled.
Note that the append file operation is fully supported only since version 0.21.0 of HDFS.
Consequently, if you run Hive on top of older HDFS, you cannot append data to existing tables (use of the OVERWRITE
keyword becomes mandatory).
Troubleshooting
iSeries - (DB2/400)
It is possible that the JDBC driver to iSeries (DB2/400) returns untranslated EBDIC characters instead of characters in Unicode strings. The symptoms of this issue are fields read by CloverDX but not viewable.
To solve this issue, set the translate binary
property of the JDBC driver to true
.
See Toolbox JDBC.
Or you can modify the CCSID with CHGPF command. See Change Physical File (CHGPF).