Supported file URL formats for Readers
The File URL attribute may be defined using the URL File Dialog.
To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
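As an illustration of the forward-slash rule, a path copied from Windows Explorer can be normalized before it is pasted into a File URL. The sketch below uses Python's standard `pathlib`; the helper name `to_portable_path` is hypothetical and not part of CloverDX.

```python
from pathlib import PureWindowsPath


def to_portable_path(windows_path: str) -> str:
    """Return the path with backslashes converted to forward slashes."""
    # PureWindowsPath parses Windows syntax on any operating system;
    # as_posix() renders the same path with forward slashes.
    return PureWindowsPath(windows_path).as_posix()


print(to_portable_path(r"C:\data\input\filename.txt"))
```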
Below are examples of possible URLs for Readers:
Reading of local files
-
/path/filename.txt
Reads a specified file.
-
/path1/filename1.txt;/path2/filename2.txt
Reads two specified files.
-
/path/filename?.txt
Reads all files satisfying the mask.
-
/path/*
Reads all files in a specified directory.
-
zip:(/path/file.zip)
Reads the first file compressed in the file.zip file.
-
zip:(/path/file.zip)!innerfolder/filename.txt
Reads a specified file compressed in the file.zip file.
|
Path separator character |
-
gzip:(/path/file.gz)
Reads the first file compressed in the file.gz file.
-
tar:(/path/file.tar)!innerfolder/filename.txt
Reads a specified file archived in the file.tar file.
-
zip:(/path/file??.zip)!innerfolder?/filename.*
Reads all files from the compressed zip file(s) that satisfy the specified mask. Wildcards (? and *) may be used in the compressed file names, inner folder and inner file names.
-
tar:(/path/file????.tar)!innerfolder??/filename*.txt
Reads all files from the archive file(s) that satisfy the specified mask. Wildcards (? and *) may be used in the compressed file names, inner folder and inner file names.
-
gzip:(/path/file*.gz)
Reads all files that have been gzipped into files satisfying the specified mask. Wildcards may be used in the compressed file names.
-
tar:(gzip:/path/file.tar.gz)!innerfolder/filename.txt
Reads a specified file compressed in the file.tar.gz file.
|
Although CloverDX can read data from a |
-
tar:(gzip:/path/file??.tar.gz)!innerfolder?/filename*.txt
Reads all files from the gzipped tar archive file(s) that satisfy the specified mask. Wildcards (? and *) may be used in the compressed file names, inner folder and inner file names.
-
zip:(zip:(/path/name?.zip)!innerfolder/file.zip)!innermostfolder?/filename*.txt
Reads all files satisfying the file mask from all paths satisfying the path mask from all compressed files satisfying the specified zip mask. Wildcards (? and *) may be used in the outer compressed files, innermost folder and innermost file names. They cannot be used in the inner folder and inner zip file names.
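The nested archive syntax used in the examples above, `scheme:(inner URL)!entry`, can be taken apart layer by layer. The following is a minimal sketch of that decomposition, not the parser CloverDX uses; the function name `split_archive_url` and the `ARCHIVE_SCHEMES` tuple are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Archive prefixes that appear in the examples above.
ARCHIVE_SCHEMES = ("zip", "tar", "gzip")


def split_archive_url(url: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Split 'scheme:(inner)!entry' into (scheme, inner, entry).

    Returns None when the URL has no archive prefix. The entry is None
    when there is no '!entry' suffix, as in 'gzip:(/path/file.gz)'.
    """
    scheme, sep, rest = url.partition(":(")
    if not sep or scheme not in ARCHIVE_SCHEMES:
        return None
    depth = 1  # one parenthesis is already open after 'scheme:('
    for i, ch in enumerate(rest):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                inner, tail = rest[:i], rest[i + 1:]
                entry = tail[1:] if tail.startswith("!") else None
                return scheme, inner, entry
    raise ValueError(f"unbalanced parentheses in {url!r}")


# Peel the layers of a zip-in-zip URL from the outside in.
url = "zip:(zip:(/path/name.zip)!innerfolder/file.zip)!innermostfolder/filename.txt"
layer = split_archive_url(url)
while layer is not None:
    scheme, inner, entry = layer
    print(scheme, "->", entry)
    layer = split_archive_url(inner)
```

The sketch only separates the layers; it does not expand wildcards or open any file.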
Reading of remote files
-
ftp://username:password@server/path/filename.txt
Reads a specified filename.txt file on a remote server connected via the FTP protocol using a username and password.
-
sftp://username:password@server/path/filename.txt
Reads a specified filename.txt file on a remote server connected via the SFTP protocol using a username and password.
If certificate-based authentication is used, certificates are placed in the ${PROJECT}/ssh-keys/ directory. For more information, see SFTP certificate in CloverDX.
With certificate-based authentication, the URL has no password:
sftp://username@server/path/filename.txt
-
http://server/path/filename.txt
Reads a specified filename.txt file on a remote server connected via the HTTP protocol.
-
https://server/path/filename.txt
Reads a specified filename.txt file on a remote server connected via the HTTPS protocol.
-
zip:(ftp://username:password@server/path/file.zip)!innerfolder/filename.txt
Reads a specified filename.txt file compressed in the file.zip file on a remote server connected via the FTP protocol using a username and password.
-
zip:(http://server/path/file.zip)!innerfolder/filename.txt
Reads a specified filename.txt file compressed in the file.zip file on a remote server connected via the HTTP protocol.
-
tar:(ftp://username:password@server/path/file.tar)!innerfolder/filename.txt
Reads a specified filename.txt file archived in the file.tar file on a remote server connected via the FTP protocol using a username and password.
-
zip:(zip:(ftp://username:password@server/path/name.zip)!innerfolder/file.zip)!innermostfolder/filename.txt
Reads a specified filename.txt file compressed in the file.zip file that is itself compressed in the name.zip file on a remote server connected via the FTP protocol using a username and password.
-
gzip:(http://server/path/file.gz)
Reads the first file compressed in the file.gz file on a remote server connected via the HTTP protocol.
-
http://server/filename*.dat
Reads all files from a WebDAV server that satisfy the specified mask (only * is supported).
-
s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/filename*.out
Reads all objects that satisfy the specified mask from a given bucket in the Amazon S3 web storage service, using an access key ID and a secret access key.
It is recommended to connect to S3 via a region-specific S3 URL: s3://s3.eu-central-1.amazonaws.com/bucket.name/. The region-specific URL has much better performance than the generic one (s3://s3.amazonaws.com/bucket.name/). See recommendation on Amazon S3 URL.
The s3:// URL protocol is available since CloverETL 4.1. More information about the deprecated http:// S3 protocol can be found in the CloverDX 4.0 User Guide.
-
az-blob://account:account_key@account.blob.core.windows.net/containername/path/filename*.txt
Reads all objects matching the specified mask from the specified container in the Microsoft Azure Blob Storage service.
Connects using the specified Account Key. See Azure Blob Storage for other authentication options.
-
hdfs://CONN_ID/path/filename.dat
Reads a file from the Hadoop distributed file system (HDFS). The HDFS NameNode to connect to is defined in the Hadoop connection with the ID CONN_ID. This example file URL reads the file with the absolute HDFS path /path/filename.dat.
-
smb://domain%3Buser:password@server/path/filename.txt
Reads files from a Windows share (Microsoft SMB/CIFS protocol) version 1. The URL path may contain wildcards (both * and ? are supported). The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is not mandatory, and any URL reserved character it contains should be escaped using %-encoding, as the semicolon ; is escaped with %3B in the example (the semicolon is escaped because it collides with the default CloverDX file URL separator).
The SMB protocol is implemented in the JCIFS library, which may be configured using Java system properties. See Setting client properties in the JCIFS documentation for the list of all configurable properties.
-
smb2://domain%3Buser:password@server/path/filename.txt
Reads files from a Windows share (Microsoft SMB/CIFS protocol) version 2 and 3.
The SMB2 protocol is implemented in the SMBJ library.
Due to the upgrade of the SMBJ library in CloverDX version 6.2, anonymous access using SMB protocol version 2 or 3 will no longer work unless your Samba server is configured to stop requiring message signing. If turning off message signing is not an option, you can create a user without a password to use in the URL as a workaround. See below for example URLs:
smb2://domain%3Buser@server/path/filename.txt
smb2://domain%3Buser:@server/path/filename.txt
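The %-encoding of the userinfo part described for the smb:// and smb2:// URLs can be produced with any standard URL-quoting routine. The sketch below uses Python's `urllib.parse.quote`; the helper `smb_url` and its parameters are hypothetical and only assemble the string, they do not open a connection.

```python
from typing import Optional
from urllib.parse import quote


def smb_url(server: str, path: str, user: Optional[str] = None,
            password: Optional[str] = None, domain: Optional[str] = None,
            scheme: str = "smb") -> str:
    """Build an SMB file URL with a percent-encoded userinfo part."""
    userinfo = ""
    if user is not None:
        name = f"{domain};{user}" if domain else user
        # safe="" encodes every reserved character, so ';' becomes %3B
        # and cannot be mistaken for the file URL separator.
        userinfo = quote(name, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    return f"{scheme}://{userinfo}{server}/{path.lstrip('/')}"


print(smb_url("server", "path/filename.txt", "user", "password", "domain"))
print(smb_url("server", "path/filename.txt", "user", "", "domain", scheme="smb2"))
```

Passing `password=None` omits the colon entirely, while an empty string keeps it, which corresponds to the two passwordless smb2:// forms shown above.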
Reading from input port
-
port:$0.FieldName:discrete
Each data record field from an input port represents one particular data source.
-
port:$0.FieldName:source
Each data record field from an input port represents a URL to be loaded in and parsed.
-
port:$0.FieldName:stream
Input port field values are concatenated and processed as an input file(s); null values are replaced by the eof.
See also Input port reading.
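One way to picture the stream mode is as a function that glues consecutive field values together and starts a new input file whenever it meets a null value. The sketch below is only an interpretation of that description, with the hypothetical name `split_stream` and `None` standing in for null; it is not taken from the CloverDX implementation.

```python
from typing import Iterable, Iterator, List, Optional


def split_stream(field_values: Iterable[Optional[bytes]]) -> Iterator[bytes]:
    """Concatenate field values; a None value ends the current input file."""
    chunks: List[bytes] = []
    for value in field_values:
        if value is None:
            # A null value acts as end-of-file for the data collected so far.
            yield b"".join(chunks)
            chunks = []
        else:
            chunks.append(value)
    if chunks:
        # Data after the last null value forms the final input file.
        yield b"".join(chunks)


records = [b"a,b\n", b"c,d\n", None, b"e,f\n"]
for number, content in enumerate(split_stream(records), start=1):
    print(number, content)
```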
Using proxy in Readers
-
http:(direct:)//seznam.cz
Without proxy.
-
http:(proxy://user:password@212.93.193.82:443)//seznam.cz
Proxy setting for HTTP protocol.
-
ftp:(proxy://user:password@proxyserver:1234)//seznam.cz
Proxy setting for FTP protocol.
-
sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat
Proxy setting for SFTP protocol.
-
s3:(proxy://user:password@66.11.122.193:443)//access_key_id:secret_access_key@s3.amazonaws.com/bucketname/filename*.dat
Proxy setting for S3 protocol.
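The proxy examples above all follow one pattern: the proxy setting is placed in parentheses between the protocol name and the `//` of the original URL. A minimal sketch of that string transformation, assuming a hypothetical helper named `with_proxy`:

```python
from typing import Optional


def with_proxy(url: str, proxy: Optional[str]) -> str:
    """Insert a proxy setting into a URL of the form 'scheme://rest'.

    proxy is 'host:port' or 'user:password@host:port'; None produces the
    explicit no-proxy form 'scheme:(direct:)//rest'.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a scheme://... URL: {url!r}")
    setting = f"proxy://{proxy}" if proxy else "direct:"
    return f"{scheme}:({setting})//{rest}"


print(with_proxy("http://seznam.cz", None))
print(with_proxy("sftp://user:password@server/path/file.dat", "66.11.122.193:443"))
```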
Reading from dictionary
Sandbox resource as data source
A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in the graph in the fileURL attributes as a so-called sandbox URL, like this:
sandbox://data/path/to/file/file.dat
where data is the code of the sandbox and path/to/file/file.dat is the path to the resource from the sandbox root.
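The two parts of a sandbox URL map onto the authority and path components of a generic URL, so a standard URL parser can separate them. A short sketch using Python's `urllib.parse.urlsplit`; the function name `parse_sandbox_url` is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_sandbox_url(url: str) -> Tuple[str, str]:
    """Return (sandbox code, path relative to the sandbox root)."""
    parts = urlsplit(url)
    if parts.scheme != "sandbox":
        raise ValueError(f"not a sandbox URL: {url!r}")
    # The sandbox code sits where a host name normally would.
    return parts.netloc, parts.path.lstrip("/")


code, path = parse_sandbox_url("sandbox://data/path/to/file/file.dat")
print(code, path)
```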
The URL is evaluated by CloverDX Server during graph execution and a component (reader or writer) obtains the opened stream from the Server.
This may be a stream to a local file or to some other remote resource.
Thus, a graph does not have to run on the node which has local access to the resource.
There may be more sandbox resources used in the graph and each of them may be on a different node.
In such cases, CloverDX Server chooses the node with the most local resources to minimize remote streams.
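The idea of preferring the node with the most local resources can be sketched as a simple count. The code below is only an illustration of that idea: the data structure, the name `pick_node`, and the alphabetical tie-break are assumptions, and the actual selection logic of CloverDX Server may take other factors into account.

```python
from collections import Counter
from typing import Dict, Set


def pick_node(resource_nodes: Dict[str, Set[str]]) -> str:
    """Return the node that can read the most resources as local files.

    resource_nodes maps each sandbox URL to the set of nodes with local
    access to it. Ties go to the alphabetically first node name.
    """
    local_counts: Counter = Counter()
    for nodes in resource_nodes.values():
        local_counts.update(nodes)  # each node gains one local resource
    if not local_counts:
        raise ValueError("no node has local access to any resource")
    # max() keeps the first maximal element, so sorting fixes the tie-break.
    return max(sorted(local_counts), key=lambda node: local_counts[node])


resources = {
    "sandbox://data/in/a.dat": {"node1", "node2"},
    "sandbox://data/in/b.dat": {"node2"},
    "sandbox://logs/c.dat": {"node3"},
}
print(pick_node(resources))  # node2
```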
The sandbox URL has a specific use for parallel data processing. When the sandbox URL with the resource in a partitioned sandbox is used, that part of the graph/phase runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. CloverDX Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component.