    Supported URL Formats for File Operations

    URL attributes may be defined using the URL File Dialog.

    Unless explicitly stated otherwise, URL attributes of File Operation components accept multiple URLs separated with a semicolon (';').

    Important

    To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
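
    As a rough illustration of the rule above, the following Python sketch rewrites a Windows-style path with forward slashes before it is used in a URL attribute. The helper name to_portable_path and the sample path are assumptions made for this example; they are not part of CloverDX.

```python
def to_portable_path(path: str) -> str:
    """Return the path written with forward slashes only (hypothetical helper)."""
    # Backslashes are the Windows-only separator; forward slashes work everywhere.
    return path.replace("\\", "/")


print(to_portable_path(r"C:\data\in\filename.txt"))  # C:/data/in/filename.txt
```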

    Most protocols support wildcards: ? (question mark) matches one arbitrary character; * (asterisk) matches any number of arbitrary characters. Note that wildcard support and syntax are protocol-dependent.

    Below are some examples of possible URLs for File Operations:

    Local Files

    • /path/filename.txt

      One specified file.

    • /path1/filename1.txt;/path2/filename2.txt

      Two specified files.

    • /path/filename?.txt

      All files satisfying the mask.

    • /path/*

      All files in the specified directory.

    • /path?/*.txt

      All .txt files in directories that satisfy the path? mask.
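
    The semicolon separator and the two wildcards used in the local file examples can be modelled with the short Python sketch below. It only illustrates the matching rules described here; the real resolution is done by the File Operation components and is protocol-dependent. The function names are hypothetical, and the sketch assumes that a wildcard never matches the / separator.

```python
import re


def split_urls(attribute: str) -> list:
    """Split a multi-URL attribute value on the ';' separator."""
    return [part for part in attribute.split(";") if part]


def mask_to_regex(mask: str) -> "re.Pattern":
    """Translate a '?' / '*' mask into a regular expression.

    Assumption of this sketch: wildcards do not match the '/' separator.
    """
    pieces = []
    for ch in mask:
        if ch == "?":
            pieces.append("[^/]")    # exactly one arbitrary character
        elif ch == "*":
            pieces.append("[^/]*")   # any number of arbitrary characters
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def matches(mask: str, path: str) -> bool:
    """Return True if the whole path satisfies the mask."""
    return mask_to_regex(mask).fullmatch(path) is not None


print(split_urls("/path1/filename1.txt;/path2/filename2.txt"))
print(matches("/path/filename?.txt", "/path/filename1.txt"))  # True
print(matches("/path?/*.txt", "/path1/report.txt"))           # True
print(matches("/path/filename?.txt", "/path/filename.txt"))   # False
```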

    Remote Files

    • ftp://username:password@server/path/filename.txt

      Denotes the path/filename.txt file on a remote server accessed via the FTP protocol using a username and password.

      If the initial working directory differs from the server root directory, use an absolute FTP path, as shown in the next example.

    • ftp://username:password@server/%2Fpath/filename.txt

      Denotes the /path/filename.txt file on a remote server. The initial slash must be escaped as %2F. The path is absolute with respect to the server root directory.

    • ftp://username:password@server/dir/*.txt

      Denotes all files satisfying the mask on a remote server accessed via the FTP protocol using a username and password.

    • sftp://username:password@server/path/filename.txt

      Denotes the filename.txt file on a remote server accessed via the SFTP protocol using a username and password.

    • sftp://username:password@server/path?/filename.txt

      Denotes all filename.txt files in directories satisfying the path? mask on a remote server accessed via the SFTP protocol using a username and password.

    • http://server/path/filename.txt

      Denotes the filename.txt file on a remote server accessed via the HTTP protocol.

    • https://server/path/filename.txt

      Denotes the filename.txt file on a remote server accessed via the HTTPS protocol.

    • s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/path/filename.txt

      Denotes the path/filename.txt object located in the Amazon S3 web storage service, in the bucket bucketname. The connection is established using the specified access key ID and secret access key.

    • hdfs://CONNECTION_ID/path/filename.txt

      Denotes the filename.txt file on Hadoop HDFS. CONNECTION_ID stands for the ID of a Hadoop connection defined in the graph.

    • smb://domain%3Buser:password@server/path/filename.txt

      smb2://domain%3Buser:password@server/path/filename.txt

      Denotes a file located in a Windows share (Microsoft SMB/CIFS protocol). The URL path may contain wildcards (both * and ? are supported). The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is optional. Any URL reserved character it contains should be escaped using %-encoding, as the semicolon ; is escaped as %3B in the example (the semicolon is escaped because it collides with the default CloverDX file URL separator).

      The SMB version 1 protocol is implemented in the JCIFS library, which can be configured using Java system properties. See Setting Client Properties in the JCIFS documentation for a list of all configurable properties.

      The SMB version 2 and 3 protocols are implemented in the SMBJ library.
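
    The escaping rules mentioned in the remote file examples (%2F for the initial slash of an absolute FTP path, %3B for a semicolon in the userinfo part) can be sketched in Python as follows. The build_url helper and its parameters are hypothetical and exist only for this illustration; the sketch uses the standard urllib.parse module and makes no claim about how CloverDX itself builds or parses URLs.

```python
from urllib.parse import quote, unquote, urlsplit


def build_url(scheme, server, path, user=None, password=None, absolute_ftp=False):
    """Assemble a remote file URL, %-encoding the userinfo part (hypothetical helper)."""
    userinfo = ""
    if user is not None:
        # safe="" makes quote() escape every reserved character,
        # including ';', ':', '@' and '/'.
        userinfo = quote(user, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    path = path.lstrip("/")
    if absolute_ftp:
        # Absolute FTP path: the initial slash is written as %2F.
        path = "%2F" + path
    return f"{scheme}://{userinfo}{server}/{path}"


smb_url = build_url("smb", "server", "path/filename.txt",
                    user="domain;user", password="password")
print(smb_url)  # smb://domain%3Buser:password@server/path/filename.txt

ftp_url = build_url("ftp", "server", "/path/filename.txt",
                    user="username", password="password", absolute_ftp=True)
print(ftp_url)  # ftp://username:password@server/%2Fpath/filename.txt

# Decoding the userinfo part restores the original semicolon.
print(unquote(urlsplit(smb_url).username))  # domain;user
```

    Because the semicolon is encoded, the resulting SMB URL can be placed in a multi-URL attribute without being split at the domain separator.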

    Sandbox Resources

    A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in a graph in the fileURL attributes as a so-called sandbox URL, for example:

    sandbox://data/path/to/file/file.dat

    where data is the sandbox code and path/to/file/file.dat is the path to the resource relative to the sandbox root. A graph does not have to run on the node that has local access to the resource.
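
    To make the structure of a sandbox URL concrete, the Python sketch below splits one into the sandbox code and the path from the sandbox root. The parse_sandbox_url function is a hypothetical helper written for this example and relies only on the standard urllib.parse module.

```python
from urllib.parse import urlsplit


def parse_sandbox_url(url: str) -> tuple:
    """Return (sandbox_code, path_from_sandbox_root) for a sandbox URL."""
    parts = urlsplit(url)
    if parts.scheme != "sandbox" or not parts.netloc:
        raise ValueError(f"not a sandbox URL: {url!r}")
    # The authority part carries the sandbox code; the rest is the resource path.
    return parts.netloc, parts.path.lstrip("/")


code, resource = parse_sandbox_url("sandbox://data/path/to/file/file.dat")
print(code)      # data
print(resource)  # path/to/file/file.dat
```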