HadoopWriter writes data into Hadoop sequence files.
|Component||Data output||Input ports||Output ports||Transformation||Transf. required||Java||CTL||Auto-propagated metadata|
|HadoopWriter||Hadoop sequence file||1||0||no||no||no||no||no|
|Port type||Number||Description||Metadata|
|Input||0||For input data records||Any|
HadoopWriter does not propagate metadata.
HadoopWriter has no metadata template.
|Hadoop connection||Hadoop connection with Hadoop libraries containing the Hadoop sequence file writer implementation. If the Hadoop connection ID is specified in the File URL, this attribute is ignored.||Hadoop connection ID|
|File URL||A URL of the output file on HDFS or a local file system. URLs without a protocol (i.e. absolute or relative paths) or with the file protocol are treated as local files. If the output file should be located on HDFS, use a URL in the form hdfs://ConnectionID/path/to/file, where ConnectionID is the ID of the Hadoop connection to use.|
|Key field||The name of an input record field carrying a key for each written key-value pair.|
|Value field||The name of an input record field carrying a value for each written key-value pair.|
|Create empty files||If set to false, the component does not create an empty output file when it receives no input records.||true (default) | false|
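The Key field and Value field attributes simply select which input field supplies the key and which supplies the value of each written key-value pair. As an illustration only (this is not CloverDX code; the record representation and function name are assumptions), the mapping can be sketched as:

```python
def records_to_pairs(records, key_field, value_field):
    """Map input records (represented here as dicts) to (key, value)
    pairs, mirroring the Key field / Value field attributes."""
    return [(r[key_field], r[value_field]) for r in records]


# Example: key taken from the "id" field, value from the "name" field.
records = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
pairs = records_to_pairs(records, "id", "name")
# → [(1, 'alice'), (2, 'bob')]
```

Each pair produced this way corresponds to one key-value entry in the resulting sequence file.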
HadoopWriter writes data into a special Hadoop file format called a sequence file. Sequence files contain key-value pairs and are commonly used as input and output file formats of MapReduce jobs. The component can write a single file as well as a partitioned file, which has to be located on HDFS or a local file system.
The exact version of the file format created by the HadoopWriter component depends on the Hadoop libraries which you supply in the Hadoop connection referenced from the File URL attribute. In general, sequence files created by one version of Hadoop may not be readable by a different version.
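A version mismatch can be detected up front, because a sequence file begins with a small header: a 3-byte magic `SEQ` followed by a single version byte (this layout is part of the Hadoop SequenceFile format; the function below is an illustrative sketch, not CloverDX or Hadoop code):

```python
def sequence_file_version(path):
    """Return the format version byte of a Hadoop sequence file.

    A sequence file starts with the 3-byte magic 'SEQ' followed by
    one byte holding the format version.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4 or header[:3] != b"SEQ":
        raise ValueError("not a Hadoop sequence file")
    return header[3]
```

Checking this byte before handing a file to a differently-versioned cluster is a cheap way to catch the incompatibility described above.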
When writing to a local file system, additional .crc files are created if a Hadoop connection with default settings is used. That is because, by default, Hadoop interacts with a local file system through a checksumming file system implementation, which creates a checksum file for each written file. When such a file is read back, the checksum is verified. You can disable checksum creation and verification by adding the key-value pair fs.file.impl=org.apache.hadoop.fs.RawLocalFileSystem in the Hadoop Parameters of the Hadoop connection.
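As background, Hadoop's local checksum files hold one CRC per fixed-size chunk of the data file (512 bytes per checksum by default). The following Python sketch illustrates the per-chunk checksumming idea only; it is not Hadoop's exact on-disk .crc format:

```python
import zlib

CHUNK = 512  # Hadoop's default bytes-per-checksum


def chunk_crcs(data: bytes, chunk_size: int = CHUNK):
    """Compute one CRC-32 per chunk of data, roughly mirroring how a
    checksumming file system guards each written file. Illustrative
    sketch only -- not Hadoop's exact checksum file layout."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```

Verification on read means recomputing these CRCs and comparing them with the stored ones; any mismatch signals corruption of the data file.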
For technical details about Hadoop sequence files, see Apache Hadoop Wiki.
Notes and Limitations
Currently, writing compressed data is not supported.
HadoopWriter cannot write lists and maps.
If you write data to a sequence file on a local file system, you may encounter one of the following error messages in the error log:
Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
Cannot run program "cygpath": CreateProcess error=2, The system cannot find the file specified
To solve this problem, disable checksum creation and verification using the fs.file.impl Hadoop parameter in the Hadoop connection configuration.
This issue is specific to non-POSIX operating systems (MS Windows).