HadoopReader reads Hadoop sequence files.
|Component||Data source||Input ports||Output ports||Each to all outputs||Different to different outputs||Transformation||Transf. req.||Java||CTL||Auto-propagated metadata|
|HadoopReader||Hadoop Sequence File||0–1||1|
|Input||0||For Input Port Reading. Only the ||Any|
|Output||0||For read data records.||Any|
HadoopReader does not propagate metadata.
HadoopReader has no metadata template.
A Hadoop connection with Hadoop libraries containing the Hadoop sequence file parser implementation.
If the Hadoop connection ID is specified in a
|Hadoop connection ID|
A URL to a file on HDFS or a local file system.
URLs without a protocol (i.e. absolute or relative path)
or with the
If the file to be read is located on HDFS, use a URL in this form:
|Key field||The name of the output record field in which the key of each key-value pair is stored.|
|Value field||The name of the output record field in which the value of each key-value pair is stored.|
HadoopReader reads data from Hadoop sequence files.
These files contain key-value pairs and are used as input and output file formats in MapReduce jobs.
The component can read a single file or a collection of files,
which must be located on HDFS or a local file system.
If you read local sequence files, there is no need to connect to a Hadoop cluster. However, you still need a valid Hadoop connection (with the correct version of the libraries).
The exact version of the file format supported by the HadoopReader component depends on the Hadoop libraries that you supply in the Hadoop connection referenced from the File URL attribute. In general, sequence files created by one version of Hadoop may not be readable by a different version.
Hadoop sequence files may contain compressed data. HadoopReader detects this automatically and decompresses the data. Note that the supported compression codecs depend on the libraries you specify in the Hadoop connection.
For technical details about Hadoop sequence files, see the Apache Hadoop Wiki.
Reading data from local sequence files
Read records from a Hadoop sequence file.
The file has ProductID as the key and ProductName as the value.
Create a valid Hadoop connection or use an existing one. See Hadoop connection.
Use the Hadoop connection, File URL, Key field and Value field attributes.
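The effect of the Key field and Value field attributes in this example can be pictured with a small Python sketch. It is only an illustration of the mapping, not how HadoopReader is implemented: the function name `to_records` and the sample pairs are hypothetical, and records are modelled as plain dictionaries.

```python
def to_records(pairs, key_field, value_field):
    """Map each (key, value) pair to a record with the configured field names."""
    return [{key_field: key, value_field: value} for key, value in pairs]


# Hypothetical key-value pairs standing in for the content of the sequence file.
pairs = [(1, "Keyboard"), (2, "Mouse")]

# Key field = ProductID, Value field = ProductName, as in the example above.
records = to_records(pairs, "ProductID", "ProductName")
for record in records:
    print(record)
```

Each key-value pair becomes one output record: the key goes into the field named by Key field and the value into the field named by Value field.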