How to set a field separator for a Dynamic type column when using a tHDFSOutput component

Talend Version: 6.4.1
Product: Big Data
Component: Studio

Problem Description

A Talend 6.4.1 Job reads data from a database (with a tOracleInput component) and writes it to an HDFS file (with a tHDFSOutput component). The tOracleInput component uses a Dynamic type schema column to retrieve data from generic SQL queries. However, tHDFSOutput does not support the Dynamic type, as shown below:

Message.png

 

So, the tMap component is used to convert the Dynamic type column to an Object type column:

separator_job_1.png

 

However, when the data is written to an HDFS file with the tHDFSOutput component, the configured field separator is ignored (in the screenshot above, the tHDFSOutput field separator is ';'). Rows are always written to the HDFS files with '-' as the field separator:

"
7369 - SMITH - CLERK - 7902 - 17/12/1980 - 800 - null - 20
7499 - ALLEN - SALESMAN - 7698 - 20/02/1981 - 1600 - 300 - 30
7521 - WARD - SALESMAN - 7698 - 22/02/1981 - 1250 - 500 - 30
"

 

So, the question is how to write data into HDFS with a field separator other than '-'.

Problem root cause

The default field separator for the Dynamic type is '-'. By design, this field separator cannot be customized in Talend components, in particular the tHDFSOutput component.

Solution or Workaround

The workaround consists of:

  • Using the tMap component to map the Dynamic type column to a String column, specifying the field separator through the toString(String separator) method of the Dynamic class (Dynamic.java).
  • Using the tHDFSOutput component to write the String rows output by tMap into HDFS.

To illustrate this solution, suppose that the desired field separator is '|'.

 

The screenshot below shows the tMap component mapping that can be done:

sep_mapping.png

 

The chosen field separator is '|', and it is passed to the toString method as a parameter in the tMap component mapping: row1.newColumn.toString("|").

 

As a result, the String rows taken from tMap output will be written into an HDFS file with '|' as a field separator by the tHDFSOutput component:

"
7369|SMITH|CLERK|7902|17/12/1980|800|null|20
7499|ALLEN|SALESMAN|7698|20/02/1981|1600|300|30
7521|WARD|SALESMAN|7698|22/02/1981|1250|500|30
"
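
The effect of passing a separator to toString can be sketched in plain Java, outside of Talend. The class below is a hypothetical stand-in written only for illustration: it is not Talend's Dynamic class, and the names DynamicSeparatorSketch and join are assumptions. It only mimics the visible behavior described above, joining the column values of one row with a caller-supplied separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for illustration only; this is NOT Talend's Dynamic class.
public class DynamicSeparatorSketch {

    // Joins the column values of one row with the given separator.
    // Null values are rendered as the text "null", as in the sample output above.
    static String join(List<Object> values, String separator) {
        return values.stream()
                .map(v -> String.valueOf(v))
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        // First sample row from the article
        List<Object> row = Arrays.asList(
                7369, "SMITH", "CLERK", 7902, "17/12/1980", 800, null, 20);

        // Same idea as row1.newColumn.toString("|") in the tMap expression
        System.out.println(join(row, "|"));
    }
}
```

Note that this sketch does not escape a separator character that appears inside a field value, so the separator should be one that cannot occur in the data.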
Last update: 02-09-2018 06:27 PM