tFileInputXML Component does not extract element values within a Spark Job

Talend Version (Required) 6.3.1

Summary

 
Additional Versions  
Product (Required) Big Data
Component (Required) Studio Spark
Problem Description

A Talend 6.3.1 Spark Job contains a tFileInputXML component that extracts XML element values (in this example, ID) from an element (Incident) that carries an attribute (Active) in an XML document:

 

<Incident Active="true">
<ID>Incident2017</ID>
<AssignmentGroup>FoundationTeam</AssignmentGroup>
<CommentsCount>0</CommentsCount>
<CompanyName>My Company</CompanyName>
..
</Incident>

 

The expected behavior is that the tFileInputXML component extracts the value Incident2017 for the ID element. Instead, when the Spark Job is executed, the element values extracted by the tFileInputXML component are all null.
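
To make the expected result concrete, the following is a minimal sketch that reads the same sample with Python's standard xml.etree.ElementTree parser. It is not Talend code and does not reproduce the Spark Job; the helper name extract_fields is hypothetical, and the elements elided by ".." in the sample are simply left out. It only illustrates that an attribute on the parent element should not affect the values of its child elements.

```python
import xml.etree.ElementTree as ET

# Sample document from this article; the elements elided by ".." are omitted.
SAMPLE = """<Incident Active="true">
<ID>Incident2017</ID>
<AssignmentGroup>FoundationTeam</AssignmentGroup>
<CommentsCount>0</CommentsCount>
<CompanyName>My Company</CompanyName>
</Incident>"""


def extract_fields(xml_text, fields):
    """Return {field: text} for direct children of the root element.

    Hypothetical helper for illustration only; it stands in for the
    column mapping that tFileInputXML is expected to perform.
    """
    root = ET.fromstring(xml_text)
    # findtext returns None when the child element is missing.
    return {name: root.findtext(name) for name in fields}


values = extract_fields(SAMPLE, ["ID", "AssignmentGroup", "CompanyName"])
print(values)
```

In the affected Spark Job, the corresponding columns come back as null instead of these values.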

 

When the Active attribute is removed from the Incident element, the tFileInputXML component extracts the element values (here, Incident2017, FoundationTeam, and My Company) correctly in a Spark Job.
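
If upgrading or patching is not immediately possible, the observation above suggests pre-processing the input so that the attribute is absent before the Spark Job reads it. The sketch below is one possible way to do that outside of Talend with Python's standard library; the function name strip_attribute is hypothetical, and whether such a pre-processing step suits a given pipeline is an assumption that has to be checked case by case.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from this article.
SAMPLE = """<Incident Active="true">
<ID>Incident2017</ID>
<AssignmentGroup>FoundationTeam</AssignmentGroup>
</Incident>"""


def strip_attribute(xml_text, element_name, attribute_name):
    """Return xml_text with attribute_name removed from every element_name.

    Hypothetical helper for illustration; it is not part of Talend.
    """
    root = ET.fromstring(xml_text)
    # iter() also visits the root element itself, so a top-level
    # Incident element is handled as well as nested ones.
    for elem in root.iter(element_name):
        elem.attrib.pop(attribute_name, None)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_attribute(SAMPLE, "Incident", "Active")
print(cleaned)
```

The child elements are left untouched; only the Active attribute is dropped from the Incident element.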

 

This issue does not occur when the tFileInputXML component is executed in a Standard Job.

Problem root cause  
Solution or Workaround This issue is fixed in Talend 6.3.2 and 6.4.1. For Talend 6.3.1, the issue is solved by applying the patch Patch_20170522_TPS-1949_v1-6.3.1.zip.
JIRA ticket number TBD-4903
Version history
Revision #: 5 of 5
Last update: 11-14-2017 03:23 PM