tExtractXML component in a Spark job

Five Stars

The tExtractXML component in a Spark job fails with the following error for the section of code below, and I am unable to build the job.


Error message: "the code of method call (Tuple2<NullWritable,row9Struct>) is exceeding the 65535 byte limit" 


public java.util.Iterator<scala.Tuple2<NullWritable, row3Struct>> call(
        scala.Tuple2<NullWritable, row9Struct> data)
        throws java.lang.Exception {
    java.util.List<scala.Tuple2<NullWritable, row3Struct>> outputs = new java.util.ArrayList<scala.Tuple2<NullWritable, row3Struct>>();
    row3Struct row3 = new row3Struct();
    row9Struct row2 = data._2;


Please help me understand this issue.

Thirteen Stars

Re: tExtractXML component in a Spark job



65535 bytes is the limit on the compiled code size of a single Java method (I may not be describing it 100% accurately, but it is a known error).


The source of the error could be a complicated structure (with long XPath expressions and many columns), because all of the extraction code ends up in one generated method.

There is no single solution, but it can often be resolved if you:

  • exclude unused tags (if any)
  • split the work into several steps (if possible): parse one half, then the other half, then join
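
The split-then-join idea above can be sketched in plain Java. This is only an illustration of the technique, not the code Talend generates: the class name SplitExtractSketch, the sample order records, and the column names are all hypothetical. Each step evaluates only its own group of XPath expressions, keyed by a shared id, and the two partial results are then joined on that key.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: extract the columns in two smaller steps, then join them.
public class SplitExtractSketch {

    // One extraction step: evaluates only the given column XPaths per record
    // and indexes the partial row by the value found at keyXPath.
    static Map<String, Map<String, String>> extractStep(
            List<String> records, String keyXPath, Map<String, String> columnXPaths) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, Map<String, String>> byKey = new LinkedHashMap<>();
            for (String xml : records) {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> col : columnXPaths.entrySet()) {
                    row.put(col.getKey(), xpath.evaluate(col.getValue(), doc));
                }
                byKey.put(xpath.evaluate(keyXPath, doc), row);
            }
            return byKey;
        } catch (Exception e) {
            throw new IllegalStateException("XML extraction failed", e);
        }
    }

    // Inner join of the two partial results on the shared key.
    static Map<String, Map<String, String>> join(
            Map<String, Map<String, String>> left, Map<String, Map<String, String>> right) {
        Map<String, Map<String, String>> joined = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : left.entrySet()) {
            Map<String, String> other = right.get(e.getKey());
            if (other == null) {
                continue; // key missing on the right side: drop the row
            }
            Map<String, String> row = new LinkedHashMap<>(e.getValue());
            row.putAll(other);
            joined.put(e.getKey(), row);
        }
        return joined;
    }

    // Runs both halves and joins them; the column split here is arbitrary.
    static Map<String, Map<String, String>> extractInTwoSteps(List<String> records) {
        Map<String, String> firstHalf = new LinkedHashMap<>();
        firstHalf.put("customer", "/order/customer");
        Map<String, String> secondHalf = new LinkedHashMap<>();
        secondHalf.put("total", "/order/total");
        return join(extractStep(records, "/order/id", firstHalf),
                    extractStep(records, "/order/id", secondHalf));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "<order><id>1</id><customer>Ann</customer><total>9.50</total></order>",
                "<order><id>2</id><customer>Bob</customer><total>12.00</total></order>");
        System.out.println(extractInTwoSteps(records));
    }
}
```

The trade-off is that each record is parsed twice and a join key must exist in the data. In exchange, neither step has to handle all of the columns at once.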


Five Stars

Re: tExtractXML component in a Spark job

Thank you very much. Your suggestion worked. I had 498 columns with XPath expressions. I reduced that to 300 columns and the job built.


Thanks once again.


Badri Nair
