Five Stars

Handling Huge XML files in Talend - OutOfMemoryError (tAdvancedFileOutputXML)

Hello all,

 

I am trying to generate a large XML file using tAdvancedFileOutputXML.

 

When running the job on my local machine, I get an OutOfMemoryError: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

 

Please see my configuration below + some screenshots of the corresponding job:

 

Talend version : 6.3.1 

 

Main input file (csv) : 88,151 rows

lookup file (csv) : 5,994,268 rows

 

sfg.PNG

 

 

1. To reduce memory usage, I have enabled "Store temp data" in the tMap lookup:

Capturerr.PNG

 

2. I have changed the Generation mode to "Fast with low memory consumption":

sdf.PNG

 

3. As my local machine has 8 GB of RAM, I have also changed the JVM settings:

 

Captureff.PNG

 

4. I have also made use of the "output stream" option using a tJava component:
Capturep.PNG

 

The tJava component contains: Capturert.PNG

 

 

Despite these settings, I am still unable to generate the XML file and remain stuck with the OutOfMemoryError.

 

Can you advise, please? Thank you.

 

4 REPLIES
Seventeen Stars

Re: Handling Huge XML files in Talend - OutOfMemoryError (tAdvancedFileOutputXML)

It is actually never a good idea to create one huge XML file. The problem is not only the creation process but also the next step: reading such a huge file.

What about creating multiple files instead of one?

Five Stars

Re: Handling Huge XML files in Talend - OutOfMemoryError (tAdvancedFileOutputXML)

Hello Jlolling,

 

Thank you for your reply; I completely understand your idea.

 

I have tried splitting the XML into multiple files, and it is much faster.

The problem is that, in the end, we will have to merge them into one file, because it will be handled by an application that can only accept a single file at run time.

 

The final XML should be about 1.5 GB.

 

Do you have any idea how the current job can be optimized?

Seventeen Stars

Re: Handling Huge XML files in Talend - OutOfMemoryError (tAdvancedFileOutputXML)

Yes, but this task is much easier to do. Simply remove the root tags from all files (that is, turn them into fragments) and join the files.
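
As a rough illustration of that idea, here is a minimal plain-Java sketch (it could be adapted for a tJava component or a routine, but it is not a built-in Talend feature). It assumes each part file already contains only child elements, with no XML declaration and no root tag, and that the parts are line-oriented. The class name XmlFragmentMerger, the root tag "rows" and the file names are hypothetical. The files are streamed line by line, so the heap usage stays small regardless of the final file size, and blank lines are dropped along the way.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class XmlFragmentMerger {

    /**
     * Streams every fragment file into target and wraps the result in one root element.
     * Assumption: each part holds only child elements (no XML declaration, no root tag).
     */
    public static void merge(List<Path> parts, Path target, String rootTag) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            out.newLine();
            out.write("<" + rootTag + ">");
            out.newLine();
            for (Path part : parts) {
                try (BufferedReader in = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    // Only one line is held in memory at a time.
                    while ((line = in.readLine()) != null) {
                        if (line.trim().isEmpty()) {
                            continue; // drop blank rows so the merged file does not grow needlessly
                        }
                        out.write(line);
                        out.newLine();
                    }
                }
            }
            out.write("</" + rootTag + ">");
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo with two hypothetical fragment files in a temp directory.
        Path dir = Files.createTempDirectory("xml-parts");
        Path part1 = dir.resolve("part1.xml");
        Path part2 = dir.resolve("part2.xml");
        Files.write(part1, Arrays.asList("<row id=\"1\"/>", ""), StandardCharsets.UTF_8);
        Files.write(part2, Arrays.asList("<row id=\"2\"/>"), StandardCharsets.UTF_8);

        Path merged = dir.resolve("merged.xml");
        merge(Arrays.asList(part1, part2), merged, "rows");

        for (String line : Files.readAllLines(merged, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

One caveat: if a part file were written as a single very long line, readLine() would pull that whole line into memory, so copying with a fixed-size character buffer would be the safer choice in that case.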

Five Stars

Re: Handling Huge XML files in Talend - OutOfMemoryError (tAdvancedFileOutputXML)

Hello,

 

Indeed the root tag should be removed.

I am new to Talend; can you tell me how this can be implemented, that is, how to keep only one root tag in the file?

 

Also, apart from the root-tag issue, I have tried to merge the files using the tUnite component, but it creates a blank row after each line, which increases the file size:

Captured.PNG