Java Heap Space Issue in Talend while processing large xlsx file

One Star

Java Heap Space Issue in Talend while processing large xlsx file

Hello All,
I have huge xlsx file with having more than 700000 records and more than 100 columns in it.
I have to read the file and load it into the table but I am facing Java Heap Issue.
Can some one please let me know what is the limitation of Talend and upto what size of records I can process in Talend using xlsx file
and also size of file matters to cause the java heap space issue. and if it is then what would be the maximum size allowed.
Basically what is the limitation of Talend for processing large xlsx file and what is the solution for it.
Thanking you,
with kind regards,
Akash Sonawane.
Seventeen Stars

Re: Java Heap Space Issue in Talend while processing large xlsx file

hi,
did you increase jvm memory allocation (ram) ?
regards
laurent
One Star

Re: Java Heap Space Issue in Talend while processing large xlsx file

Hi,
Have you tried changing the 'Generation Mode' option in the Advanced settings of the tFileInputExcel component to 'Less memory consumed for large excel(Event mode)'? That fixed a similar issue i had.
Regards, 
Aled 
Seventeen Stars

Re: Java Heap Space Issue in Talend while processing large xlsx file

The generation mode is a hack from talend and I would not use it in a productive job.
The Apache guys developing the Apache POI API has faced the problem with large Excel files and created a industrial proven solution (for xlsx typed files) called StreamingWorkbook. This kind of workbook saves memory while your read and write Excel files.
The component tFileExcelWorkbookOpen provides this feature. This will solve the memory problems for sure and for read and write operations.
Reading and writing will be done with the components tFileExcelSheetInput and tFileExcelSheetOutput.
http://www.talendforge.org/exchange/index.php?eid=1254&product=tos&action=view&nav=0,1,1
http://www.talendforge.org/exchange/index.php?eid=1256&product=tos&action=view&nav=0,1,1
http://www.talendforge.org/exchange/index.php?eid=1257&product=tos&action=view&nav=0,1,1
Please take a look into the documentation.
There is a backdraft: The StreamingWorkbook holds only (per default) 100 records within the memory and therefore some complex operations like applying changes over a large couple of records at once will not work or reading/writing in more than one sheet in the same job cannot work.