Five Stars

tFileOutputExcel : issue with "big" files in format xlsx

Hi all,
I'm trying to build a fairly simple job:
tOracleInput --> tFileOutputExcel
The tFileOutputExcel should write an xlsx file, and I only have something like 100'000 or 200'000 rows.
However, I couldn't find a way to make this work!
Without any specific settings, I get heap space errors:
 disconnected
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at ....tOracleInput_1Process(....java:1780)
at ....runJobInTOS(....java:5109)
at ....main(....java:4854)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.io.ByteArrayOutputStream.write(Unknown Source)
at org.apache.poi.openxml4j.opc.internal.MemoryPackagePartOutputStream.write(MemoryPackagePartOutputStream.java:88)
at org.apache.xmlbeans.impl.store.Cursor._save(Cursor.java:590)
at org.apache.xmlbeans.impl.store.Cursor.save(Cursor.java:2544)
at org.apache.xmlbeans.impl.values.XmlObjectBase.save(XmlObjectBase.java:180)
at org.apache.poi.xssf.usermodel.XSSFSheet.write(XSSFSheet.java:2480)
at org.apache.poi.xssf.usermodel.XSSFSheet.commit(XSSFSheet.java:2439)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:196)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:200)
at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:204)
at org.talend.ExcelTool.writeExcel(ExcelTool.java:271)
at ....tOracleInput_1Process(....java:1738)
... 2 more

And when I try to use an output stream, the job never ends...
I also tried using a cursor on the tOracleInput, but it crashes as well.
Any ideas on what I could try to make this work?
Thanks
Regards
Alexis
20 REPLIES
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
Which version of TOS do you use?
Recently some users reported that tFileOutputExcel uses too much memory.
I suspect it may be a bug.
Also, did you check the "Append existing file" option?
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

I use TIS 5.0.2
and no, I didn't check this option
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
Before we conclude that it's a bug, you'd better increase the JVM memory arguments.
Run tab -> Advanced settings -> Use specific JVM arguments.
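To confirm that a larger heap setting (for example `-Xms256M -Xmx1024M` in the JVM arguments list) was actually picked up by the running job, a small check along these lines may help. This is only a sketch: the class name `HeapCheck` and its two helper methods are illustrative and not part of Talend; the same `Runtime` calls could be pasted into a tJava component instead.

```java
// Minimal sketch: report the heap ceiling the running JVM actually received.
// HeapCheck and its method names are hypothetical, chosen for illustration.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (governed by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently occupied by objects, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap  (MiB): " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

If the reported maximum does not roughly match the value given to `-Xmx`, the arguments are probably not reaching the JVM that runs the job.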
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Actually, the job is running on a JobServer: could you please help me find out where I can configure the JVM on the JobServer?
Thanks
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
You can increase the JVM arguments for each job in TAC.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Ok, I'll try that
However, is it normal for a simple task to take so much memory?
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
It's weird that tFileOutputExcel uses so much memory in recent versions.
But I haven't reproduced this issue myself. All the cases were reported by users.
Please try to increase the JVM arguments.
If it doesn't work, please report it on BugTracker.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Did you try a very simple test:
tRowGenerator (150 000 rows with 5 or 6 random ASCII strings) ---> tFileOutputExcel (xlsx format)
On my computer (run directly from the Studio), it crashes:
Starting job test at 14:47 01/08/2012.
connecting to socket on port 4064
connected
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at transverse.test_0_1.test.tRowGenerator_1Process(test.java:804)
at transverse.test_0_1.test.runJobInTOS(test.java:1067)
at transverse.test_0_1.test.main(test.java:850)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1592)
disconnected
at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1223)
at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1144)
at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:926)
at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:456)
at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1727)
at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1455)
at org.apache.xmlbeans.impl.values.XmlObjectBase.toString(XmlObjectBase.java:1440)
at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:167)
at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:345)
at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:315)
at org.talend.ExcelTool.addCellValue(ExcelTool.java:250)
at transverse.test_0_1.test.tRowGenerator_1Process(test.java:733)
... 2 more
Job test ended at 14:54 01/08/2012.

For information, Xmx was at 1024M
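The trace above passes through `SharedStringsTable.addEntry`, which suggests that every text cell is registered in POI's in-memory shared strings table. With random strings almost every cell value is distinct, so that table would be expected to grow with the number of cells rather than stay small. The sketch below only counts distinct values for a test of this shape; the string length of 8, the seed, and the class name `UniqueStringEstimate` are assumptions made for illustration, and it does not measure POI itself.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Sketch: how many distinct cell values does a "random ASCII strings" test produce?
// Class and method names are hypothetical; nothing here calls Talend or POI.
final class UniqueStringEstimate {

    /** Builds one random string of lowercase ASCII letters. */
    static String randomAscii(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    /** Counts distinct values across rows x cols randomly generated cells. */
    static int countDistinct(int rows, int cols, int length, long seed) {
        Random rnd = new Random(seed);
        Set<String> distinct = new HashSet<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                distinct.add(randomAscii(rnd, length));
            }
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        int rows = 150_000;
        int cols = 5;
        int distinct = countDistinct(rows, cols, 8, 42L);
        System.out.println("cells: " + (rows * cols) + ", distinct values: " + distinct);
    }
}
```

A one-column job fills far fewer cells than a five-column job over the same number of rows, which may be part of why results differ between otherwise similar tests.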
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
Yes. See the following image. It works fine.
If this issue recurs every time for your Talend Studio, please report it on BugTracker.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

And are you extracting to xlsx format, with Xmx1024?
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
Yes. XMX1024M.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Is it possible for you to send me the code of your job, please?
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
I have sent you an email with the code of my job attached.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi, thanks for your code
Do you think the difference could be related to your "big data" version of TIS?
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
No. I have tried to create this job under several versions of TOS and TIS.
I think you'd better report it on BugTracker.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Ok, last question: did you try with multiple columns? I managed to run the job with only one column as in your own job.
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
How many columns?
If this error only occurs when you add too many columns, it is because the job exceeds the JVM memory.
It's not a bug.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

5 columns
The point is that I don't understand why it takes up to 1 GiB and 419 s to fail to write an xlsx, while it takes less than 2 seconds to write the same data to a csv, and less than 5 seconds and 87 MiB of memory for an xml.
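One plausible reason for the gap: a CSV writer can emit each row and forget it, whereas the xlsx stack trace in the first post ends in `ByteArrayOutputStream.write` under `MemoryPackagePartOutputStream`, which suggests the sheet is buffered in memory before it reaches disk. The sketch below shows only the streaming side, with plain Java I/O. The class name `StreamingCsvSketch`, the `;` separator, and the 8-character values are assumptions for illustration, not what tFileOutputDelimited generates.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Sketch: write rows one at a time so memory use stays flat regardless of row count.
// Names are hypothetical; this is not the code Talend generates.
final class StreamingCsvSketch {

    /** Streams rows x cols random values to target and returns the row count written. */
    static long writeCsv(Path target, int rows, int cols, long seed) throws IOException {
        Random rnd = new Random(seed);
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int r = 0; r < rows; r++) {
                StringBuilder line = new StringBuilder();
                for (int c = 0; c < cols; c++) {
                    if (c > 0) {
                        line.append(';');
                    }
                    for (int i = 0; i < 8; i++) {
                        line.append((char) ('a' + rnd.nextInt(26)));
                    }
                }
                // Each row goes to the writer immediately; nothing is kept afterwards.
                out.write(line.toString());
                out.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("rows", ".csv");
        long n = writeCsv(target, 150_000, 5, 42L);
        System.out.println(n + " rows written to " + target);
    }
}
```

Only one row is alive at a time here, so heap use does not depend on the row count; a writer that has to hold the whole document before saving cannot offer that.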
One Star

Re: tFileOutputExcel : issue with "big" files in format xlsx

Hi
I don't understand it yet either.
Sorry. I tried to reproduce this issue, but could not.
Regards,
Pedro
Five Stars

Re: tFileOutputExcel : issue with "big" files in format xlsx

Ok, thanks for your time...
I think I'll try with v5.1.1, and if it still fails, I'll open a bug report.