I have a problem with a simple job that writes the result of a query to an Excel file.
If I do not check "Add to existing file" in the component, it works.
If I check "Add to existing file" I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I tried to increase the Xmx memory value in the advanced settings of the Run view, but without success.
The data to write is fairly large: about 300k rows, and the resulting Excel file is 26MB.
I work in this environment:
64-bit Lubuntu virtual machine
4GB of RAM
java version "1.8.0_102"
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
Is it possible to resolve this problem?
I use "Add to existing file" not because I have data to merge.
It is a trick to start from a pre-formatted Excel template with frozen rows, auto-filter, bold column headers, etc.
The pre-formatted file is empty and only 6KB.
Merging in a tMap does not solve my case.
I'll explain the behaviour in more detail.
1. In the tFileOutputExcel component, if I do not check the "Add to existing file" flag, it works: it generates an Excel file of about 300,000 rows and 30MB;
2. if I check "Add to existing file" I get the memory error around the writing of the 50th row; the existing file is very small (6KB) and I was expecting a file of 30MB + 6KB, with no significant difference from the previous case.
I think the tFileOutputExcel component works this way:
Flag not checked: rows are written directly to the file -> no problem;
Flag checked: rows are stored in memory and written to the file at the end of the recordset -> memory error.
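In Apache POI terms (the library the Excel components are built on), this would be the difference between the streaming writer and the in-memory one. Here is a minimal sketch of the two modes as I understand them; the file names and row counts are just examples, and this is my assumption about the component's internals, not its actual code:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class PoiModesSketch {
    // Flag not checked: streaming write, only ~100 rows held in memory at once;
    // older rows are flushed to a temporary file as the job runs.
    static void streamingWrite() throws Exception {
        SXSSFWorkbook wb = new SXSSFWorkbook(100);      // row access window
        Sheet sheet = wb.createSheet("data");
        for (int r = 0; r < 300_000; r++) {
            Row row = sheet.createRow(r);
            row.createCell(0).setCellValue("value-" + r);
        }
        try (FileOutputStream out = new FileOutputStream("result.xlsx")) {
            wb.write(out);
        }
        wb.dispose();                                   // remove the temp files
    }

    // Flag checked: the template is parsed into a full in-memory DOM, and every
    // appended row stays on the heap until the final write() -> OutOfMemoryError.
    static void appendToExisting() throws Exception {
        try (FileInputStream in = new FileInputStream("template.xlsx")) {
            XSSFWorkbook wb = new XSSFWorkbook(in);     // whole workbook in memory
            Sheet sheet = wb.getSheetAt(0);
            for (int r = 1; r <= 300_000; r++) {
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("value-" + r);
            }
            try (FileOutputStream out = new FileOutputStream("result.xlsx")) {
                wb.write(out);
            }
        }
    }
}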
I increased the memory of my virtual machine from 2GB to 4GB and I increased the -Xmx run parameter of my job from 1024MB to 4096MB, but without success.
How can I fix this error?
What if you reverse the order of operations:
1- write the result without the "Add to existing file" option checked, but with the "Is absolute Y pos." option checked, appropriate values for the "First cell X" and "First cell Y" fields, and "Keep existing cell format";
2- add the missing rows (header?) from the original file.
I suppose your current process is to copy the original file to the result file, then use the result file to append the records from the database.
With the change I propose, you write directly to the result file, then read the original file and write at an absolute position to inject the header, as sketched below.
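In plain POI terms, the header-injection step would look roughly like this; the file names, sheet index and column labels are illustrative assumptions:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class InjectHeaderSketch {
    public static void main(String[] args) throws Exception {
        XSSFWorkbook wb;
        // Re-open the result file written in step 1 (data starts at row 1).
        try (FileInputStream in = new FileInputStream("result.xlsx")) {
            wb = new XSSFWorkbook(in);
        }
        Sheet sheet = wb.getSheetAt(0);
        // Absolute position: row 0 is free because step 1 used "First cell Y" = 1.
        Row header = sheet.createRow(0);
        header.createCell(0).setCellValue("COLUMN_A"); // labels from the template
        header.createCell(1).setCellValue("COLUMN_B");
        try (FileOutputStream out = new FileOutputStream("result.xlsx")) {
            wb.write(out);
        }
        wb.close();
    }
}

Note that this second step still loads the whole result workbook into memory once, but only after the bulk of the rows has already been written.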
Does this match what you expect?
Your solution does not work.
I get a memory error after just 3,727 rows processed.
I don't think it's a good approach to try to work around the problem.
I have RAM available; we have to understand how to make the job use it.
That looks very strange, as fewer than 4,000 rows is not a large number at all.
Can you share a screenshot of the whole job?
Also, you can try replacing tFileOutputExcel with tFileOutputDelimited; then, when the job finishes, convert the resulting CSV file to XLSX (using tFileInputDelimited and tFileOutputExcel), as sketched below.
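The equivalent logic in plain Java with POI's streaming writer, just to illustrate why the two-step route avoids the heap issue. The delimiter, file names and the naive split are illustrative assumptions; a real job should use a proper CSV parser:

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class CsvToXlsxSketch {
    public static void main(String[] args) throws Exception {
        SXSSFWorkbook wb = new SXSSFWorkbook(100); // 100-row memory window
        Sheet sheet = wb.createSheet("data");
        int r = 0;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("result.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Row row = sheet.createRow(r++);
                // Naive split: does not handle quoted fields containing the delimiter.
                String[] fields = line.split(";", -1);
                for (int c = 0; c < fields.length; c++) {
                    row.createCell(c).setCellValue(fields[c]);
                }
            }
        }
        try (FileOutputStream out = new FileOutputStream("result.xlsx")) {
            wb.write(out);
        }
        wb.dispose();
    }
}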
It is obvious that if I use tFileOutputDelimited it works, don't you think?
I have the problem when I check the "Add to existing file" flag in the tFileOutputExcel component.
If you want to try it, it is very simple.
You have to create a job with a data input ("Select * From ..." whatever you want) that produces a recordset of approximately 200,000 rows.
If you prefer, you can use the tFileInputDelimited component on a file with the same number of rows.
In tFileOutputExcel you have to check "Write excel2007", because only this format can handle a file with that many lines (the older .xls format is limited to 65,536 rows per sheet).
For the first run you must NOT check "Add to existing file", and you can see that it works.
Then clear all rows except the first (the header) with Excel or a similar tool.
Now check "Add to existing file" and run the job again.
I think that in this way you can reproduce the error.
I've just tried with 250,000 lines and it works fine, with an elapsed time of less than one minute.
The generated file is 21MB.
My config is 8GB RAM with standard JVM parameters (-Xms250M -Xmx1024M).
I've also tried to add 250,000 new lines after the 250,000 existing ones, and this time it failed (yep!) with a Java heap space exception (so for a total of 500,000 lines).
Now, if I delete (not just clear) all the lines but the first one and retry... it works again.
Are you sure about the way you delete the existing content from the file during the test?
Just in case, try selecting all the lines, then right-click and delete.
Also verify where the last line is (Ctrl+Fn+End): it should be the first line after deleting the content and saving the file.
Otherwise, I can't figure out why it doesn't work for you (and I'm very sorry I cannot solve your case), but please continue to share your experiments.
I'm quite sure the file doesn't have more than one row.
Yet I still get the error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1700)
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1303)
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1190)
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:962)
    at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:476)
    at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1864)
    at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
    at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1500)
    at org.apache.poi.xssf.model.SharedStringsTable.getKey(SharedStringsTable.java:134)
    at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:180)
    at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:362)
In the last test I increased the swap space on disk to match the physical memory (4GB + 4GB)... unfortunately still without success.
I'm pretty sure the problem is in the component.
If I do not select "Add to existing file", in the advanced settings I can set the number of rows to buffer.
I don't have this possibility if I select "Add to existing file": in that case it stores all the data in memory until the end of the recordset, so it uses all the available memory before it finishes.
Instead, with "Add to existing file" not selected, it writes data every XXX rows and releases memory for new data; I can see this because the size of the file on disk grows during processing.
I don't think there is a solution unless the component is modified or there is a way to configure the memory usage.
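For what it's worth, the streaming side of Apache POI does expose exactly that kind of control: the row access window, or an explicit flush every N rows, bounds how much stays on the heap. A sketch of what such a configurable buffer looks like at the POI level (row counts and file name are illustrative):

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class ConfigurableBufferSketch {
    public static void main(String[] args) throws Exception {
        // -1 disables the automatic window, so flushing is fully manual.
        SXSSFWorkbook wb = new SXSSFWorkbook(-1);
        SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("data");
        for (int r = 0; r < 300_000; r++) {
            Row row = sheet.createRow(r);
            row.createCell(0).setCellValue("value-" + r);
            if (r > 0 && r % 1_000 == 0) {
                sheet.flushRows(0); // spill all buffered rows to the temp file
            }
        }
        try (FileOutputStream out = new FileOutputStream("result.xlsx")) {
            wb.write(out);
        }
        wb.dispose();
    }
}

POI even has an SXSSFWorkbook(XSSFWorkbook) constructor that streams new rows on top of a small template workbook, which is essentially what the "Add to existing file" path would need to use to avoid holding everything in memory.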
Well, I would like to be able to modify the component by myself, but I'm not from Talend...
Any chance to test on a PC with 8GB RAM?