Four Stars

tFileOutputExcel memory error

Hi forum

I have a problem using in a simply job how I write the result of a query in an Excel file.

OutOfMemory.jpg

If in the component I not set "Add to existing file" it works.

If I set "Add to existing file" I have this error:

Exception in thread "main" java.lang.: Java heap space

I tried to increase the memory value Xmx in the advanced setting of Run but without success.

So the data you want to write the files are so many, about 300k row and the dimension of the file Excel is 26Mb.

I work in this environment:

Lubuntu virtual machine 64bit

4Gb of memory

java version "1.8.0_102"
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

 

It it possible to resolve  this problem?

Regards

Alessandro

 

  • Data Integration
11 REPLIES
Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

Hi,
What if you try just to read the Excel file using tFileInputExcel?
If it works, try to redesign the job with tFileInputExcel and tMSSQLInput merged into a tMap.
Not it will solve your case, but you can try.

TRF
Four Stars

Re: tFileOutputExcel memory error

I use "Add to existing file" not because I have data to merge.

It is a trick to use an Excel file model pre formatted with friezed row, auto-filter, header of column in bold etc.

The pre formatted file is empty and only 6KiB.

Merge into a tMap non solve my case.

Regards

Alessandro

Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

OK, I though the file was already a huge one.
What if you limit the number of selected rows to 100k more or less (to check if the problem is due to the output size)?
What if you redirect output to a tFileOutputDelimited component (to check if the problem is due to excel file size)?

TRF
Four Stars

Re: tFileOutputExcel memory error

Hi TRF

I'll explain the behaviour with more detail.

1. in the component tFileOutputExcel if I not check the flag "Add to existing file" it works; It generates an Excel file of about 300,000 row and 30MiB;

2. if I set "Add to existing file" I have the memory error around the writing of the 50th row; the existing file is very small (6KiB) and I was expecting to have a file of 30MiB + 6KiB, really with no significant difference to the previous point.

 

I think the component tFileOutputExcel works this way:

Flag not checked: direct writing of rows to file -> no problems;

Flag checked: storing rows in memory and writing to file at the end of the recordset -> memory error.

I increased the memory of my virtual machine from 2MiB to 4MiB and I increased the -Xmx run parameter of my job from 1024 to 4096 but without success.

How can i fix this error?

Regards

Alessandro

 

Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

Hi Alessandro,

 

What if you reverse the operation order:
1- write the result without the option "Add to existing file" checked but with the "Is absolute Y pos." option checked and the appropriate values for the "First cell X" and "First cell Y" fields + "Keep existing cell format".

2- add the missing rows (header?) from he original file.

I suppose your actual process is to copy the original file to the result file, then use the result file to append the records from the database.

With the change I propose, you need to write directly to the result file, then read the original file and write at an absolute position to inject the header.

Does this matches with what you expect?


TRF
Four Stars

Re: tFileOutputExcel memory error

Hi TRF

Your solution non works.

I have a memory error just after 3727 rows processed.

SolutionTRF1.jpg

 

 

 

 

 

 

 

 

 

 

SolutionTRF2.jpg

 

 

 

 

 

 

 

 

 

 

 

I think it's not a good way to try to turn around the problem.
I have RAM memory available, we have to understand how to use It.

Regards

Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

Hi,

It looks very strange as less than 4,000 rows is not a very small value.

Can you share a capture of the whole job?

Also, you can try by replacing tFileOutputExcel by tFileOutputDelimited, then when the job finish, convert the resulting csv file to xlsx (using tFileInputDelimited and tFileOutputExcel).


TRF
Four Stars

Re: tFileOutputExcel memory error

Hi TRF
It is obvious that if I use tFileOutputDelimited it works, do not you think so?
I have the problem if I check the flag "Add to existing file" in the component fFileOutputExcel.

If you want to try it is very simple.

You have to create a job with a data input ("Select * From ..." what you want) that it produces a recordset of approximately 200,000 rows.

If you prefer, you can use the tFileInputDelimited component in a file with the same number of rows.

In the fFileOutputExcel you have to check "Write excel2007" because only this format is able to handle a file with many lines.

SimpleJob.png

At the first run you have to NOT check "Add to existing file" and you can see that it works.

Then you must clear all rows except the first (header) with Excel or similarly.

Now you check "Add to existing file" and you run the job again.

I think that in this way you can generate the error.

Regards

Alessandro

Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

I've just tried with 250,000 lines and it works fine with an elapse time less than 1'.

The generated file is 21 Mb large.

My config is 8Gb RAM with standard JVM parameters (-Xms250M -Xmx1024M).

 

I've also tried to add 250,000 new lines after the 250,000 existing lines and this time it failed (yep!) with a Java heap space exception (so for a total of 500,000 lines).

Now, if I delete (not just clear) all the lines but the 1st oine and retry... it works again.

 

Are you sure the way you use to delete the existing content from the file during the test?

In case of, try to select all the lines, then right click and delete.

Also verify where is the last line (Ctrl+Fn+End). It should be the 1st line after deletion of the content and saving the file.

 

Else, I can't figure out why it doesn't works for you (and I'm very sorry to cannot solve your case), but continue to share your experiments.


TRF
Four Stars

Re: tFileOutputExcel memory error

Hi

I'm very sure the file haven't more than one row.

  1. Ctrl+Alt+End command stops on the last cell of the first line
  2. "Full" file is 30MiB large, "empty" file only 6KiB

Yet I still have the error:

 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.xmlbeans.impl.store.Saver$TextSaver.resize(Saver.java:1700)
	at org.apache.xmlbeans.impl.store.Saver$TextSaver.preEmit(Saver.java:1303)
	at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:1190)
	at org.apache.xmlbeans.impl.store.Saver$TextSaver.emitElement(Saver.java:962)
	at org.apache.xmlbeans.impl.store.Saver.processElement(Saver.java:476)
	at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:307)
	at org.apache.xmlbeans.impl.store.Saver$TextSaver.saveToString(Saver.java:1864)
	at org.apache.xmlbeans.impl.store.Cursor._xmlText(Cursor.java:546)
	at org.apache.xmlbeans.impl.store.Cursor.xmlText(Cursor.java:2436)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.xmlText(XmlObjectBase.java:1500)
	at org.apache.poi.xssf.model.SharedStringsTable.getKey(SharedStringsTable.java:134)
	at org.apache.poi.xssf.model.SharedStringsTable.addEntry(SharedStringsTable.java:180)
	at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:362)

In the last test I have increased the swap memory to disk to have the same value as physical memory (4Gb+4Gb)... unfortunately still unsuccessful.

I'm pretty sure the problem is in the component.

If I do not select "Add to existing file" in the advanced option I can set the number of row to the buffer.

I haven't this possibility if I select "Add to existing file"; when it runs store in the memory all the data until the end of the record-set but this way it uses all the memory available as long as it ends.

Instead with not select "Add to existing file", it writes data every XXX rows and releases memory for new data; I can see this because the size of the file it increase on the disk during processing.
I not think there is a solution unless you modify the component or find a way to configure the memory usage.

Regards

Alessandro

Nine Stars TRF
Nine Stars

Re: tFileOutputExcel memory error

Well, I would like to be able to modify the component by myself, but I'm not from Talend...

Any chance to test with a PC with 8Gb RAM?


TRF