One Star

OutOfMemory with big Lookup-Table

I work with a 6-million-row Oracle table, where I need to update only parts of a column (parts of the string will be replaced).
Using the table as a lookup input to a tMap, the job quits after roughly 1 million entries have been read (reading the lookup is the first step that is executed).
Does anybody have a workaround for this?
cheers, Benjamin
(working on a 2 GB Windows XP machine)
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at java.nio.CharBuffer.wrap(Unknown Source)
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.BufferedWriter.flushBuffer(Unknown Source)
at java.io.BufferedWriter.flush(Unknown Source)
at java.io.PrintWriter.newLine(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at routines.system.RunStat.sendMessages(RunStat.java:131)
at routines.system.RunStat.run(RunStat.java:104)
at java.lang.Thread.run(Unknown Source)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at ......
20 REPLIES
One Star

Re: OutOfMemory with big Lookup-Table

Versions prior to the latest (2.4) read the lookups into memory. So what I tried to do was limit the number of columns retrieved from the lookup tables to just what I needed: I used a SELECT statement in the tXXXInput component to fetch only the columns I was going to use for the lookup. That usually worked, but with large tables even this would crash.
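For illustration, a minimal JDBC sketch of the same idea (the table and column names here are made up): fetching only the key and the value column keeps the per-entry footprint far below a SELECT *.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

public class SlimLookup {
    // Load the lookup with only the two columns the join actually needs,
    // instead of every column of the multi-million-row table.
    public static Map<String, String> load(Connection conn) throws SQLException {
        Map<String, String> lookup = new HashMap<String, String>();
        PreparedStatement ps = conn.prepareStatement(
                "SELECT lookup_key, lookup_value FROM big_lookup_table");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            lookup.put(rs.getString(1), rs.getString(2));
        }
        rs.close();
        ps.close();
        return lookup;
    }
}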
Version 2.4 has an option on the tMap to use a disk file to store the lookups. So, I would say that a combination of both methods should solve your problem efficiently.
One Star

Re: OutOfMemory with big Lookup-Table

Hi SMaz
Thank you for your suggestions. I am happy to hear that 2.4 has a "file" feature in tMap. My company has actually started working with JasperETL 2.3.x, so I hope to be able to use 2.4 soon.
While browsing I found a lot of members having trouble with big lookup tables, although the ZIP-code lookup examples always work well...
@Talend-Team
My suggestion: why not issue *one* SELECT statement through the tXXXInput component for *each* main data row that requires a join? Let Oracle or other databases do the caching or whatever is appropriate. Why build hash tables and files all over again?
Would that be a difficult code change to a tJoin or tXXXInput (lookup) component?
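To make the suggestion concrete, here is a rough JDBC sketch of the per-row lookup idea (table and column names are hypothetical); the database can cache the prepared statement and the hot blocks on its side:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

public class PerRowLookup {
    // One parameterized SELECT per main-flow row: nothing is buffered on the
    // Java side, so memory use stays flat however big the lookup table is.
    public static void join(Connection conn, List<String> mainKeys) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT lookup_value FROM big_lookup_table WHERE lookup_key = ?");
        for (String key : mainKeys) {
            ps.setString(1, key);
            ResultSet rs = ps.executeQuery();
            String joined = rs.next() ? rs.getString(1) : null;
            rs.close();
            // ...apply the string replacement / join expression to 'joined' here...
        }
        ps.close();
    }
}

The obvious cost is one round trip per row, so an index on the lookup key would be essential.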
Any discussion or comments are welcome!
cheers, Benjamin
One Star

Re: OutOfMemory with big Lookup-Table

@Talend-Team
Could you please explain what is meant by "New high performance lookup mode supporting multi-gigabyte tables and files", as stated in the recent newsletter praising 2.4?
Is it the "swap to file" option in tOracleInput, or something else that is really high-performing?
cheers
Benjamin
Employee

Re: OutOfMemory with big Lookup-Table

My suggestion: why not issue *one* SELECT statement through the tXXXInput component for *each* main data row that requires a join? Let Oracle or other databases do the caching or whatever is appropriate. Why build hash tables and files all over again?
Would that be a difficult code change to a tJoin or tXXXInput (lookup) component?

We agree with you; this could be an excellent behavior in a database context.
We will work on this subject for the 2.5 release.
However, when big files are used without a database available, the user cannot rely on a database cache to process the data, so the tMap "Store on disk" option will remain useful in that case.
Could you please explain what is meant by "New high performance lookup mode supporting multi-gigabyte tables and files", as stated in the recent newsletter praising 2.4?
Is it the "swap to file" option in tOracleInput, or something else that is really high-performing?

That statement refers to the tMap "Store on disk" option. "High performance lookup" should be put into perspective: it will usually be slower than in-memory handling, although some benchmarks with specific parameters can show the same performance with and without "Store on disk" enabled.
I will post a screenshot of a benchmark run with default parameters and a large number of rows soon.
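For readers wondering what a store-on-disk lookup means in principle, here is a generic chunk-and-spill sketch (only an illustration of the idea, not the actual tMap code): heap usage stays bounded by one chunk, at the price of re-reading temp files while probing.

import java.io.*;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ChunkedDiskLookup {
    // Spill the lookup to disk in fixed-size serialized chunks.
    // The number of temp files is roughly rowCount / chunkSize.
    static int spill(Iterator<Map.Entry<String, String>> rows, int chunkSize, File dir)
            throws IOException {
        int chunk = 0;
        while (rows.hasNext()) {
            Map<String, String> buffer = new HashMap<String, String>();
            while (rows.hasNext() && buffer.size() < chunkSize) {
                Map.Entry<String, String> e = rows.next();
                buffer.put(e.getKey(), e.getValue());
            }
            ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream(new File(dir, "lookup_" + (chunk++) + ".bin")));
            out.writeObject(buffer);
            out.close();
        }
        return chunk;
    }

    // Probe a key by reloading one chunk at a time: slower than an in-memory
    // HashMap, but the heap never holds more than chunkSize entries.
    @SuppressWarnings("unchecked")
    static String probe(String key, int chunks, File dir)
            throws IOException, ClassNotFoundException {
        for (int i = 0; i < chunks; i++) {
            ObjectInputStream in = new ObjectInputStream(
                    new FileInputStream(new File(dir, "lookup_" + i + ".bin")));
            Map<String, String> buffer = (Map<String, String>) in.readObject();
            in.close();
            String v = buffer.get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }
}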
One Star

Re: OutOfMemory with big Lookup-Table

Hi amaumont
Thank you for the reply to my questions! I think the Talend team is on the right track!
Actually, our only trouble is getting hold of your sales staff.
Cheers
Benjamin
One Star

Re: OutOfMemory with big Lookup-Table

Hi
I installed release 2.4.0 and tried the new "Store on disk" option.
(By the way: this option is in the mapping editor, as a table option of the lookup table.)
I got a compilation error when using the lookup table row in a Var expression: "rowX could not be found".
It runs, but only a bit longer than "memory only" if I just pass the row from the lookup to the output.
The row chunk I use is 1'000'000, which is close to the memory-overflow row count. The normal overflow occurs at 1.4 million retrieved lookup rows; with the file-save option it stops at 2.4 million rows.
By the way: the temp files are not cleaned up; they quickly add up to gigabytes.
One Star

Re: OutOfMemory with big Lookup-Table

Continued test:
Reducing the row chunk to 500'000 worked, but later, in the middle of processing rows, the memory problems appeared again. I guess there might be some bigger leaks, or just inefficiency?
Employee

Re: OutOfMemory with big Lookup-Table

The problem comes from a bug in the join algorithm: it reads a (very big) bad value for the length of an array and tries to allocate it.
This problem does not depend on the amount of data, but on a specific sequence of data to join; even a small amount of data can trigger it.
It remains for us to find which one it is...
One Star

Re: OutOfMemory with big Lookup-Table

Hi amaumont
Might it be helpful if I send you the test data? It is a bunch of CSV files as input, as well as a big Oracle table, which I can send you as a CSV file. Do you have an FTP site for such a purpose?
Cheers, Benjamin
One Star

Re: OutOfMemory with big Lookup-Table

Hi amaumont
Did you see my remark on a possible bug in 2.4:
"I got a compilation error when using the Lookup Table Row in a Var Expression. the "rowX could not be found""
Benjamin
Employee

Re: OutOfMemory with big Lookup-Table

Indeed, I forgot about this remark: "rowX could not be found".
For now, temp files are not deleted if a fatal error occurs.
About the test data: I think I found the cause of the errors, yet other cases may remain.
You can follow bug 4271; it is very poorly commented, but I can't detail it further for now.
Employee

Re: OutOfMemory with big Lookup-Table

Could you be more precise about your remark "rowX could not be found"?
Can you create a Bugtrack entry with a simple example job?
Maybe the problem is already fixed, but I would like to make sure.
Thank you, Benjamin.
One Star

Re: OutOfMemory with big Lookup-Table

Hi amaumont
Great! I saw you were able to fix the memory bug, and I also appreciated the performance study! Thank you.
Let me just come back to the problem above:
I have attached a few pictures to explain it. In my case, row3 gives me the Java error.
cheers
Benjamin
Employee

Re: OutOfMemory with big Lookup-Table

Thank you very much for all these details.
I reproduced your problem on 2.4.0 easily, then I imported the built job into a current dev version, but in that case no compilation error occurs.
You can look at bug 4474, which I have resolved.
One Star

Re: OutOfMemory with big Lookup-Table

Hi
TIS 2.4.1 produces:
Exception in component tMap_5_TMAP_IN
java.lang.RuntimeException: java.io.FileNotFoundException: /home/dob/WM_test/FSK-BHF/temp/WM_STAGE1_DISTRIBUTE_tMapData_row9_TEMP_792.bin (Too many open files)
with roughly 1300 files in the temp area, using big lookup tables of 10'000'000 rows and more.
cheers
Benjamin
One Star

Re: OutOfMemory with big Lookup-Table

Hi
I would like to add some observations to the above:
The Java heap was set to 1024 MB, and I increased the buffer size from 100'000 to 1'000'000 to get bigger files.
Now the job fails with "Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space".
I have now set -Xmx2048M in the start script to try to avoid the heap space error. The result is still pending...
This forum entry is to show that the tMap lookup concept urgently needs an extension to the existing approach, as I already mentioned in one of my earlier contributions:
for each row in the input stream, look up the record in the DB, fetch it, and allow some operation (Java expression) to join the data.
Is there any other component combination that allows the above scenario, without tMap?
I am grateful for any short-term workaround.
Cheers
Benjamin
Employee

Re: OutOfMemory with big Lookup-Table

Indeed, to prevent the "Too many open files" error you have to increase the "Max buffer size".
About the "Thread-0" java.lang.OutOfMemoryError: Java heap space" error, it may come from:
- any problematics cases such as the 4867 which describes a problem with the FIRST MATCH mode
- other components which need much memory, such as the aggregate or database components which may accumulate a big amount of data
The best way is to use a low "Max buffer size" which allow to use a minimal amount of memory and which generate a number of generated opened files allowed by the system.
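As a rough back-of-the-envelope model (a simplification: the real file count also depends on how many lookups the job contains and on internal chunking), the two effects of "Max buffer size" pull in opposite directions:

public class BufferTradeoff {
    public static void main(String[] args) {
        long rowCount = 10000000L; // the 10-million-row lookup reported above
        int bufLow = 100000;       // low "Max buffer size": small heap, many temp files
        int bufHigh = 1000000;     // high buffer: roughly 10x the heap, a tenth of the files
        System.out.println("temp files, low buffer:  ~" + (rowCount / bufLow));  // ~100
        System.out.println("temp files, high buffer: ~" + (rowCount / bufHigh)); // ~10
    }
}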
It would be interesting for me to have more details on your "Java heap space" error.
One Star

Re: OutOfMemory with big Lookup-Table

Hi
Thank you for your reply.
My job was running on a Linux machine; I was not aware that my system limits the number of open files to something around 1300.
This is interesting information.
By the way, even with 2048 MB of heap I run into the heap space error. I think 1 million records in the buffer is too much for our tMap component.
By the way, I ran into another strange behaviour: tDenormalizeSortedRow seems not to process the last row. My input has 20 rows (with tSampleRow); after some normalisation, selection, and denormalisation only 19 rows are left. The last one disappeared somewhere.
Cheers
Benjamin
Employee

Re: OutOfMemory with big Lookup-Table

I would like to see the complete error message, to determine whether this is an algorithm bug or another problem.
It would also be useful to have a screenshot of your job, if possible.
About the problem with tDenormalizeSortedRow, please create a new topic.
Thank you, Benjamin.
One Star

Re: OutOfMemory with big Lookup-Table

Hi
Thank you for the quick support, I appreciate it.
The screenshot doesn't say much, but in any case I will be able to work on this again on 8 September.
I hope this is OK for you.
Cheers
Benjamin