One Star

Tmatchgroup Limit?

Hello,
I am trying to deduplicate 500 000 rows with the tMatchGroup component, but each time I get: Exception in thread "main" java.lang.OutOfMemoryError. What is the limit for tMatchGroup?
Thanks
3 REPLIES
Moderator

Re: Tmatchgroup Limit?

Hi,
For a large data set, could you please try storing the data on disk instead of in memory in tMatchGroup?
Here is a KB article about this: TalendHelpCenter: Exception: outOfMemory
Best regards
Sabrina

One Star

Re: Tmatchgroup Limit?

Hello,
I tried this yesterday. I no longer get errors, but the job now "freezes" without any error message: after the first set of rows is processed, nothing happens. You can see this in the screenshot I uploaded.
best regards
Employee

Re: Tmatchgroup Limit?

Hi,
Are you using a blocking key in the configuration of the component?
If you aren't, you're trying to do on the order of 500 000 x 500 000 comparisons. This won't fit in memory, and even with the store-on-disk option it will take days to complete...
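To put a rough number on this: comparing every row with every other row once takes n(n-1)/2 comparisons. The 500 000 x 500 000 figure counts each pair twice, but the order of magnitude is the same. Here is a minimal sketch of that arithmetic; the class and method names are made up for illustration and are not part of Talend.

```java
/** Illustrative arithmetic only; this is not Talend code. */
final class ComparisonCount {

    /** Number of unordered pairs among n rows: n * (n - 1) / 2. */
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 500_000L;
        // Roughly 1.25e11 pairwise comparisons when no blocking key is used.
        System.out.println("Pairs without blocking: " + pairs(n));
    }
}
```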
You must use a blocking key (probably by generating it with the tGenKey component). Have a look at examples at https://help.talend.com/search/all?query=tMatchGroup&content-lang=en
The blocking key will partition the data so that the number of comparisons is greatly decreased.
See also this documentation https://help.talend.com/search/all?query=tGenKey&content-lang=en about how to tune your tGenKey configuration for good performance. It's advised to build blocks (aka partitions) of a few tens or hundreds of lines. Use the blocking key profile to tune your partitions.
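To illustrate the idea outside of Talend, here is a small Java sketch of what blocking does. It is not how tGenKey or tMatchGroup are implemented. The key function (first three letters of a name) and the sample names are assumptions chosen for the example. Rows are grouped by key, only rows in the same block are compared, and the block sizes give a simple profile to check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of blocking; not the tGenKey or tMatchGroup implementation. */
final class BlockingSketch {

    /** Hypothetical blocking key: first three letters of the name, upper-cased. */
    static String blockingKey(String name) {
        String cleaned = name.trim().toUpperCase(Locale.ROOT);
        return cleaned.length() <= 3 ? cleaned : cleaned.substring(0, 3);
    }

    /** Groups rows by blocking key; only rows sharing a block are compared later. */
    static Map<String, List<String>> partition(List<String> names) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        for (String name : names) {
            blocks.computeIfAbsent(blockingKey(name), k -> new ArrayList<>()).add(name);
        }
        return blocks;
    }

    /** Total pairwise comparisons when comparing only within each block. */
    static long comparisons(Map<String, List<String>> blocks) {
        long total = 0;
        for (List<String> block : blocks.values()) {
            long size = block.size();
            total += size * (size - 1) / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Martin", "Martins", "Marta", "Dupont", "Dupond", "Smith");
        Map<String, List<String>> blocks = partition(names);
        // Block size profile: very large blocks are slow to match,
        // and single-row blocks have nothing to be compared with.
        blocks.forEach((key, rows) ->
                System.out.println(key + " -> " + rows.size() + " row(s)"));
        System.out.println("Comparisons within blocks: " + comparisons(blocks));
    }
}
```

With these six sample names the blocks hold 3, 2 and 1 rows, so 4 comparisons are needed instead of 15. The same effect at 500 000 rows is what makes the job feasible, provided the key does not put true duplicates into different blocks.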
Hope this helps.