tRuleSurvivorship & tMatchGroup performance issue


Hello,
As part of our ETL import, we want to identify duplicates in the input file. We are using tMatchGroup and tRuleSurvivorship to do this, and we can successfully identify duplicates and create a new survivor row for each duplicate group.
When running this job on TAC, we face performance issues with these components. A file with 2,600 records completed successfully but was sluggish (it took 5 minutes to process). When we run a file with 120K records, the job gets stuck on the subjob that contains tMatchGroup and tRuleSurvivorship and doesn't process the data at all.
We cannot even set up parallelization on this subjob because of these components. After adding a level of logging, we identified these components as the bottleneck. Can someone suggest how to improve their performance?
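For a sense of scale, here is a rough back-of-the-envelope sketch in Java. It assumes, hypothetically, that matching compares every record against every other record when no blocking key is defined, and that a blocking key would split the records into equally sized blocks. The class name, the method names, and the figure of 1,000 blocks are illustrative only and are not taken from the job or from Talend's internals.

```java
public class PairCount {
    // Number of unordered record pairs among n records: n * (n - 1) / 2.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    // Total pairs when n records are split evenly into `blocks` blocks
    // (assumption: an even split, and comparisons happen only within a block).
    static long blockedPairs(long n, long blocks) {
        long size = n / blocks;
        return blocks * pairs(size);
    }

    public static void main(String[] args) {
        System.out.println(pairs(2600L));                  // 3378700
        System.out.println(pairs(120000L));                // 7199940000
        System.out.println(blockedPairs(120000L, 1000L));  // 7140000
    }
}
```

Under these assumptions, going from 2,600 to 120K records multiplies the number of candidate pairs by roughly 2,000, which may explain why the larger file appears to hang, while restricting comparisons to blocks brings the count back to the same order of magnitude as the small file.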
We are using Talend Platform for Big Data 5.5.1.r118616. The JVM parameters for this job on TAC are set to (-Xms1024M, -Xmx24576M).
Any advice on improving performance, or on a workaround for this logic, would be highly appreciated.
Thanks in advance.
Moderator

Re: tRuleSurvivorship & tMatchGroup performance issue

Hi npatel,
Could you please open a ticket on the Talend Support Portal?
That way, we can give you remote assistance with your performance issue through the support cycle, with priority.
https://support.talend.com/otrs/customer.pl
Best regards
Sabrina