Hello,

As part of our ETL import we want to identify duplicates in the input file. We are using tMatchGroup and tRuleSurvivorship to achieve this, and we were successful in identifying duplicates and creating a new row for the survivor of each duplicate group.

However, when running this job on TAC, we are facing performance issues with these components. A file with 2,600 records completed, but sluggishly (it took 5 minutes to process). When we run a file with 120K records, the job just gets stuck on the subjob containing tMatchGroup and tRuleSurvivorship and doesn't process the data at all. We cannot even set up parallelization on this subjob because of these components. After adding a level of logging, we have confirmed that these components are the bottleneck.

Can someone suggest how to improve the performance of these components?

We are using Talend Platform for Big Data 5.5.1.r118616, and the JVM parameters for this job on TAC are set to -Xms1024M and -Xmx24576M.

Any advice on improving performance, or an alternative way to implement this logic, would be highly appreciated. Thanks in advance.
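For context, the logic we are after is roughly the following (a simplified plain-Java sketch, not our actual job: it uses a hypothetical exact blocking key and a trivial "longest name wins" survivorship rule in place of tMatchGroup's fuzzy matching and tRuleSurvivorship's rules):

```java
import java.util.*;
import java.util.stream.*;

public class SurvivorshipSketch {
    // Hypothetical record layout for illustration only.
    record Customer(String name, String email) {}

    // Group potential duplicates by an exact blocking key (lower-cased email)
    // and keep one survivor per group -- here simply the record with the
    // longest name, standing in for a real survivorship rule.
    static List<Customer> dedupe(List<Customer> input) {
        Map<String, List<Customer>> groups = input.stream()
            .collect(Collectors.groupingBy(c -> c.email().toLowerCase(Locale.ROOT)));
        return groups.values().stream()
            .map(g -> g.stream()
                       .max(Comparator.comparingInt(c -> c.name().length()))
                       .orElseThrow())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> in = List.of(
            new Customer("Jon", "jon@example.com"),
            new Customer("Jonathan", "JON@example.com"),
            new Customer("Mary", "mary@example.com"));
        // Two duplicate groups -> two survivor rows.
        System.out.println(dedupe(in).size());
    }
}
```

The pairwise comparison inside each group is what makes this expensive: without a blocking key that splits the input into small groups, the matching step grows quadratically with the number of records, which may be why 120K rows stalls while 2,600 merely crawls.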