How to optimise tUniqRow and tSortRow

Four Stars

Is it better to put tSortRow before tUniqRow, or vice versa, for the best performance? Or should I use tAggregateSortedRow instead of tUniqRow? If not, how can I optimise tUniqRow? Note that I already use the "disk option", and the job still crashes. I am working on a file of 3 million lines.


Accepted Solutions
Moderator

Re: How to optimise tUniqRow and tSortRow

Usually you would want to put a tSortRow before a tUniqRow. It makes more sense logically to sort your data before finding the unique values, especially if you are using tAggregateSortedRow, which expects its input to be sorted already. Whether to use tAggregateSortedRow or tUniqRow depends on what you are trying to do: tAggregateSortedRow is meant to compute metrics (aggregate calculations) over groups of values, while tUniqRow is for finding and removing duplicate rows.
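
To illustrate why sorted input matters for this kind of processing, here is a minimal Java sketch. It is not the code Talend generates for tUniqRow or tAggregateSortedRow; the class and method names (DedupSketch, dedupUnsorted, dedupSorted) are made up for the example, and rows are assumed to be non-null strings. On unsorted input every distinct key has to be remembered, while on sorted input duplicates are adjacent and only the previous row needs to be kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    // Unsorted input: every distinct row seen so far stays in memory.
    static List<String> dedupUnsorted(List<String> rows) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (seen.add(row)) { // add() returns false when the row is a duplicate
                out.add(row);
            }
        }
        return out;
    }

    // Sorted input: duplicates are adjacent, so only the previous row is remembered.
    static List<String> dedupSorted(List<String> sortedRows) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String row : sortedRows) {
            if (!row.equals(previous)) {
                out.add(row);
            }
            previous = row;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(List.of("b", "a", "b", "c", "a"));
        System.out.println(dedupUnsorted(rows)); // prints [b, a, c]
        Collections.sort(rows);
        System.out.println(dedupSorted(rows));   // prints [a, b, c]
    }
}
```

The first variant keeps the original row order but its memory use grows with the number of distinct rows; the second needs the sort to have happened upstream, which is the role tSortRow plays in the job.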


All Replies
Moderator

Re: How to optimise tUniqRow and tSortRow

Hello,

Performance issues are usually caused by the DB connection or the job design. Could you please upload some screenshots of your current job design?

Best regards

Sabrina

 

Four Stars

Re: How to optimise tUniqRow and tSortRow

 

Below is the job design. I used sorting on disk in tSortRow, and the problem comes from tUniqRow.

                     tOracleInput
                          |
                          |
tFileInputPositional --- tMap
                          |
                          |
                       tUnite ---> tSortRow ---> tUniqRow ---> tFileOutputPositional
                          |
                          |
tOracleInput ----------- tMap

Moderator

Re: How to optimise tUniqRow and tSortRow

If you are running into a Java heap space error, you could increase the JVM heap size (the -Xmx argument) so the job can process more records, or you could enable the "Use of disk" option in the tUniqRow advanced settings as well. That option processes the data through temporary files, so the job uses less system memory.
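
For anyone curious how processing "with files" can reduce memory use, here is a rough Java sketch of one common approach: rows are first spread across temporary bucket files by hash, then each bucket is de-duplicated on its own. This is only an illustration of the general idea, not Talend's actual implementation of the disk option. The class name DiskDedupSketch, the method dedupWithDisk and the buckets parameter are hypothetical, and rows are assumed to be single-line, non-null strings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DiskDedupSketch {

    // Returns the distinct rows. Order across buckets is NOT the input order.
    static List<String> dedupWithDisk(Iterable<String> rows, int buckets) {
        try {
            Path dir = Files.createTempDirectory("dedup");
            List<Path> files = new ArrayList<>();
            List<BufferedWriter> writers = new ArrayList<>();
            for (int i = 0; i < buckets; i++) {
                Path file = dir.resolve("bucket" + i + ".txt");
                files.add(file);
                writers.add(Files.newBufferedWriter(file, StandardCharsets.UTF_8));
            }

            // Pass 1: equal rows have equal hash codes, so they land in the same file.
            for (String row : rows) {
                BufferedWriter writer = writers.get(Math.floorMod(row.hashCode(), buckets));
                writer.write(row);
                writer.newLine();
            }
            for (BufferedWriter writer : writers) {
                writer.close();
            }

            // Pass 2: only one bucket's distinct rows are held in the set at a time.
            // A real job would stream the result to an output file instead of a list.
            List<String> out = new ArrayList<>();
            for (Path file : files) {
                Set<String> seen = new HashSet<>();
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (seen.add(line)) {
                            out.add(line);
                        }
                    }
                }
                Files.delete(file);
            }
            Files.delete(dir);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is extra disk I/O in exchange for a smaller working set: with more buckets, each in-memory set is smaller, but more temporary files are open at once.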
