How to optimise tUniqRow and tSortRow

Four Stars

Is it better to put tSortRow before tUniqRow, or vice versa, for the best performance? Or should I use tAggregateSortedRow instead of tUniqRow? If not, how can I optimise tUniqRow? Note that I already use the disk option, and the job still crashes. I am working on a file of 3 million lines.


Accepted Solutions
Moderator

Re: How to optimise tUniqRow and tSortRow

Usually you would want to put a tSortRow before a tUniqRow. It makes more sense logically to sort your data before finding the unique values, and it matters most if you are using tAggregateSortedRow, which expects its input to be already sorted on the group-by columns. Whether to use tAggregateSortedRow or tUniqRow depends on what you are trying to do: tAggregateSortedRow is meant for computing metrics and calculations per group of values, while tUniqRow is for separating unique rows from duplicates.
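To illustrate why sorting first can help with memory, here is a minimal Java sketch of the two general approaches to de-duplication. It is not Talend's actual implementation of tUniqRow or tAggregateSortedRow; the class and method names (DedupSketch, dedupUnsorted, dedupSorted) are made up for this example, and rows are simplified to plain strings acting as the key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only: not the code Talend generates.
class DedupSketch {

    // Unsorted input: every distinct key seen so far has to be remembered,
    // so the set grows with the number of distinct keys in the file.
    static List<String> dedupUnsorted(Iterable<String> rows) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (seen.add(row)) { // add() returns false for a key already seen
                out.add(row);
            }
        }
        return out;
    }

    // Sorted input: duplicates are adjacent, so comparing with the previous
    // key is enough. (A real job would write each row out instead of
    // collecting the result in a list.)
    static List<String> dedupSorted(Iterable<String> sortedRows) {
        List<String> out = new ArrayList<>();
        String previous = null;
        boolean first = true;
        for (String row : sortedRows) {
            if (first || !Objects.equals(row, previous)) {
                out.add(row);
            }
            previous = row;
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedupUnsorted(List.of("b", "a", "b", "c", "a"))); // [b, a, c]
        System.out.println(dedupSorted(List.of("a", "a", "b", "b", "c")));   // [a, b, c]
    }
}
```

The second method only works if the input really is sorted on the key, which is the same precondition tAggregateSortedRow places on its input.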


All Replies
Moderator

Re: How to optimise tUniqRow and tSortRow

Hello,

Performance issues are usually caused by the DB connection or the job design. Could you please upload some screenshots of your current job design?

Best regards

Sabrina

 

Four Stars

Re: How to optimise tUniqRow and tSortRow

 

Below is the job schema. I used sorting on disk in tSortRow, and the problem comes from tUniqRow.

                     tOracleInput
                           |
                           |
tFileInputPositional ---- tMap
                           |
                           |
                        tUnite ----> tSortRow ----> tUniqRow ----> tFileOutputPositional
                           |
                           |
tOracleInput ------------ tMap

Moderator

Re: How to optimise tUniqRow and tSortRow

If you are running into a Java heap space error, you could increase the JVM heap size (the -Xmx argument) so that the job can process more records, or you could enable the "Use of Disk" setting in the tUniqRow advanced settings. That option processes the data through temporary files, so the component uses less system memory.
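As a quick way to confirm that a heap increase has actually taken effect, the small Java sketch below prints the maximum heap the running JVM may use. The class name HeapCheck is made up for this example, and the 2g value in the usage note is only an assumed figure, not a recommendation for a 3-million-line file.

```java
// Hypothetical helper: reports the heap ceiling of the JVM it runs in.
class HeapCheck {

    // Runtime.maxMemory() returns the maximum heap size in bytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

With Java 11 or later it can be run directly, for example as `java -Xmx2g HeapCheck.java`, and the reported value should change when the -Xmx argument changes.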
