unique values extraction

Four Stars

unique values extraction

Hi All,

need a help regarding extraction of unique values of a multiple columns individually in a table by using Talend/SQL.

EXAMPLE:

INPUT:

IDNAMESALARYAGE
1MOHAN500027
2AJAY270026
3SAI500029
4RAMA300027
5MOHAN450029

 

EXPECTED OUTPUT:

IDNAMESALARYAGE
1MOHAN500027
2AJAY270026
3SAI300029
4RAMA4500 
5   

 

Thanks in advance.

regards;

Bhargav

Employee

Re: unique values extraction

Hi,

 

    You can either use tAggregaterow or tUniquerow to select the unique values where you need to pass only relevant fields as input to these components (you can use a tMap to select the right columns).

 

    One thing I would like to say is that your input and output record do not have any correlation. For example, you were having two input records as shown below.

 

MOHAN	5000	27
MOHAN   4500    29

     But you have taken the unique value as 

 

MOHAN500027

 

     You might be joining records of two persons with name Mohan with different age to a single output record. Since the output record details are a combination of the earlier two records it may result in a totally new imaginary person.

 

     So please double check how you would like to aggregate the input records.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved

Highlighted
Eight Stars

Re: unique values extraction

In addition to all the valid inferences from Nikhil, another approach i would suggest is:

1) Sort your data based on the column you want to check on (In your example Name)

2) Check the subsequent rows of data if they match the column value, if they match then ignore else carry forward. With this approach you would only process the first row of data for that column. However you are have to double ensure that your sort criteria is correct and valid

Eg.   Input Data

       SNo.     Name      Age

      100       Mohan       23

      101       Mike          45

      102       Mohan       32

 

Sorted Data

      SNo.     Name      Age

      100       Mohan       23

      102       Mohan       32

      101       Mike          45

Process Detail

when the current row is Sno=100 you have nothing to compare so that will go through, when the current row is 102 you find that for the previous record Name= the name of the current record, so you can ignore that record, when you get to the next record Name=Mike which is different from Mohan so that will go through

 

This approach would greatly help in performance as well, because if the incoming data volume is huge, aggregator will quickly turn out to be performance hindrance. This is a age old technique where you compare rows on the fly... Hope that helps

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog