How to filter only the first data row if email address is the same?

Hi everyone,

I'm having trouble filtering a CSV file.

The file has 4 columns and several hundred rows (see my small example).

Each contact is related to a customer ID (CustID).

ex1.png

The problem is when two or more contacts are related to the same CustID and those contacts have the same email address, like:

Jim Doe and Jack Example

I only need one of them (it doesn't matter which one), and the output should look like the picture below.

ex2.png

How can I filter this? Is there a way to do it in tMap?

Or do I have to modify the SQL source? Thanks for any hints.


Accepted Solutions

Re: How to filter only the first data row if email address is the same?

Hi,

Please pass the data to a tUniqRow component, where you can select the email column as the key to get unique records.

Details of the component are available at the link below:

https://help.talend.com/reader/9q55KsfASqX0qY4GVhEDNQ/1~mHnOdom1lPX7pBZZt56Q

Warm Regards,
Nikhil Thampi
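The keep-first-per-key behavior suggested above for tUniqRow can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the component's generated code: the `Contact` record, its field names, and the sample rows are hypothetical, since the thread only names the CustID and email columns. The key is passed in as a function, so the same method covers email alone (as the answer suggests) and CustID plus email (the rule as stated in the question).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class UniqueByKey {

    // Hypothetical row type: only CustID and email are named in the thread,
    // so these field names are assumptions (the fourth CSV column is left out).
    public record Contact(int custId, String name, String email) {}

    // Keeps the first row seen for each key and drops later rows with the same key.
    // LinkedHashMap preserves the order in which keys were first encountered.
    public static <K> List<Contact> keepFirst(List<Contact> rows,
                                              Function<Contact, K> keyOf) {
        Map<K, Contact> firstByKey = new LinkedHashMap<>();
        for (Contact row : rows) {
            firstByKey.putIfAbsent(keyOf.apply(row), row); // later duplicates are ignored
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question's screenshots.
        List<Contact> rows = List.of(
                new Contact(1, "Jim Doe", "info@example.com"),
                new Contact(1, "Jack Example", "info@example.com"),
                new Contact(2, "John Doe", "john@example.com"),
                new Contact(3, "John Doe", "john@example.com"));

        // Key = email only: one row per address across all customers.
        List<Contact> byEmail = keepFirst(rows, Contact::email);

        // Key = CustID + email: one row per address within each customer.
        List<Contact> byCustAndEmail =
                keepFirst(rows, c -> List.<Object>of(c.custId(), c.email()));

        byEmail.forEach(System.out::println);
        System.out.println("---");
        byCustAndEmail.forEach(System.out::println);
    }
}
```

With email alone as the key, the two John Doe rows collapse into one even though they belong to different customers; with CustID plus email both survive, while Jim Doe and Jack Example still reduce to one row. Key comparisons in this sketch are exact, so addresses that differ only in letter case count as different keys.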



All Replies

Re: How to filter only the first data row if email address is the same?

In your expected output there are two rows with John Doe and the same email. Is that what you want?

I think tAggregateRow is better suited to your use case. tAggregateRow lets you configure the rule for which row to choose when there is more than one for an email, and define functions on the other columns: for example, choose the row with the greatest CustID and show the number of customers with that email in an additional column.
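The grouping idea described above (group by email, keep the row with the greatest CustID, count the customers per email) can be sketched in plain Java. This covers only the grouping logic under stated assumptions: the `Contact` and `Summary` records, their field names, and the sample rows are hypothetical, and in Talend the actual group-by columns and functions are configured in the tAggregateRow component itself, not written as code like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateByEmail {

    // Hypothetical input row; field names are assumptions.
    public record Contact(int custId, String name, String email) {}

    // One output row per email: the chosen contact plus the number of distinct
    // customers (CustIDs) that share that email, i.e. the "additional column".
    public record Summary(String email, Contact chosen, long customerCount) {}

    public static List<Summary> aggregate(List<Contact> rows) {
        // Group rows by email, keeping groups in first-seen order.
        Map<String, List<Contact>> byEmail = new LinkedHashMap<>();
        for (Contact row : rows) {
            byEmail.computeIfAbsent(row.email(), k -> new ArrayList<>()).add(row);
        }

        List<Summary> out = new ArrayList<>();
        for (Map.Entry<String, List<Contact>> group : byEmail.entrySet()) {
            List<Contact> members = group.getValue();
            // Rule for choosing a row: the one with the greatest CustID.
            Contact chosen = members.stream()
                    .max(Comparator.comparingInt(Contact::custId))
                    .orElseThrow(); // safe: every group has at least one row
            // Count distinct customers rather than raw rows.
            long customers = members.stream().mapToInt(Contact::custId).distinct().count();
            out.add(new Summary(group.getKey(), chosen, customers));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question's screenshots.
        List<Contact> rows = List.of(
                new Contact(1, "Jim Doe", "info@example.com"),
                new Contact(1, "Jack Example", "info@example.com"),
                new Contact(2, "John Doe", "john@example.com"),
                new Contact(3, "John Doe", "john@example.com"));

        aggregate(rows).forEach(System.out::println);
    }
}
```

In the sample rows, Jim Doe and Jack Example share both CustID and email, so the greatest-CustID rule alone does not decide between them; adding a secondary comparator such as `.thenComparing(Contact::name)` would make that choice explicit.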
