Remove duplicate row based on all column match before inserting to DB

I need to remove all duplicate records from a file before it gets inserted into the DB. In my case, a record is a duplicate only if all of its columns have the same values. Example: the first two records below are duplicates, but the third one is not. The example shows only a few columns, but the real file could have more: 10, 20.

 

ip        host_name   os_name   os_version
1.1.1.1   abc.com     Windows   8
1.1.1.1   abc.com     Windows   8
1.1.1.1   abc.com     Linux     5.6

 

After the de-duplication is done, I need to insert the records into a Redshift DB.

 


Accepted Solutions

Re: Remove duplicate row based on all column match before inserting to DB

Use tUniqRow and select all fields as keys, so that two rows count as duplicates only when every column matches.

TRF
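
For readers who want to see the logic outside of Talend, here is a minimal sketch in Python of what "all fields as keys" means. It is not Talend code and not how tUniqRow is implemented. The function name dedupe_rows is hypothetical, and the sample is assumed to be comma-delimited with a header line, which the original post does not state.

```python
import csv
import io


def dedupe_rows(rows):
    """Yield each row once, keeping the first occurrence.

    The whole row (every column) is used as the key, so two rows are
    duplicates only when all of their values are equal.
    """
    seen = set()
    for row in rows:
        key = tuple(row)  # all columns together form the key
        if key not in seen:
            seen.add(key)
            yield row


# Sample data from the question, assumed here to be comma-delimited.
SAMPLE = """ip,host_name,os_name,os_version
1.1.1.1,abc.com,Windows,8
1.1.1.1,abc.com,Windows,8
1.1.1.1,abc.com,Linux,5.6
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # keep the header out of the comparison
unique_rows = list(dedupe_rows(reader))

for row in unique_rows:
    print(row)
```

Because the key is built from the whole row, the same function works unchanged for 10 or 20 columns. The comparison is exact, so values that differ only in case or surrounding whitespace are treated as different rows.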
