I have CSV files containing a lot of data. When I run the job for a specific client, I need to avoid inserting duplicate entries that have the same billing number.
This is the IgnoreRecord step, where I need a condition so that an entry with a billing number that has already been seen is not inserted into the table.
The sample data is like this:
Customer,,Calendar day,Billing document,Reference document,Customer PO, Net Qty," Avg
199,Test,2018-02-19,2117199105,532021306,120493,79,$45.00 ,"$3,555.00 "
199,Test,2018-02-19,2117199105,532021306,120493,32,$45.00 ,"$1,440.00 "
The billing number is the value in the "Billing document" column (2117199105 in both rows above).
So basically I need only 2 entries in the table when I run the code. Entries with a billing number that is already present should not be inserted.
Use a tAggregateRow to achieve this. Connect it to your input component and group by your billing number column. Add ALL of the other columns to the "Operations" table and set the Function for each column to First or Last. The Function specifies whether the values from the first record in each group or from the last record in each group are used. There are other functions, but you will probably only need to look at those two.
A tUniqRow would also work, but it doesn't give you the control over which values to keep that the tAggregateRow does. I suggested the tAggregateRow because the "duplicate rows" are not truly duplicates (they share a billing number but differ in other columns), so I'd expect you to need preferred values from the two or more rows in each group.
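To make the First/Last choice concrete, here is a minimal plain-Java sketch of what grouping by billing number and keeping either the first or the last row of each group amounts to. It is not Talend-generated code and not how tAggregateRow is implemented; the class name BillingDedup, the Row type and the keepFirst/keepLast helpers are hypothetical names made up for the illustration, and only two of the sample columns are modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BillingDedup {

    // Hypothetical row type: only the billing number and the net quantity
    // from the sample data are modelled here.
    static final class Row {
        final String billingDocument;
        final int netQty;

        Row(String billingDocument, int netQty) {
            this.billingDocument = billingDocument;
            this.netQty = netQty;
        }
    }

    // Keeps the first row seen for each billing number
    // (comparable to choosing "First" for every output column).
    static List<Row> keepFirst(List<Row> rows) {
        Map<String, Row> byBilling = new LinkedHashMap<>();
        for (Row r : rows) {
            byBilling.putIfAbsent(r.billingDocument, r);
        }
        return new ArrayList<>(byBilling.values());
    }

    // Keeps the last row seen for each billing number
    // (comparable to choosing "Last" for every output column).
    static List<Row> keepLast(List<Row> rows) {
        Map<String, Row> byBilling = new LinkedHashMap<>();
        for (Row r : rows) {
            byBilling.put(r.billingDocument, r);
        }
        return new ArrayList<>(byBilling.values());
    }

    public static void main(String[] args) {
        // The two sample rows share billing number 2117199105.
        List<Row> rows = Arrays.asList(
                new Row("2117199105", 79),
                new Row("2117199105", 32));

        System.out.println(keepFirst(rows).get(0).netQty); // prints 79
        System.out.println(keepLast(rows).get(0).netQty);  // prints 32
    }
}
```

With the two sample rows, both helpers return a single row for billing number 2117199105; they differ only in whether the quantity 79 or 32 survives, which is the decision the Function setting asks you to make for each column.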