I have CSV files containing a lot of data. When I run the job for a specific client, I need to avoid inserting duplicate entries that have the same billing number.
This is an IgnoreRecord step, and I need a condition in it so that a row whose billing number has already been seen is not inserted into the table.
The sample data is like this:
Customer,,Calendar day,Billing document,Reference document,Customer PO, Net Qty," Avg
199,Test,2018-02-19,2117199105,532021306,120493,79,$45.00 ,"$3,555.00 "
199,Test,2018-02-19,2117199105,532021306,120493,32,$45.00 ,"$1,440.00 "
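The billing number is the value in the Billing document column (2117199105 in both rows above; it was bold in the original post).
So basically I need only 2 entries in the table when I run the code. Rows with a billing number that has already been inserted should be skipped.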
Use a tAggregateRow to achieve this. Connect it to your input component and group by your billing number column. Add ALL of the other columns to the "Operations" table and set the Function for each column to First or Last. The Function determines whether each output value is taken from the first record in the group or from the last. There are other functions, but I think you will probably only need to look at those two.
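To make the First/Last behaviour concrete, here is a minimal plain-Java sketch of the same idea outside Talend. It is not the code tAggregateRow generates; the `BillingDedup` class, the `Row` type, and the `keepOnePerBilling` method are hypothetical names, and only three of the sample columns are modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BillingDedup {
    // Hypothetical row type: only a few columns of the sample CSV are modelled.
    static final class Row {
        final String customer;
        final String billingDocument;
        final int netQty;

        Row(String customer, String billingDocument, int netQty) {
            this.customer = customer;
            this.billingDocument = billingDocument;
            this.netQty = netQty;
        }
    }

    // Group by billing document and keep one row per group:
    // the first row seen (like "First") or the last row seen (like "Last").
    static List<Row> keepOnePerBilling(List<Row> rows, boolean keepLast) {
        Map<String, Row> byBilling = new LinkedHashMap<>();
        for (Row row : rows) {
            if (keepLast) {
                byBilling.put(row.billingDocument, row);         // later rows overwrite earlier ones
            } else {
                byBilling.putIfAbsent(row.billingDocument, row); // earlier rows win
            }
        }
        return new ArrayList<>(byBilling.values());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("199", "2117199105", 79),
            new Row("199", "2117199105", 32));
        for (Row r : keepOnePerBilling(rows, false)) {
            System.out.println(r.billingDocument + " netQty=" + r.netQty);
        }
    }
}
```

With the two sample rows, either mode leaves a single row for billing number 2117199105; the only difference is whether the quantity comes from the first row or the second.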
A tUniqRow would also work, but it doesn't give you the control over which values to keep that the tAggregateRow does. I suggested the tAggregateRow because the "duplicate rows" here are not truly duplicates (only the billing number repeats, the other values differ), so I'd expect you to need preferred values from the two or more rows.
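For comparison, a tUniqRow-style filter keyed on the billing number can be sketched in plain Java as below. This is only an illustration under assumptions: `UniqSplit`, `Result`, and `split` are hypothetical names, rows are plain `String[]` field arrays, and the key index 3 matches the position of the Billing document column in the sample above. The first row with a given key is kept and every later row with that key is routed to a separate list, so you get no say in which row's values survive.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqSplit {
    // Hypothetical holder for the two output flows.
    static final class Result {
        final List<String[]> uniques = new ArrayList<>();
        final List<String[]> duplicates = new ArrayList<>();
    }

    // Route each row by whether its key column value has been seen before.
    static Result split(List<String[]> rows, int keyIndex) {
        Set<String> seen = new HashSet<>();
        Result result = new Result();
        for (String[] row : rows) {
            if (seen.add(row[keyIndex])) {
                result.uniques.add(row);     // first occurrence of this key
            } else {
                result.duplicates.add(row);  // key already seen: skip the insert
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            "199,Test,2018-02-19,2117199105,532021306,120493,79".split(","),
            "199,Test,2018-02-19,2117199105,532021306,120493,32".split(","));
        Result r = split(rows, 3); // index 3 = Billing document in the sample
        System.out.println("uniques=" + r.uniques.size()
            + " duplicates=" + r.duplicates.size());
    }
}
```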