Hi guys, I am trying to create a Talend map to remove duplicates from an attendance system. I have been able to identify the duplicates, but I need to pick the first duplicate record in the case of an entry to the office and the last duplicate record in the case of an exit from the office. I have sorted the data into the format shown below.
For employee AA on 08-01-2018, I need to pick the 2nd row, as that is an entry, and the 5th row, as that is the last exit for that employee for that day.
For AA on 08-02-2018, I need to keep all the records, as there are no duplicate entry or exit records.
For BB on 08-01-2018, I need to take the 10th and 12th records.
To simplify, I have also attached the desired output.
Could you please provide the sample file as a CSV attachment?
1) Use a tAggregateRow grouped on IO_Date_Only and Emp_name, with output operations on IO_Status, IO_time, and flag using the function First, and a second tAggregateRow configured the same way but with the function Last.
2) Use a tUnite component to combine the first and last records, then a tSortRow to sort them in ascending order.
Hope this serves the purpose!
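For reference, here is a plain-Java sketch of what steps 1 and 2 compute. It is not Talend component configuration; the `Row` record, its field names, and the string formats are hypothetical stand-ins for the thread's columns. It keeps the first and last row of each (date, employee) group, then re-sorts the union.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstLastPerDay {

    // Hypothetical row type; fields loosely mirror Emp_name, IO_Date_Only,
    // IO_time and IO_Status. Times are assumed to be zero-padded "HH:mm"
    // strings and dates to fall within one year, so string order is time order.
    public record Row(String emp, String date, String time, String status) {}

    // Input is assumed sorted by employee, date, then time.
    public static List<Row> firstAndLast(List<Row> sortedRows) {
        // Group rows by (date, employee), preserving input order within each group.
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : sortedRows) {
            groups.computeIfAbsent(r.date() + "|" + r.emp(), k -> new ArrayList<>()).add(r);
        }
        List<Row> out = new ArrayList<>();
        for (List<Row> g : groups.values()) {
            out.add(g.get(0));                    // the "First" aggregation
            if (g.size() > 1) {
                out.add(g.get(g.size() - 1));     // the "Last" aggregation
            }
        }
        // Re-sort the union ascending by employee, date, then time.
        out.sort(Comparator.comparing(Row::emp)
                .thenComparing(Row::date)
                .thenComparing(Row::time));
        return out;
    }
}
```

Two design notes: a one-row group is emitted once here, whereas a plain union of two aggregations would emit it twice; and because every group collapses to at most two rows, any records between the first and last punch of the day are dropped.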
Sorry for the confusion; my mistake. I should have shared a better example, so I am reattaching a fresh file that better explains the scenario.
For records dated 08-02-2018, I should get all the records in the output, because there are no consecutive entries or exits.
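One way to express this rule is a single pass over the sorted rows that collapses runs of consecutive records with the same status: the first row of an entry run, the last row of an exit run, and every standalone row are kept. This is a plain-Java sketch of the logic, not a Talend job; the `Row` record is hypothetical, and the status values "IN" and "OUT" are assumptions about how IO_Status is coded.

```java
import java.util.ArrayList;
import java.util.List;

public class CollapseRepeatedPunches {

    // Hypothetical row type; fields loosely mirror Emp_name, IO_Date_Only,
    // IO_time and IO_Status. "OUT" marking an exit is an assumption.
    public record Row(String emp, String date, String time, String status) {}

    // True when two rows share employee, day, and status.
    private static boolean sameRun(Row a, Row b) {
        return a.emp().equals(b.emp())
                && a.date().equals(b.date())
                && a.status().equals(b.status());
    }

    // Input must already be sorted by employee, date, and time.
    // For each run of consecutive rows with the same employee, day and status:
    //   an "OUT" run keeps its last row (latest exit);
    //   any other run (assumed "IN") keeps its first row (earliest entry);
    //   a run of length 1 is therefore always kept.
    public static List<Row> collapse(List<Row> sortedRows) {
        List<Row> out = new ArrayList<>();
        int i = 0;
        while (i < sortedRows.size()) {
            int j = i;
            // Advance j to the last row of the run that starts at i.
            while (j + 1 < sortedRows.size() && sameRun(sortedRows.get(i), sortedRows.get(j + 1))) {
                j++;
            }
            Row first = sortedRows.get(i);
            out.add("OUT".equals(first.status()) ? sortedRows.get(j) : first);
            i = j + 1;
        }
        return out;
    }
}
```

Because `sameRun` also compares employee and date, a run never spans two days or two employees, and a day whose entries and exits simply alternate passes through unchanged.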