I need to figure out a way to split a dataset into two datasets based on data repetition in one of its fields. If a value is repeated, all the rows containing it need to go into one dataset; if it's unique, that row should go to the other.
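One way to express the rule is a two-pass split. First count how many times each key occurs, then route every row by its key's total count. Because routing happens only after counting, every occurrence of a repeated key, including the first one, ends up on the "repeated" side. The sketch below is in Java (Talend jobs are generated as Java). It assumes Java 16+ for records, and it uses a hypothetical `Row(key, payload)` shape where `key` stands in for the field being checked. The sample rows are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitByRepetition {
    // Hypothetical row shape: "key" is the field checked for repetition.
    public record Row(String key, String payload) {}

    // The two output datasets.
    public record Split(List<Row> repeated, List<Row> unique) {}

    public static Split split(List<Row> rows) {
        // Pass 1: count how many times each key occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (Row r : rows) {
            counts.merge(r.key(), 1, Integer::sum);
        }
        // Pass 2: route each row by its key's total count, so every
        // occurrence of a repeated key (including the first) is "repeated".
        List<Row> repeated = new ArrayList<>();
        List<Row> unique = new ArrayList<>();
        for (Row r : rows) {
            if (counts.get(r.key()) > 1) {
                repeated.add(r);
            } else {
                unique.add(r);
            }
        }
        return new Split(repeated, unique);
    }

    public static void main(String[] args) {
        // Illustrative sample data, not the records from the original post.
        List<Row> rows = List.of(
            new Row("A", "1"), new Row("B", "2"), new Row("A", "3"), new Row("C", "4"));
        Split s = split(rows);
        System.out.println("repeated: " + s.repeated());
        System.out.println("unique:   " + s.unique());
    }
}
```

Both passes preserve the input order within each output. The cost is one extra pass over the data plus a map holding one count per distinct key.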
For example, suppose I have a dataset with the following records:
For the above dataset, the records with repeating values need to go to dataset 1, so that dataset 1 contains:
and the records with non-repeating values need to go to another dataset, so that dataset 2 contains:
If I use tUniqRow, A1 would still go to the unique output. I could build the unique dataset first, then compare it against the remaining records, add each matching row to the other dataset, and remove it from the unique dataset. This is messy. Can anyone recommend a better approach?
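The A1 problem comes from first-seen deduplication. The first row for each key is emitted as "unique" before any later duplicate has been seen. The sketch below mimics that behaviour, as the post describes it, with a plain `HashSet`. It shows the first `A` row landing on the unique side even though `A` repeats. This is an illustration of the behaviour, not Talend's implementation. The `Row` record and the sample data are hypothetical, and the code needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstSeenDedup {
    // Hypothetical row shape: "key" is the field being deduplicated.
    public record Row(String key, String payload) {}

    // Returns [uniques, duplicates]: the first row seen for each key goes to
    // uniques, every later row with that key goes to duplicates.
    public static List<List<Row>> dedup(List<Row> rows) {
        Set<String> seen = new HashSet<>();
        List<Row> uniques = new ArrayList<>();
        List<Row> duplicates = new ArrayList<>();
        for (Row r : rows) {
            // Set.add returns false when the key was already present.
            if (seen.add(r.key())) {
                uniques.add(r);
            } else {
                duplicates.add(r);
            }
        }
        return List.of(uniques, duplicates);
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("A", "1"), new Row("B", "2"), new Row("A", "3"));
        List<List<Row>> out = dedup(rows);
        // The first "A" row lands in uniques even though "A" repeats.
        System.out.println("uniques:    " + out.get(0));
        System.out.println("duplicates: " + out.get(1));
    }
}
```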
I tried, but it looks like I am stuck. With tUniqRow, I can split my dataset into unique and repeating parts. However, the tMap component won't let me compare the two: for some reason, I can't add both of them as inputs to the tMap. How do I do that? Unless I can compare the two splits, I won't be able to move the data around so that all repeating records end up in one output and all non-repeating records in the other. @TRF
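The comparison described here can be treated as a lookup join. The lookup side is the set of distinct keys found in the duplicates output. The main side is the uniques output. A "unique" row whose key hits the lookup is the first occurrence of a repeated key and belongs with the duplicates. A row that misses is truly unique. In tMap terms, this corresponds roughly to an inner-join lookup plus a reject output, but check the component documentation for the exact settings. The sketch below does the same recombination in plain Java. The `Row` record, the method name `recombine`, and the sample data are hypothetical, and the code assumes Java 16+.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinSplits {
    // Hypothetical row shape: "key" is the field being compared.
    public record Row(String key, String payload) {}

    public record Split(List<Row> repeated, List<Row> unique) {}

    // uniques/duplicates are the two outputs of a first-seen dedup.
    public static Split recombine(List<Row> uniques, List<Row> duplicates) {
        // Lookup side: distinct keys that appear in the duplicates stream.
        Set<String> dupKeys = new HashSet<>();
        for (Row d : duplicates) {
            dupKeys.add(d.key());
        }
        List<Row> repeated = new ArrayList<>(duplicates);
        List<Row> unique = new ArrayList<>();
        // Main side: a join hit is the first occurrence of a repeated key;
        // a join miss is a truly unique row.
        for (Row u : uniques) {
            if (dupKeys.contains(u.key())) {
                repeated.add(u);
            } else {
                unique.add(u);
            }
        }
        return new Split(repeated, unique);
    }

    public static void main(String[] args) {
        List<Row> uniques = List.of(new Row("A", "1"), new Row("B", "2"));
        List<Row> duplicates = List.of(new Row("A", "3"));
        Split s = recombine(uniques, duplicates);
        System.out.println("repeated: " + s.repeated());
        System.out.println("unique:   " + s.unique());
    }
}
```

Note that the repeated output lists the later duplicates before the recovered first occurrences. If the original row order matters, sort afterwards or use the count-based approach instead.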
I am sure I got lost somewhere in the middle. Let me do some reading before attempting this solution.