Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

Six Stars

Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

I need to figure a way to split a dataset into two datasets, based on the data repetition in one of the fields. If a value is getting repeated, I need to put all the rows in one dataset and if if it's unique, that row should go to another.

 

e.g. if I have the dataset with following records:

A1

A1

A2

A3

 

For the above dataset, the repeating records need to go to dataset 1 so that the dataset 1 contains

A1

A1

 

and non-repeating values need to go to another dataset so that the dataset 2 contains

A2

A3

 

If I use tUniqueRow, A1 would still go to the resultant dataset. I could make the unique dataset first, then do some sort of comparison with the remaining records and them add the row in the other dataset, remove the row from unique dataset. This is messy. Can anyone make a recommendation?

Forteen Stars TRF
Forteen Stars

Re: Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

You got it, tUniqRow + tMap to remove rows which exist as duplicates.

TRF
Six Stars

Re: Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

I tried but looks like I am stuck. With the tUniqRow, I can split my dataset into unique and repeating bits. However, the tMap component won't let me compare them both. For some unknown reason, I can't seem to add both them to the tMap mode. How do I do that? Unless I am able to draw a comparison between the two splits, I won't be able to move around the data so that all repeating records are separated in one segment, and non repeating in another. @TRF

Forteen Stars TRF
Forteen Stars

Re: Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

After tUniqRow catch uniq rows into a tHashOutput (here called uniq) and catch replicated rows into an other tHashOutput (here called replicate).
On the next subjob, use a tMap with "uniq" tHashInput for the main input and "replicate" for the lookup.
Join both using an inner join and add 2 outputs for the tMap, 1 catches the rejected inner join rows (really uniq) and the other don't catch them.
Redirect 1rst output to what you want and 2nd output to a tHashOutput linked to "replicate" (append mode is required).
This new tHashOutput should contain the subset of replicated records.
Hope you got it.

TRF
Six Stars

Re: Need recommendation on best way to split dataset into two - one with repeating records and another with non repeating records

I am sure I got lost somewhere in the middle. Let me do some reading, before attempting this solution.