Six Stars

Remove duplicates with condition

Hi,

 

I have below scenario where I have to remove duplicates.  If my  ID column has duplicates and then look at the status column. If status column = Active or Inactive, then pick that row and remove other duplicate IDs for that row. I need to look my status column only when there r duplicates IDs. Example:

 

ID                           Status              Stage

000xnT2eYUTE     Active              To be billed

000xnT2eYUTE     Close               billed

000xnT2eYUTE     Close               billed

000xnT2eABCD    Inprogress        waived

 

So I want my output to be:

ID                           Status              Stage

000xnT2eYUTE     Active              To be billed

000xnT2eABCD    Inprogress        waived

 

How do I do it..?

1 ACCEPTED SOLUTION

Accepted Solutions
Nine Stars

Re: Remove duplicates with condition

Hi KanT,

 

Please try below.

Duplicates1.PNGDuplicates2.PNGDuplicates3.PNGDuplicates4.PNG

Regards,

Veeru Boppudi
10 REPLIES
Twelve Stars

Re: Remove duplicates with condition

@KanT,find below.

Untitled.pngUntitled.pngUntitled.pngUntitled.png

Manohar B
Twelve Stars

Re: Remove duplicates with condition

@KanT,you can use tMap instead of tFilterRow and tFiltercoulumns.

Manohar B
Six Stars

Re: Remove duplicates with condition

Hi Manodwhb,

 

Thanks for your reply. These screenshots are v helpful. I have few questions though:

 

1. Instead of tfixed input, I am using Tinputdelimited. I guess it shouldn't impact, but just want to crosscheck with you.

2. If status column = Active or Inactive, then pick that row and remove other duplicate IDs for that row.

 

Thanks again!

Nine Stars

Re: Remove duplicates with condition

Hi KanT,

 

So as per my understanding, You need to identify duplicates based on ID. and If you have any duplicates in ID then consider only only one record based on Status. Priority of status will be like Active, Inactive, etc... correct me if i am wrong.

 

000xnT2eYUTE     Active              To be billed

000xnT2eYUTE     Close               billed

000xnT2eYUTE     Close               billed

000xnT2eY12E     InActive            billed

000xnT2eABCD    Inprogress        waived

000xnT2eABCD    Close               billed

000xnT2eABCD    Active              To be billed

000xnT2e123D    Active              To be billed

 

Output will be:

000xnT2eYUTE     Active              To be billed

 

000xnT2eY12E     InActive            billed

000xnT2eABCD    Active              To be billed

000xnT2e123D    Active              To be billed

 

Is this expected output or any changes?

Regards,

Veeru Boppudi
Six Stars

Re: Remove duplicates with condition

Hi vboppudi,

 

Your understanding is correct. If there are duplicates, I want to look at my Status column. if Status column = Active or Inactive, then pick that row only and ignore rest duplicates. I will just modify your example a bit:

 

000xnT2eYUTE     Close              To be billed

000xnT2eYUTE     Active              billed

000xnT2eYUTE     Close               billed

000xnT2eY12E     Close               billed

000xnT2eABCD    Inprogress        waived

000xnT2eABCD    Close               billed

000xnT2eABCD    Active              To be billed

000xnT2e123D    Active              To be billed

 

Output will be:

000xnT2eYUTE     Active              billed

 000xnT2eY12E     Close            billed (ID here is not duplicate, so it will ignore status field in this case)

000xnT2eABCD    Active              To be billed

000xnT2e123D    Active              To be billed

 

How do I use this logic in taggregaterow? I did as per the screenshot mentioned in the post above but my results are not write.

 

 

Nine Stars

Re: Remove duplicates with condition

Hi KanT,

 

Please try below.

Duplicates1.PNGDuplicates2.PNGDuplicates3.PNGDuplicates4.PNG

Regards,

Veeru Boppudi
Six Stars

Re: Remove duplicates with condition

Hi Vboppudi,

 

This works great! Thanks a  lot!

One small question, I have 10 more columns in my table. Now my output file populates all the other columns as blank. Where do I make the change so that all additional columns are populated.

 

Ex.

ID                           Status             Stage                                  Name                  Resource

000xnT2eYUTE     Active              To be billed                        ABC                     Htytm

000xnT2eYUTE     Close               billed                                  ABC                     Htytm

000xnT2eYUTE     Close               billed                                  ABC                     Htytm

000xnT2eY12E     InActive            billed                                  DRF                      uuyuu

000xnT2eABCD    Inprogress        waived                               GGG                     yuyuy

000xnT2eABCD    Close               billed                                   GGG                     yuyuy

000xnT2eABCD    Active              To be billed                           GGG                     yuyuy

000xnT2e123D    Active              To be billed                            AAA                     ggsgs

 

Output will be:

000xnT2eYUTE     Active              To be billed                   ABC                     Htytm

 000xnT2eY12E     InActive            billed                             DRF                      uuyuu

000xnT2eABCD    Active              To be billed                    GGG                     yuyuy

000xnT2e123D    Active              To be billed                  AAA                     ggsgs

 

should I add these all these columns In my 'taggregate row' operations..?

Nine Stars

Re: Remove duplicates with condition

You have to add required columns from source to all components. in Aggregate select first for all columns.

Regards,

Veeru Boppudi
Twelve Stars

Re: Remove duplicates with condition

In tAggregaterow,you need need gorup by all input colonnade.
Manohar B
Five Stars

Re: Remove duplicates with condition

Hi Monar,

 

Very intersting example

Do you know how to detect deplicate with a weight?

ie: if to detect that "Françoiss" is the same record as "francois"

 

Thank you