Five Stars

Validate file for duplicate records

Hi,

 

I have a requirement to validate a file: if the file contains duplicate records, discard the whole file; if it does not contain any duplicates, process it. Even a single duplicate record means the file should be discarded.

 

Please help.

 

Thanks,

Pravin Sanadi

  • Data Integration
1 ACCEPTED SOLUTION: see TRF's reply below.
10 REPLIES
Seven Stars

Re: Validate file for duplicate records

 

I think a combination of tUniqRow and tJavaRow should do the trick.

You can connect the "Duplicates" output from tUniqRow to tJavaRow.

You would also need a context variable that can act as a flag.

Finally, in tJavaRow, set the context value to true if the input has any duplicates.

E.g.:

[Screenshot: setDupFlag.PNG]

and in tJavaRow,

[Screenshot: tJavaRowDup.png]

Note that in the above screenshot, the context variable duplicateExists is of boolean type.

Once this subjob completes, you can use the context in an "if" condition to decide whether or not you want to process the file.
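
The screenshots above aren't reproduced here, so as a rough sketch only (assuming the boolean context variable is named duplicateExists, as described), the tJavaRow code and the later condition might look like this:

// tJavaRow connected to the "Duplicates" output of tUniqRow:
// any row arriving here means the file contains at least one duplicate.
context.duplicateExists = true;

// Run if condition on the trigger that starts the processing subjob:
// continue only when no duplicate was flagged.
!context.duplicateExists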

Four Stars

Re: Validate file for duplicate records

Hi,

You can use tFileList to pick up all the files in a directory, pass each file through tFileInputDelimited and then tUniqRow. Then use a Run if condition with a tJava that compares the number of rows: tUniqRow rejects the duplicate records, so if a file contains any duplicates, the table you load the unique rows into will have fewer rows than the file itself. Reject any file where the table ends up with fewer rows; the rest is your output.
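
As a sketch of that comparison (assuming the usual Talend global variables tFileInputDelimited_1_NB_LINE and tUniqRow_1_NB_UNIQUES are available after the subjob has finished; the exact names depend on the component numbering in your job):

// Run if condition: process the file only when the number of unique rows
// equals the number of rows read, i.e. nothing was rejected as a duplicate.
((Integer) globalMap.get("tUniqRow_1_NB_UNIQUES"))
    .equals(globalMap.get("tFileInputDelimited_1_NB_LINE"))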

Five Stars

Re: Validate file for duplicate records

Thanks. So to process the valid file, do I need to create another job?

Four Stars

Re: Validate file for duplicate records

You can do that in a single job, but you just need to add an if condition on tFileInputDelimited.

Five Stars

Re: Validate file for duplicate records

[Screenshot: job.JPG]

 

In tJavaRow I have written this code:

context.ISVALID = true;
if (input_row.MATERIAL != null) {
    context.ISVALID = false;
}

output_row.ISVALID = context.ISVALID;

 

I need to process the file if it is valid and discard it if not. How do I do that?

Four Stars

Re: Validate file for duplicate records

What keys did you give in tUniqRow? There are no records passing from tUniqRow!

Five Stars

Re: Validate file for duplicate records

So the scenario is: if the file has no duplicates, process it forward; if tUniqRow gives you duplicates, report it.

Seven Stars

Re: Validate file for duplicate records

From tJavaRow, you can use a Run if condition and join it to another subjob which will process the file.

[Screenshot: ignoreDups.PNG]

Here, in If (order 1) you can use just context.ISVALID as your condition, which means the file contains no duplicates and should be processed. In If (order 2) you can use !context.ISVALID, which means the file has duplicates.
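
In other words, assuming the ISVALID context variable from the earlier post, the two Run if conditions would simply be:

// If (order 1): no duplicates were found, process the file
context.ISVALID

// If (order 2): duplicates were found, report/discard the file
!context.ISVALID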

It's a simple solution, but the drawback is that you will have to read the file again. If your input file isn't too large, this should work just fine; otherwise the job run time will increase, since a huge file would be read twice.

If I can think of something better I will post it here.

Seven Stars TRF

Re: Validate file for duplicate records

Hi,

I think the simple job is:

[Screenshot: Capture.PNG]

The "if" after tUniqRow is based on the value of the global variable tUniqRow_1_NB_DUPLICATES automagically associated to the tUniqRow_1 component (thank's to TDI):

((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0

No tJavaRow required for this use case.
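
If you also want a branch that reports or discards the file when duplicates are found, the opposite condition should presumably work on that trigger:

// Run if on the "discard / report duplicates" branch
((Integer) globalMap.get("tUniqRow_1_NB_DUPLICATES")) > 0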

 

Hope this helps,


TRF
Seven Stars

Re: Validate file for duplicate records

Yup. That's definitely a better and simpler solution.