One Star

[resolved] tSchemaComplianceCheck with validation problem

Hello,
I am quite new to Talend.
I read up on how to use validation rules but could not get them to work.
I tried to reduce the validation I want to do to a simple example.
There are two rows in my CSV containing different BANKBIC values.
I created a validation rule on the delimited file schema, but the validation always succeeds.
I tried to use this validation in a tSchemaComplianceCheck, but no row is rejected.
If I change the if statement in the condition to
input_row.BANKBIC.equalsIgnoreCase("ABSBDE66XXX")
it does not change anything: all rows are still validated.
I am probably doing something wrong, but I cannot figure out what.
Does anyone have a clue?
Thanks,
Francois
1 ACCEPTED SOLUTION

Accepted Solutions
One Star

Re: [resolved] tSchemaComplianceCheck with validation problem

Francois,
I worked on a project some years ago which was identical to what you're trying to do. In my case, some tables had over 100 fields and we needed to perform checks on specific fields and either warn on or fail the row. We tried a number of options and finally, very happily, settled on the tJavaRow component. It gave us complete flexibility and let us code all the conditions very easily. When I saw your pseudo code, I thought you might want to try this out...
So what I have on the attached screenshots is this:
The first screenshot shows you the overall job...
The second screenshot shows you the details of the tJavaRow. You will see that we even got fancy and defined whether a violation was an error or a warning. As you can see from the Java code, we take every input field and perform checks, then write the warning or error text into a specific output field. This is where you could put your pseudo code very easily and cleanly.
The 3rd screenshot shows the bottom of the tJavaRow, where we pass the concatenated errors into an output field.
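Since the screenshots are not reproduced here, the following is only a rough sketch of what such tJavaRow logic could look like, written as a standalone Java class so it can be run outside Talend. It uses the field names from Francois's pseudo code. The class name RowValidation, the method validate, the ';' separator and the "reason N" texts are assumptions for illustration, not the code from the original job. Inside a real tJavaRow, the parameters would be read from input_row and the result written to a field of output_row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the nested checks from the pseudo code and
// collects every violation instead of stopping at the first one.
final class RowValidation {

    // Returns the violation messages for one row joined with ';',
    // or an empty string when the row passes every check.
    static String validate(String fieldA, String fieldB, String fieldD,
                           String fieldE, String fieldF) {
        List<String> errors = new ArrayList<>();
        // Assumption: a missing FIELDD counts as length 0, so the
        // length checks cannot throw a NullPointerException.
        int lengthD = (fieldD == null) ? 0 : fieldD.length();

        if (fieldA == null) {
            if (!"VALUE".equals(fieldB)) {
                errors.add("reason 1");
            }
            if (lengthD > 8) {
                errors.add("reason 2");
            }
        } else if (fieldE != null) {
            if (fieldF == null) {
                errors.add("reason 3");
            }
        } else if (lengthD > 11) {
            errors.add("reason 4");
        }
        return String.join(";", errors);
    }

    public static void main(String[] args) {
        // FIELDA is null, FIELDB is not "VALUE", FIELDD has 9 characters.
        System.out.println(validate(null, "OTHER", "123456789", null, null));
    }
}
```

Collecting the messages in a list and joining them at the end keeps each check independent, which matters once a row can break several rules at the same time.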
The 4th screenshot shows you the schema of the tJavaRow. You will notice that a lot of fields come in for validation, but only a few go out. You can change this as you see fit: for example, you can have all the fields go out (as they came in), add the error fields as in my job, and then use a tMap downstream to filter out rows that have a particular error code...
In my case, we only captured the exceptions coming out of the process. So in the last screenshot, we use a filter to let through only rows that have errors... In your case, depending on your design, you could use a tMap to separate good rows from bad ones...
When you run this, the last thing you'd have to do is denormalize the list of concatenated errors into columns or rows and feed them into some log table. We did that, and the BI system pulled those errors and presented them in validation reports after the Talend Job finished...
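The filtering and splitting steps could be sketched in plain Java as below. This is an illustration only, not Talend-generated code: the names ErrorLog, isBad and explode, the row id argument and the ';' separator are hypothetical. In an actual job the filter would more likely be a tMap expression, and the split could be handled by a component such as tNormalize rather than hand-written code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the downstream steps: decide whether a row is
// bad, and turn its concatenated error text into one log entry per error.
final class ErrorLog {

    // A row is "bad" when its error field holds at least one message.
    static boolean isBad(String joinedErrors) {
        return joinedErrors != null && !joinedErrors.isEmpty();
    }

    // Produces one {rowId, error} pair per message, ready for a log table.
    static List<String[]> explode(String rowId, String joinedErrors) {
        List<String[]> entries = new ArrayList<>();
        if (!isBad(joinedErrors)) {
            return entries; // good row: nothing to log
        }
        for (String error : joinedErrors.split(";")) {
            entries.add(new String[] { rowId, error });
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] entry : explode("row-2", "reason 1;reason 2")) {
            System.out.println(entry[0] + " -> " + entry[1]);
        }
    }
}
```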
Hope this helps...
7 REPLIES
Moderator

Re: [resolved] tSchemaComplianceCheck with validation problem

Hi,
What does your input data look like? Could you please elaborate on your case with an example of the input and the expected output values?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: [resolved] tSchemaComplianceCheck with validation problem

In fact, I have two rows in my CSV:
"0342972103.0001","Nickname Doe53"," John Doe","Koln","","Strasse","DE89370999440552013999","Richterstrasse 33","DE","VOLKSBANK PFORZHEIM EG","53, WESTLICHE KARL-FRIEDRICH-STRASSE","PFORZHEIM","DE","","VBPFDE66XXX","SEPA"
"0342972103.0001","Nickname frs523","John Doe","Koln","","Strasse","DE89370999440552013999","Richterstrasse 33","DE","VOLKSBANK PFORZHEIM EG","53, WESTLICHE KARL-FRIEDRICH-STRASSE","PFORZHEIM","DE","","ABSBDE66XXX","SEPA"
The second row contains "ABSBDE66XXX" in the second-to-last position, which is not a valid value. This row should then be reported as invalid.
In fact, I also need to validate values in other fields, for example checking that a value is in a list.
If this works, I will be able to build more complex validations.
If needed, I can export my job.
Again thanks for your help,
Best Regards,
Francois
Moderator

Re: [resolved] tSchemaComplianceCheck with validation problem

Hi,
The statement (input_row.BANKBIC.equalsIgnoreCase("ABSBDE66XXX")) is correct. Maybe something is wrong with your input row separator, which would keep the "filter" from having any effect.
See my screenshots for details
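One thing worth checking with this expression: calling equalsIgnoreCase on input_row.BANKBIC throws a NullPointerException when the field is null, whereas putting the literal first is null-safe. Below is a minimal standalone sketch of that variant, plus a list-based check of the kind Francois mentions. The class BicCheck, its method names and the contents of the allow-list are assumptions for illustration; a real job would load the valid values from reference data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper showing null-safe BIC comparisons.
final class BicCheck {

    // Illustrative allow-list only (taken from the first sample row).
    private static final Set<String> KNOWN_BICS =
            new HashSet<>(Arrays.asList("VBPFDE66XXX"));

    // Literal-first comparison: returns false for null instead of throwing.
    static boolean isRejectedBic(String bic) {
        return "ABSBDE66XXX".equalsIgnoreCase(bic);
    }

    // Case-insensitive membership test against the allow-list.
    static boolean isKnownBic(String bic) {
        if (bic == null) {
            return false;
        }
        return KNOWN_BICS.contains(bic.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isRejectedBic("absbde66xxx"));
        System.out.println(isKnownBic("ABSBDE66XXX"));
    }
}
```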
Best regards
Sabrina
One Star

Re: [resolved] tSchemaComplianceCheck with validation problem

Thanks a lot. I had thought of that, but the validation in our real-life application is more complex than a check against one value.
To be more precise, it is something like this:
if (FIELDA == null) {
    if (!"VALUE".equals(FIELDB)) invalid reason 1;
    if (FIELDD.length() > 8) invalid reason 2;
} else if (FIELDE != null) {
    if (FIELDF == null) invalid reason 3;
} else {
    if (FIELDD.length() > 11) invalid reason 4;
}
I was ready to chain the validators, but how?
I am trying to avoid writing a Java routine to make it work; I was hoping to find a Talend way to do it.
Any tips are really appreciated.
Best Regards,
Francois
One Star

Re: [resolved] tSchemaComplianceCheck with validation problem

Thanks willm, that is the direction I wanted to take; your example helps a lot.
I do not have much experience with Talend (though quite some with Java), and I was wondering whether this is the recommended way to do it in Talend.
Best Regards,
Francois
One Star

Re: [resolved] tSchemaComplianceCheck with validation problem

Glad it helped...