Data validation against dynamic schema

Four Stars

Hi,

I need to validate data from CSV files against a dynamic schema, and I am using Talend 5.6 Enterprise. I read a number of posts on Talend Community and found that tSchemaComplianceCheck cannot be used in this case because it does not support dynamic schemas. Can you please advise how this can be done in Talend?

 

My initial approach is to create two separate jobs. Job1 builds the schema dynamically from a schema definition file (CSV) and passes it to Job2. Job2 loops through a number of CSV files, reads each one, and validates every record against the schema built in Job1. My question is: how can I pass a dynamic schema to Job2 for the validation? Is this a feasible solution?

 

The problem is that the schema definitions are not known in advance, so I cannot create them when designing the job. Users of my application define the schema in a schema definition file (CSV or Excel) and provide the CSV files to be validated against that particular schema at runtime.
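For illustration, here is a minimal sketch of how a schema definition could be loaded at runtime in plain Java, for instance from a Talend routine or a tJava component. The line layout `name,type,length` with one header row, and the names `ColumnRule` and `SchemaDefinitionLoader`, are assumptions made for this example; the thread does not give the real layout in text form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one column rule; field names are assumptions.
class ColumnRule {
    final String name;
    final String type;
    final int maxLength;

    ColumnRule(String name, String type, int maxLength) {
        this.name = name;
        this.type = type;
        this.maxLength = maxLength;
    }
}

class SchemaDefinitionLoader {
    // Parses lines of the assumed form "name,type,length".
    // The first line is treated as a header and skipped.
    static List<ColumnRule> parse(List<String> lines) {
        List<ColumnRule> rules = new ArrayList<ColumnRule>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = line.split(",", -1);
            if (parts.length < 3) {
                throw new IllegalArgumentException(
                        "Malformed schema line " + (i + 1) + ": " + line);
            }
            rules.add(new ColumnRule(
                    parts[0].trim(),
                    parts[1].trim(),
                    Integer.parseInt(parts[2].trim())));
        }
        return rules;
    }
}
```

The result is an ordinary Java list, so it may be simpler to keep it in memory and run the validation in the same job than to hand a Talend schema from one job to another.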

 

I look forward to your suggestions and advice on this topic.

 

Thanks!

Moderator

Re: Data validation against dynamic schema

Hello,

Could you please elaborate on your case with an example showing the input and the expected output values?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Data validation against dynamic schema

Thanks for your reply. I need to validate a number of feed files against a schema definition file, both of which are provided at runtime. Each set of feed files has a corresponding schema definition file. For example, the schema definition file looks like this:

 

[Image: schema-definition-file.PNG]

The feed files to be validated can have any of the following extensions: .dat, .data, .csv, .xlsx. They have no header row, and the data in a feed looks something like this:

 

111101020000001|1|
111201030000002|1|
111301030000003|1|
111401030000004|5678|

 

After validation, the expected output would be:

111401030000004|5678|   ---- failed, length exceeded
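As a rough sketch of the check described above, the following plain Java splits each pipe-delimited row and flags any field that is longer than its allowed length. The limits of 15 and 1 characters are assumptions inferred from the sample rows, not values taken from the schema definition file, and `FeedLengthCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FeedLengthCheck {
    // Returns the rejected rows, each followed by the reason for rejection.
    static List<String> rejects(List<String> rows, int[] maxLengths) {
        List<String> out = new ArrayList<String>();
        for (String row : rows) {
            // Limit -1 keeps trailing empty fields; the trailing "|" in the
            // sample rows therefore produces one extra empty field.
            String[] fields = row.split("\\|", -1);
            for (int i = 0; i < maxLengths.length; i++) {
                if (i >= fields.length) {
                    out.add(row + "   ---- failed, missing column " + (i + 1));
                    break;
                }
                if (fields[i].length() > maxLengths[i]) {
                    out.add(row + "   ---- failed, length exceeded");
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "111101020000001|1|",
                "111201030000002|1|",
                "111301030000003|1|",
                "111401030000004|5678|");
        int[] maxLengths = {15, 1}; // assumed limits for the two columns
        for (String reject : rejects(rows, maxLengths)) {
            System.out.println(reject);
        }
        // prints: 111401030000004|5678|   ---- failed, length exceeded
    }
}
```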

 

What I want to achieve is similar to what the tSchemaComplianceCheck component does. I built a test project using that component (attached). The difference is that my schema is dynamic, and I am not sure how to handle that.

 

I have also thought about another approach: read the feed files with tFileInputDelimited, get the metadata (data type, length) from Talend, and compare it with the schema definition.
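To make this second idea concrete, here is a small sketch in plain Java that derives per-column metadata from the feed rows themselves: the longest value seen and whether every value is made of digits only. It does not use any Talend API; `ColumnProfile` and `FeedProfiler` are hypothetical names, and treating "digits only" as a stand-in for a numeric data type is an assumption.

```java
import java.util.List;

// Hypothetical summary of one feed column.
class ColumnProfile {
    int maxLength = 0;        // longest value seen in this column
    boolean allDigits = true; // false once a non-digit or empty value appears
}

class FeedProfiler {
    // Builds one profile per column from pipe-delimited rows.
    static ColumnProfile[] profile(List<String> rows, int columnCount) {
        ColumnProfile[] profiles = new ColumnProfile[columnCount];
        for (int i = 0; i < columnCount; i++) {
            profiles[i] = new ColumnProfile();
        }
        for (String row : rows) {
            String[] fields = row.split("\\|", -1);
            for (int i = 0; i < columnCount && i < fields.length; i++) {
                profiles[i].maxLength =
                        Math.max(profiles[i].maxLength, fields[i].length());
                if (!fields[i].matches("\\d+")) {
                    profiles[i].allDigits = false;
                }
            }
        }
        return profiles;
    }
}
```

A profile like this only says that a column breaks a rule somewhere in the file. Reporting the individual failing records still needs a row-by-row check.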

 

Can you please let me know how this requirement could be achieved in Talend?

Moderator

Re: Data validation against dynamic schema

Hello,

So far, the tSchemaComplianceCheck component does not support dynamic schemas.

Best regards

Sabrina
