I need to validate data from CSV files against a dynamic schema, and I am using Talend 5.6 Enterprise. I read a number of posts on Talend Community and found that tSchemaComplianceCheck cannot be used in this case because it does not support dynamic schemas. Can you please advise how this can be done in Talend?
My initial approach is to create two separate jobs: Job1 creates the schema dynamically from a schema definition file (CSV) and passes it to Job2, which loops through a number of CSV files, reads each one, and validates each record against the schema defined in Job1. My question is: how can I pass a dynamic schema to Job2 to do the validation? Is this a feasible solution?
The problem is that the schema definitions are not known in advance, so I cannot create them when designing the job. Users of my application would define the schema in a schema definition file (CSV or Excel) and provide the CSV files to be validated against that particular schema at runtime.
I look forward to your suggestions and advice on this topic.
Could you please elaborate on your case with an example showing input and expected output values?
Thanks for your reply. I need to validate a number of feed files against a schema definition file, both of which are defined at runtime. For each set of feed files, there is a corresponding schema definition file. For example, the schema definition file looks like this:
The feed files to be validated (which can have any of the extensions .dat, .data, .csv, or .xlsx) have no headers, and the data in a feed looks something like this:
After validation, the expected output would be:
111401030000004|5678| ---- failed, length exceeded
What I want to achieve is similar to what the tSchemaComplianceCheck component does. I built a test project using that component, as attached. The difference is that my schema is dynamic, and I am not sure how to handle that.
I have also thought about another approach to this problem: read the feed files using tFileInputDelimited, get the metadata (data type, length) from Talend, and compare it with the schema definition.
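To make the comparison step concrete, here is a minimal sketch in plain Java of checking a pipe-delimited record against rules loaded at runtime. It is an illustration only, not a Talend API: the schema definition format (name,type,maxLength per line), the column names, the class and method names, and every message other than "failed, length exceeded" are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; none of these names come from Talend.
class DynamicSchemaCheck {

    /** One column rule read from the (assumed) schema definition file. */
    static final class ColumnRule {
        final String name;
        final String type;    // this sketch only knows "string" and "integer"
        final int maxLength;

        ColumnRule(String name, String type, int maxLength) {
            this.name = name;
            this.type = type;
            this.maxLength = maxLength;
        }
    }

    /** Parses definition lines of the assumed form name,type,maxLength. */
    static List<ColumnRule> parseSchema(List<String> lines) {
        List<ColumnRule> rules = new ArrayList<ColumnRule>();
        for (String line : lines) {
            String[] parts = line.split(",");
            rules.add(new ColumnRule(parts[0].trim(), parts[1].trim(),
                    Integer.parseInt(parts[2].trim())));
        }
        return rules;
    }

    /** Returns "passed" or "failed, <reason>" for one pipe-delimited record. */
    static String validate(String record, List<ColumnRule> rules) {
        // Limit -1 keeps trailing empty fields, e.g. after a final "|".
        String[] fields = record.split("\\|", -1);
        if (fields.length < rules.size()) {
            return "failed, missing columns";
        }
        for (int i = 0; i < rules.size(); i++) {
            ColumnRule rule = rules.get(i);
            String value = fields[i];
            if (value.length() > rule.maxLength) {
                return "failed, length exceeded";
            }
            if (rule.type.equals("integer") && !value.matches("-?\\d+")) {
                return "failed, wrong data type";
            }
        }
        return "passed";
    }

    public static void main(String[] args) {
        // Made-up schema definition; the real file layout may differ.
        List<ColumnRule> rules = parseSchema(Arrays.asList(
                "account_id,integer,10",
                "amount,integer,4"));
        for (String record : Arrays.asList(
                "111401030000004|5678|", "1114010300|5678", "1114010300|12a4")) {
            System.out.println(record + " ---- " + validate(record, rules));
        }
    }
}
```

With the made-up rules above, the first record fails on length because its first field has 15 characters against an assumed limit of 10. Keeping the rules as plain data rather than as a design-time schema is what allows them to change per set of feed files.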
Can you please let me know how this requirement could be achieved in Talend?
So far, the tSchemaComplianceCheck component does not support dynamic schemas.