I just joined this amazing community so here is my first question.
My task is to migrate data from MSSQL to AS/400. The input data has to be prepared beforehand.
tSchemaComplianceCheck should take care of any schemas or data types and lengths inconsistencies. However, the input data can also deviate from the expected business logic. For example: column1 should contain the values "Apples" or "Oranges" but not "Bananas".
At the moment, I solved this by manually checking the relevant rows for the expected values in a tMap component:
table1.column2== null? "Unexpected Value: Missing column2" : (table1.column1== null || table1.column1.equals("VALUE1"))? table1.column2.toString() : table1.column1.equals("VALUE2")? "SOME_LOGIC" : "Unexpected Value: " + table1.column1
This works out fine but the problem with this is follows:
1. Every column has to be String to hold the error message.
2. Filtering out invalid values with a tFilter Component and String comparison is not optimal (table1.column1.contains("Unexpected"))
3. Valid rows has to be joined again with the input tables (to get the correct column types and to avoid converting back from String) effectively doubling the logic, which is unwanted.
It's not practical to create an expected value List table for each column and use that as a lookup table in the tMap.
Ideally, I would want for rejected rows (one or more columns has an unexpected value) to be written into an extra table with one added column "ErrorMessage" that contains which columns were errorenous.
I feel like this is a common problem, so there should be a better solution to this. Is it with tAssert maybe?
Can't wait to read your answers. Let me know if I should clarify the problem even more.
Thank you in advance.
tAssert component generates the boolean evaluation on the concern for the Job execution status and provides the Job status messages to tAssertCatcher.
Talend recommends using Talend Data Quality first to improve the quality of your data, because incomplete, low quality datasets produce poor analysis results, and data profiling is best done on complete, clean datasets.
Data profiling is about collecting and analyzing information about your data. It is important because it allows you to detect and understand data issues, then fix them.
Let us know if it is what you are looking for?
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Find out about Talend Open Studio for Data Quality
Learn how to enable Data Governance
Take a peek at the definitive guide to Government Data Quality