A more elegant way for data quality check (data assertion)?

Highlighted
Four Stars

A more elegant way for data quality check (data assertion)?

Hey guys,

I just joined this amazing community so here is my first question.

 

My task is to migrate data from MSSQL to AS/400. The input data has to be prepared beforehand.

tSchemaComplianceCheck should take care of any schemas or data types and lengths inconsistencies. However, the input data can also deviate from the expected business logic. For example: column1 should contain the values "Apples" or "Oranges" but not "Bananas".

At the moment, I solved this by manually checking the relevant rows for the expected values in a tMap component:

 

Example:

table1.column2== null? 
"Unexpected Value: Missing column2"
:
(table1.column1== null || table1.column1.equals("VALUE1"))?
table1.column2.toString()
:
table1.column1.equals("VALUE2")?
"SOME_LOGIC"
:
"Unexpected Value: " + table1.column1

This works out fine but the problem with this is follows:

1. Every column has to be String to hold the error message.

2. Filtering out invalid values with a tFilter Component and String comparison is not optimal (table1.column1.contains("Unexpected"))

3. Valid rows has to be joined again with the input tables (to get the correct column types and to avoid converting back from String) effectively doubling the logic, which is unwanted.

 

It's not practical to create an expected value List table for each column and use that as a lookup table in the tMap.

Ideally, I would want for rejected rows (one or more columns has an unexpected value) to be written into an extra table with one added column "ErrorMessage" that contains which columns were errorenous.

 

I feel like this is a common problem, so there should be a better solution to this. Is it with tAssert maybe?

 

Can't wait to read your answers. Let me know if I should clarify the problem even more.

 

Thank you in advance.

 

Best Regards,

Houssam

 

Moderator

Re: A more elegant way for data quality check (data assertion)?

Hello,

tAssert component generates the boolean evaluation on the concern for the Job execution status and provides the Job status messages to tAssertCatcher.

Here are related scenarios:TalendHelpCenterSmiley Frustratedetting up the assertive condition for a Job execution  and TalendHelpCenter:Viewing product orders status (on a daily basis) against a benchmark number 

Talend recommends using Talend Data Quality first to improve the quality of your data, because incomplete, low quality datasets produce poor analysis results, and data profiling is best done on complete, clean datasets.

Data profiling is about collecting and analyzing information about your data. It is important because it allows you to detect and understand data issues, then fix them.

Let us know if it is what you are looking for?

https://help.talend.com/reader/Cau_yL8zEBuz1qNJ95YqQw/KoalefBU0uSZmuVLiKSmag 

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Introduction to Talend Open Studio for Data Quality

Find out about Talend Open Studio for Data Quality

Watch Now

Enabling Data Governance

Learn how to enable Data Governance

Watch Now

The Definitive Guide to Government Data Quality

Take a peek at the definitive guide to Government Data Quality

Read