I have a job that receives a file of records (each row is a user, with columns for phone number, email address, postcode, etc.) and I want to load these records into a database. Before loading, the flow passes through data cleaning components (tFilterRow, tPatternCheck...) that check the records against certain rules (name without invalid characters, name longer than 2 characters, no overseas postcodes...). Records that fail a rule get 'flagged', that is, they are sent to the 'reject' flow of the component that caught them.
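To make the rules concrete, here is a rough sketch in plain Java of the kind of checks described above. It is not Talend-generated code and not how the components are implemented. The class name `RejectRules`, the reason codes, and both regular expressions are hypothetical. In particular, the postcode pattern assumes a simplified UK-style shape as a stand-in for "no overseas postcodes", and the name pattern assumes letters, spaces, hyphens and apostrophes are the only valid characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RejectRules {
    // Assumption: letters, spaces, apostrophes and hyphens are the valid name characters.
    private static final Pattern VALID_NAME = Pattern.compile("[\\p{L} '\\-]+");
    // Assumption: simplified UK-style postcode, standing in for "not overseas".
    private static final Pattern LOCAL_POSTCODE =
            Pattern.compile("[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}");

    // Returns one hypothetical reason code per failed rule; an empty list means "unflagged".
    static List<String> rejectReasons(String name, String postcode) {
        List<String> reasons = new ArrayList<>();
        String n = name == null ? "" : name.trim();
        String p = postcode == null ? "" : postcode.trim().toUpperCase();

        if (!n.isEmpty() && !VALID_NAME.matcher(n).matches()) {
            reasons.add("NAME_INVALID_CHARS");
        }
        if (n.length() <= 2) {
            reasons.add("NAME_TOO_SHORT"); // rule: name must be longer than 2 characters
        }
        if (!LOCAL_POSTCODE.matcher(p).matches()) {
            reasons.add("POSTCODE_NOT_LOCAL");
        }
        return reasons;
    }

    public static void main(String[] args) {
        // 'Jo' is a valid name in practice, but the length rule still flags it.
        System.out.println(rejectReasons("Jo", "SW1A 1AA"));
        System.out.println(rejectReasons("J@ne", "75008"));
    }
}
```

Keeping a reason code next to each rejected row is what would let the false positives (such as 'Jo') be told apart from rows that need an actual correction.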
What I am trying to find out is the best way of dealing with these 'flagged' records. When the job finishes, I would like to review the rejects manually (all rows with names containing invalid characters, names of 2 characters or fewer, etc.) and decide whether they are genuine rejects or false positives (names like 'Jo' are still valid, for example), or even correct them (when they contain invalid characters). Once the corrections are made, I would like to merge these records back into the flow with the 'unflagged' records so that they get loaded into the database as well.
This can be done with an Excel file in a fairly manual process (write the rejects from each component to one tab of an Excel file and correct them by hand), but I was wondering whether there is a more effective way of doing this. I have tried the Talend Stewardship Console, but since it is designed for working with matching rejections, it doesn't really work for this.