One key element for designing a successful data integration (DI) Job is being able to capture errors that occur because of invalid data. When an error occurs, you may see incomplete or invalid data, though you cannot easily identify issues with data causing this behavior.
Whenever you get incomplete or invalid data in output, it is caused by one of these reasons:
Here are solutions for capturing invalid data at various stages of the DI process.
For data coming from sources that are not compliant with the source metadata or schema, you can use a Reject row from the Input component, as shown below:
In this example, one of the records has an invalid value in the city_id (Integer) column, and it was sent to the RejectedCities output. Notice that for the Reject row to be enabled, the Die on error property check box for the Input component must be clear. For relational database sources, you can use validation rules as outlined in the documentation.
The second scenario is when incoming records fail your data quality checks. For example, you may need to validate multiple columns against regular expressions. You can leverage Talend Data Quality components to see whether they match required patterns. These components provide two output flows: matching data and non-matching data. You can redirect non-matching records to a file or table and inspect why they failed.
You can also review Talend documentation for examples using other components in this family, including tSchemaComplianceCheck.
If your validation check is based on a list of values available in another lookup table, you can use a tMap component to join data with the lookup data and set your join model to Inner Join. All records that don’t have a match can be captured in a new output in tMap whose Catch lookup inner join reject property is set to true.
If you are trying to insert or update data in the target database and the data violates any integrity constraints set by the database, you can use the Reject flow from the Output component to capture the records in error. Here’s an example of Retrieving data in error with a Reject link from Talend documentation. Depending on the Output component you use, you may need to disable its ability to process rows in batch. For instance, with tOracleOutput, you must clear the Use Batch Size check box.