I have built a process flow that extracts data from a CSV file. All fields are brought in as strings and run through a tConvertType, where certain fields are converted to Integers.
Rows that fail the conversion are filtered through a tMap and inserted into the Reject table. If there are any rejected records, an email is sent.
Duplicates are then caught using the tUniqRow, and any duplicate records are filtered through a tMap and inserted into the Reject table. If there are any duplicate records then an email is sent.
All valid records are then inserted into the output table.
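To make the flow above concrete, here is a rough sketch of the same logic in plain Java. It is not the generated Talend code: the column layout (id, quantity, name), the class name `CsvFlowSketch`, and the choice of the id column as the duplicate key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the job's logic; assumes each row has at least
// two columns, with id in column 0 and quantity in column 1.
class CsvFlowSketch {

    static class Result {
        final List<String[]> valid = new ArrayList<>();
        final List<String[]> conversionRejects = new ArrayList<>();
        final List<String[]> duplicateRejects = new ArrayList<>();
    }

    // Returns the parsed integer, or null if the text is missing or not an integer.
    static Integer tryParseInt(String text) {
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return null;
        }
    }

    static Result process(List<String[]> rows) {
        Result result = new Result();
        Set<Integer> seenIds = new HashSet<>();
        for (String[] row : rows) {
            Integer id = tryParseInt(row[0]);
            Integer quantity = tryParseInt(row[1]);
            if (id == null || quantity == null) {
                // Conversion failed: corresponds to the first reject branch.
                result.conversionRejects.add(row);
            } else if (!seenIds.add(id)) {
                // Key already seen: corresponds to the duplicates branch.
                result.duplicateRejects.add(row);
            } else {
                result.valid.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "10", "apple"},
            new String[] {"2", "abc", "pear"},
            new String[] {"1", "7", "apple again"},
            new String[] {"3", "5", "plum"});
        Result r = process(rows);
        System.out.println("valid=" + r.valid.size()
            + " conversionRejects=" + r.conversionRejects.size()
            + " duplicateRejects=" + r.duplicateRejects.size());
    }
}
```

Both reject branches end up in the same place, so the two tMap steps are doing little more than routing rows to the Reject table.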
Although this process works, I am sure it is not the most efficient way to process this data. Is anyone able to suggest how to clean this up, and perhaps remove some unnecessary components or steps?
tMap should not be the default choice for tasks that can be achieved in other ways. It is a complex component that carries
many options, and that flexibility comes at a cost in performance.
You can use tJavaFlex in place of the tMap if the only requirement is to change the data flow for the database output component.
You can avoid tMap here, since you are not doing any filtering or expression checks. Instead, you can use a tJavaRow component to process the data.
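As a rough illustration of the tJavaRow suggestion: inside the component, Talend exposes the current record as `input_row` and `output_row`, and you assign fields one by one. The sketch below wraps that idea in plain Java so it can be run outside the Studio. The `InputRow` and `OutputRow` classes and the `id` and `name` columns are hypothetical stand-ins for whatever schema the job actually uses.

```java
// Hypothetical row structures standing in for the schemas Talend generates.
class TJavaRowSketch {

    static class InputRow {
        String id;
        String name;
    }

    static class OutputRow {
        Integer id;
        String name;
    }

    // The body of this method mirrors what would be typed into tJavaRow:
    // plain field-by-field assignments from input_row to output_row.
    static OutputRow mapRow(InputRow input_row) {
        OutputRow output_row = new OutputRow();
        // Assumes id has already passed the integer conversion check upstream.
        output_row.id = Integer.valueOf(input_row.id.trim());
        output_row.name = input_row.name == null ? null : input_row.name.trim();
        return output_row;
    }

    public static void main(String[] args) {
        InputRow in = new InputRow();
        in.id = " 7 ";
        in.name = " widget ";
        OutputRow out = mapRow(in);
        System.out.println(out.id + "," + out.name);
    }
}
```

In the actual component only the assignment lines would be needed, since Talend supplies the row objects itself.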