Duplicate and Rejected Records - Is there a better way?


Hi,

 

I have built a process flow that extracts data from a CSV file, with all fields brought in as strings.

The data then runs through a tConvertType, where certain fields are converted to Integers; any records that fail the conversion are routed through a tMap and inserted into the Reject table. If there are any rejected records, an email is sent.

Duplicates are then caught using a tUniqRow, and any duplicate records are routed through a tMap and inserted into the Reject table. If there are any duplicate records, an email is sent.

All valid records are then inserted into the output table.
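For reference, this is roughly the per-record logic the job implements, written as a stand-alone Java sketch (the key and numeric column positions and the sample values are made up for illustration):

import java.util.*;

public class RejectSketch {
    public static void main(String[] args) {
        // Sample rows: [key, amount] - both arrive as strings, as from the CSV.
        List<String[]> rows = Arrays.asList(
                new String[]{"1", "100"},
                new String[]{"2", "abc"},   // fails the Integer conversion -> reject
                new String[]{"1", "100"});  // duplicate key -> reject

        Set<String> seenKeys = new HashSet<>();
        List<String[]> valid = new ArrayList<>();
        List<String[]> rejects = new ArrayList<>();

        for (String[] row : rows) {
            try {
                Integer.parseInt(row[1]);   // the tConvertType step
            } catch (NumberFormatException e) {
                rejects.add(row);           // conversion reject
                continue;
            }
            if (!seenKeys.add(row[0])) {    // the tUniqRow step
                rejects.add(row);           // duplicate reject
                continue;
            }
            valid.add(row);                 // goes to the output table
        }
        System.out.println("valid=" + valid.size() + ", rejects=" + rejects.size());
    }
}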

 

Although this process works, I am sure it is not the most efficient way to process this data. Is anyone able to provide any suggestions on how to clean it up, and perhaps remove some unnecessary components / steps?

 

[Attachment: ExtractLoad.png]

 


Re: Duplicate and Rejected Records - Is there a better way?

Hi,

tMap should not be the choice for tasks that can be achieved in other ways: it is a complex component that carries a lot of options, but that flexibility comes at the cost of performance.

You can use a tJavaFlex in place of the tMap if the only requirement is to pass the data flow on to the DB output component.
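For example, as a simple pass-through, the tJavaFlex Main code section would just copy the incoming row to the outgoing one. This is only a sketch: the connection names (row1 in, row2 out) and the columns (id, name) are assumptions to adjust to the actual schema of the job.

// tJavaFlex Main code: straight pass-through towards the DB output.
// row1 = incoming connection, row2 = outgoing connection (names assumed).
row2.id = row1.id;
row2.name = row1.name;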

 

 

 

Regards 

Chandra Kant


Re: Duplicate and Rejected Records - Is there a better way?

Hello,

 

You can avoid using tMap here, as you are not doing any filtering or expression checks; instead, you can use the tJavaRow component to process the data.
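For instance, a pass-through tJavaRow would just copy the input columns to the output. input_row and output_row are the component's standard variables; the column names below are only an example, not the actual schema.

// tJavaRow: copy the incoming record to the outgoing flow unchanged.
output_row.id = input_row.id;
output_row.amount = input_row.amount;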

 

Regards

Ganshyam Patel
