I come from an Ab Initio ETL background and am now trying to switch to Talend. I have a requirement to build a generic data acquisition framework. These are the features I want to implement, and I am looking for your expert advice on how to implement them in Talend.
1. Framework should be able to read different files with different numbers of columns. (Completed)
2. Framework should be able to enrich (derived columns, cleansing, transforming data) different files.
3. Framework should be able to use multiple lookups (used for derived columns/transforming data).
4. Framework should be able to load the cleansed data to Oracle tables.
Example (numbered to match the requirements above):
1. Say I have two files, abc.dat and xyz.dat, with these columns respectively:
Field1, Field2, Field3, ..., Field15
Field16, Field17, ..., Field99
2. For abc.dat, I have to derive 10 new fields from its columns, and for xyz.dat I have to derive 30 new fields from its columns. The derivation logic for the two files has nothing in common.
3. For deriving fields of abc.dat, I might have to use 5 lookup files/tables and for xyz.dat I might have to use 10 lookup files/tables.
4. The enriched data of abc.dat will be loaded to abc table and the enriched data of xyz.dat will be loaded to xyz table.
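To make the requirement concrete, the per-file differences above (derived fields, lookups, target table) could be described as metadata and applied by one generic routine. The following is only a plain-Java sketch of that idea under my own assumptions, not Talend API: `AcquisitionSketch`, `FileSpec`, `Rule`, `enrich`, the `country` lookup, and the derived column names are all hypothetical, and lookups are modelled as in-memory maps rather than real files or tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: none of these names come from Talend.
class AcquisitionSketch {

    /** A derivation rule: (current row, all lookups by name) -> derived value. */
    interface Rule extends BiFunction<Map<String, String>, Map<String, Map<String, String>>, String> {}

    /** Per-file metadata: the target table plus an ordered set of derived columns. */
    static final class FileSpec {
        final String targetTable;
        final Map<String, Rule> derivations = new LinkedHashMap<>();

        FileSpec(String targetTable) { this.targetTable = targetTable; }

        FileSpec derive(String column, Rule rule) {
            derivations.put(column, rule);
            return this;
        }
    }

    /** Returns a copy of the row with every derived column of the spec appended in order. */
    static Map<String, String> enrich(Map<String, String> row, FileSpec spec,
                                      Map<String, Map<String, String>> lookups) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Rule> d : spec.derivations.entrySet()) {
            // Later rules can see columns derived by earlier rules.
            out.put(d.getKey(), d.getValue().apply(out, lookups));
        }
        return out;
    }

    public static void main(String[] args) {
        // One assumed lookup, keyed by a code found in Field2.
        Map<String, String> country = new LinkedHashMap<>();
        country.put("IN", "India");
        Map<String, Map<String, String>> lookups = new LinkedHashMap<>();
        lookups.put("country", country);

        // Spec for abc.dat: two illustrative derived fields, loaded to table "abc".
        FileSpec abc = new FileSpec("abc")
            .derive("Field1Trimmed", (row, lk) -> row.get("Field1").trim())
            .derive("CountryName",
                    (row, lk) -> lk.get("country").getOrDefault(row.get("Field2"), "UNKNOWN"));

        Map<String, String> row = new LinkedHashMap<>();
        row.put("Field1", "  x ");
        row.put("Field2", "IN");
        System.out.println(abc.targetTable + " <- " + enrich(row, abc, lookups));
    }
}
```

With this shape, xyz.dat would simply get its own `FileSpec` with 30 rules and its own lookups, while `enrich` stays unchanged. Whether such rules are best expressed in Talend as Java routines, per-file child jobs, or externalised configuration is exactly the part I am unsure about.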
Any suggested approach would be highly appreciated.