
Generic Data Acquisition Framework

Hi Experts,

 

I come from an Ab Initio ETL background and am now trying to switch to Talend. I have a requirement to build a generic Data Acquisition Framework. Below is the functionality I want to implement, and I am looking for your expert advice on how to implement it in Talend.

 

1. Framework should be able to read different files with different numbers of columns. (Completed)

2. Framework should be able to enrich (derived columns, cleansing, transforming data) different files.

3. Framework should be able to use multiple lookups (for derived columns and data transformation).

4. Framework should be able to load the cleansed data to Oracle tables.

 

Example:

1. Say I have two files, abc.dat and xyz.dat.

abc.dat layout:

Field1, Field2, Field3, ..., Field15

xyz.dat layout:

Field16, Field17, ..., Field99

2. For abc.dat, I have to derive 10 new fields from its columns, and for xyz.dat I have to derive 30 new fields from its columns. The derivation logic for the two files has nothing in common.

3. For deriving fields of abc.dat, I might have to use 5 lookup files/tables and for xyz.dat I might have to use 10 lookup files/tables.

4. The enriched data of abc.dat will be loaded to abc table and the enriched data of xyz.dat will be loaded to xyz table.
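
To make the requirement more concrete, here is a minimal sketch of a metadata-driven setup along the lines of the example above. It is plain Java, not Talend's API, and every name in it (AcquisitionSketch, FileSpec, Rule, the "country" lookup and the two derived columns) is a hypothetical placeholder rather than something from a real job. It assumes each file gets its own spec holding the target table and a list of derivation rules, with lookups passed in as in-memory maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Talend.
class AcquisitionSketch {

    /** A derivation rule: computes one derived value from a row and the loaded lookups. */
    interface Rule {
        String apply(Map<String, String> row, Map<String, Map<String, String>> lookups);
    }

    /** Per-file settings: the target table and the derived columns to add. */
    static final class FileSpec {
        final String targetTable;
        final Map<String, Rule> derivedColumns = new LinkedHashMap<>();

        FileSpec(String targetTable) {
            this.targetTable = targetTable;
        }

        FileSpec derive(String column, Rule rule) {
            derivedColumns.put(column, rule);
            return this;
        }
    }

    /** Assumed spec for abc.dat; the two rules are made-up examples. */
    static FileSpec abcSpec() {
        return new FileSpec("abc")
            .derive("Field1Clean", (row, lookups) -> row.get("Field1").trim().toUpperCase())
            .derive("CountryName", (row, lookups) ->
                lookups.get("country").getOrDefault(row.get("Field2"), "UNKNOWN"));
    }

    /** Copies the input row and appends every derived column defined in the spec. */
    static Map<String, String> enrich(FileSpec spec, Map<String, String> row,
                                      Map<String, Map<String, String>> lookups) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Rule> e : spec.derivedColumns.entrySet()) {
            out.put(e.getKey(), e.getValue().apply(row, lookups));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Field1", " widget ");
        row.put("Field2", "IN");
        Map<String, Map<String, String>> lookups = Map.of("country", Map.of("IN", "India"));
        FileSpec spec = abcSpec();
        System.out.println(spec.targetTable + " <- " + enrich(spec, row, lookups));
    }
}
```

A second spec for xyz.dat would carry its own 30 rules and 10 lookups without touching enrich, which is the kind of separation between generic flow and per-file logic I am hoping to achieve in Talend.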

 

Any suggested approach would be highly appreciated.