I have developed around 150 jobs for a big project. Most of them read data from Oracle and load it into HDFS (the staging area), and I do not want to keep doing the same thing over and over again.
I am looking for a way to have one job that does everything, since the task is repetitive, and to use some sort of metadata to drive the loading of data into the early stages.
If no major transformations are involved, there are multiple ways to design this job.
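One way to picture the metadata-driven approach is a single generic routine that loops over a list of table descriptions and derives both the extraction query and the staging location from each entry. The sketch below is only an illustration in plain Python, not Talend code: the `TableSpec` fields, the `/staging` base directory, and the sample table names are all assumptions made up for the example, and the actual Oracle read and HDFS write are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """One metadata row describing a source table (hypothetical layout)."""
    schema: str
    table: str
    columns: tuple = ()  # empty tuple means "all columns"


def build_query(spec):
    """Derive the extraction query from the metadata row."""
    cols = ", ".join(spec.columns) if spec.columns else "*"
    return f"SELECT {cols} FROM {spec.schema}.{spec.table}"


def staging_path(spec, base_dir="/staging"):
    """Derive the staging directory from the metadata row (assumed layout)."""
    return f"{base_dir}/{spec.schema.lower()}/{spec.table.lower()}"


def plan_loads(specs, base_dir="/staging"):
    """Return one (query, target path) pair per metadata row."""
    return [(build_query(s), staging_path(s, base_dir)) for s in specs]


# Hypothetical metadata; in practice this could live in a table or a file.
METADATA = [
    TableSpec("SALES", "ORDERS", ("ORDER_ID", "CUSTOMER_ID", "AMOUNT")),
    TableSpec("SALES", "CUSTOMERS"),
]

if __name__ == "__main__":
    for query, path in plan_loads(METADATA):
        # A real job would run the query against Oracle and write to HDFS here.
        print(f"{query} -> {path}")
```

With this shape, adding a new table to the staging load means adding a metadata row rather than building another job.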