Four Stars

Talend jobs iteration execution

Hi all,
We have created a Big Data standard job that reads from Hive and loads back into Hive.
In the tHiveInput component we have a SELECT query that fetches from one table. In our actual scenario there are hundreds of tables with the same column names, so we created a context variable and pass the table names in as its values. Is there a way for the job to instead fetch the table names from a file or a table and execute in a loop? For example: once the extraction for one table is complete, the job should automatically pick up the next table name stored in that file or table.
My current job design is:
tHiveInput -> tMap -> tHDFSOutput -> tHiveLoad
Talend version: Talend Data Fabric 6.3 Enterprise Edition.
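
For reference, the query inside the tHiveInput is built along these lines (the column names and the context variable name here are just placeholders):

"select col1, col2 from " + context.tableName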

Thanks
SS
  • Big Data
  • Data Integration
1 REPLY
Six Stars

Re: Talend jobs iteration execution

You can maintain a table that stores the names of all the tables you want to load.
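
In outline, the job would look like this (the first two components are new; the rest is your existing subjob):

tHiveInput (list of tables) -> tFlowToIterate -(iterate)-> tHiveInput (dynamic query) -> tMap -> tHDFSOutput -> tHiveLoad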

If you put a tHiveInput (or tHiveRow) with a SELECT against the table containing all the table names, you can run that query first and pass the result through a tFlowToIterate, which then hooks into the subjob you have developed so far.

In my example, table tt1 holds the list of tables I want to query:

select a from tt1

returns my list of tables. I pass that flow into a tFlowToIterate component, then link from it via an Iterate trigger to a second tHiveInput (in your job, this is your existing first tHiveInput). tFlowToIterate publishes each column of the incoming row to globalMap, so if the incoming flow is named row4 and its single schema column is named tableName, all the query in that tHiveInput needs is:

"select * from " + ((String)globalMap.get("row4.tableName"))

This will run the query iteratively against every table listed in the driver table, and at that point you can hook the output into your tMap, etc. Note that this only works if all your tables have matching schemas.
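
Purely as an illustration of the mechanics (not Talend code), the generated loop boils down to something like this plain-JDBC sketch; the connection URL, credentials, and the tt1 driver table are assumptions carried over from the example above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class IteratePerTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "user", "")) {

            // Driver query: the tHiveInput/tHiveRow that lists the tables.
            List<String> tables = new ArrayList<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("select a from tt1")) {
                while (rs.next()) {
                    tables.add(rs.getString(1));
                }
            }

            // The tFlowToIterate loop: one extraction per table name.
            for (String tableName : tables) {
                String query = "select * from " + tableName; // same expression as in the job
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(query)) {
                    while (rs.next()) {
                        // ...the tMap -> tHDFSOutput -> tHiveLoad work happens here
                    }
                }
            }
        }
    }
}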