best practice to store multiple inputs for a subjob


Hey,

 

I have a job with 4 different sources. I store each input table in a tHashOutput component and read it later with tHashInput.

Now I have built subjobs and want to do it the same way, but the hash components do not seem to work like that across jobs.

 

I know I can pass data between parent and child job via the tBuffer component. But I have 4 datasets I want to use in my subjob. That doesn't work with tBuffer, right?

Is there a way besides creating temp files?

 

Thanks for your answers :-)

 

Lars

 


Re: best practice to store multiple inputs for a subjob

Hi,

 

If the datasets share the same schema, you can write all of them to a single buffer output with a code value that identifies each dataset. If the datasets are completely different, I would consider reading directly from the DB in the subjob rather than in the parent job. Alternatively, I would keep the data in temp files, as this will not overload the memory.
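To illustrate the "single buffer with a code value" idea, here is a minimal plain-Java sketch. It is not Talend-generated code: the class `TaggedBufferSketch`, the `TaggedRow` type and the codes "A" and "B" are hypothetical names used only for this example, and a simple list stands in for the buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaggedBufferSketch {

    // One row of the shared buffer: a dataset code plus the row payload.
    // (Hypothetical type; in a real job this would be the common schema.)
    public static final class TaggedRow {
        public final String datasetCode;
        public final String payload;

        public TaggedRow(String datasetCode, String payload) {
            this.datasetCode = datasetCode;
            this.payload = payload;
        }
    }

    // Append every row of one source to the shared buffer under the given code.
    public static void addDataset(List<TaggedRow> buffer, String code, List<String> rows) {
        for (String row : rows) {
            buffer.add(new TaggedRow(code, row));
        }
    }

    // Recover a single dataset by filtering on the code column.
    public static List<TaggedRow> select(List<TaggedRow> buffer, String code) {
        List<TaggedRow> result = new ArrayList<>();
        for (TaggedRow row : buffer) {
            if (row.datasetCode.equals(code)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaggedRow> buffer = new ArrayList<>();
        addDataset(buffer, "A", Arrays.asList("a1", "a2"));
        addDataset(buffer, "B", Arrays.asList("b1"));
        System.out.println("rows for A: " + select(buffer, "A").size());
        System.out.println("rows for B: " + select(buffer, "B").size());
    }
}
```

In a job, the filter step would typically be a filter on the code column after reading the buffer. This only works when all datasets fit one common schema.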

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)



Re: best practice to store multiple inputs for a subjob

Hi,

Thanks for your answer.

I will try the temp file approach.

The reason I want the input in the parent job is that I use the same input again in every subjob.

There are 6 subjobs (stage 1 to 6). Via the MetaServlet in TAC we set a context variable and start the parent job, which runs the stage named in the context.

At the end, we trigger the ESB, which does its processing and then calls TAC again to start the DI job for the next stage.

 

I will report my solution.

regards

 

Lars


Re: best practice to store multiple inputs for a subjob

We can use the tHash components, but they keep the data in memory. With 4 different sources you will need 4 tHashOutput and 4 tHashInput components, 8 hash components in total, and memory consumption becomes large if the data volume is high. It is fine to use them for small amounts of data.

Can you please describe the 4 sources in a little more detail? Depending on the incoming sources, there may be options to reduce memory usage.

 

The best way is to create a folder on the server in all environments. Then, in the tMap placed before the insertion into the target, you can set the temp data path in the tMap Basic settings. But this works only with the same schema.

tmap.PNG

Warm Regards,

Re: best practice to store multiple inputs for a subjob

@manishchokkaram 

 

That is why I mentioned both options: disk, in the form of files, and memory, in the form of hash components. Based on the processing requirements and the available memory, he can choose the preferred method.

 

Warm Regards,
Nikhil Thampi


Re: best practice to store multiple inputs for a subjob

Hi,

Here is my solution: I create temp files before reading the input data.
For each input source the job creates a temp file:

grafik.png

and stores each path in globalMap (tJava):

globalMap.put(((String)globalMap.get("tForeach_1_CURRENT_VALUE")),((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")));
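For readers who want to try the pattern outside Talend Studio, here is a rough stand-alone sketch of the same idea. It rests on assumptions: a plain `HashMap` stands in for Talend's `globalMap`, `File.createTempFile` stands in for tCreateTemporaryFile, and the source names "customers" and "orders" are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class TempPathRegistrySketch {

    // Create one temp file per source name and register its path under that name,
    // mirroring globalMap.put(<current source name>, <temp file path>).
    // Source names need at least 2 characters here, because File.createTempFile
    // requires a prefix of 3 or more characters.
    public static Map<String, Object> registerTempFiles(String[] sourceNames) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        for (String source : sourceNames) {
            try {
                File tmp = File.createTempFile(source + "_", ".csv");
                tmp.deleteOnExit(); // clean up when the JVM exits
                globalMap.put(source, tmp.getAbsolutePath());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return globalMap;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = registerTempFiles(new String[] {"customers", "orders"});
        // Later steps look the path up by source name, as with (String) globalMap.get(...).
        String ordersPath = (String) globalMap.get("orders");
        System.out.println("orders temp file: " + ordersPath);
    }
}
```

Keying the map by source name means later steps only need to know the name of the source, not the generated file name.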

To run the different subjobs I used a tRunJob component with "Use dynamic job" enabled. To run a specific stage I used the context.stage variable in "Context job".
Because the subjobs have to know where the temp data is stored, I pass the globalMap variables holding the temp file paths as context parameters.
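The hand-over to the child job can be pictured with the sketch below. It is an illustration under assumptions, not the actual job: the job naming convention `stage_<n>`, the `_path` suffix for the parameter names and the example path are all hypothetical. The `--context_param name=value` form is the one Talend jobs accept when started from the command line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StageDispatchSketch {

    // Hypothetical naming convention: stage 3 maps to a child job named "stage_3".
    public static String jobNameFor(int stage) {
        if (stage < 1 || stage > 6) {
            throw new IllegalArgumentException("unknown stage: " + stage);
        }
        return "stage_" + stage;
    }

    // Turn the registered temp file paths into "--context_param name=value" arguments.
    // The "<source>_path" parameter names are an assumption for this example.
    public static List<String> contextArgs(Map<String, String> tempPaths) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : tempPaths.entrySet()) {
            args.add("--context_param");
            args.add(entry.getKey() + "_path=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> tempPaths = new LinkedHashMap<>();
        tempPaths.put("orders", "/tmp/orders_123.csv"); // example path, not a real file
        System.out.println(jobNameFor(3) + " " + contextArgs(tempPaths));
    }
}
```

In the Studio the same mapping is configured in the tRunJob context parameter table rather than built by hand.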

grafik.png

If the subjob is successful, the stage file is uploaded to an FTP server and the ESB gets a JMS message with the information on where the file can be found.

 

thanks for your help :-)

kind regards

 

Lars
