best practice to store multiple inputs for a subjob

Five Stars

Hey,

I have a job with 4 different sources. I store each input table in a tHashOutput component and read it back later with tHashInput.

Now I have built subjobs and wanted to do it the same way, but the hash components do not seem to work like that.

I know I can pass data from the parent to the child job via the tBuffer component. But I have 4 datasets that I want to use in my subjob. That doesn't work with tBuffer, right?

Is there a way to do this besides creating temp files?

Thanks for your answers :-)

Lars

 

Employee

Re: best practice to store multiple inputs for a subjob

Hi,

 

If the datasets share the same schema, you can add them all to a single buffer output, with a code value that identifies each dataset. If the datasets are completely different, I would consider reading directly from the DB in the subjob rather than reading in the parent job. Otherwise I would keep the data in temp files, since that does not overload the memory.
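A minimal sketch of the single-buffer idea in plain Java, not Talend-generated code. The class `TaggedBufferSketch`, the `sourceCode` field and the sample rows are hypothetical names chosen for illustration; the point is only that every row carries a code value, so the receiving side can split one shared buffer back into its datasets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaggedBufferSketch {

    // One buffered row. The code value says which source the row came from.
    // All sources must fit the same layout (here: a single payload string).
    static final class TaggedRow {
        final String sourceCode;
        final String payload;

        TaggedRow(String sourceCode, String payload) {
            this.sourceCode = sourceCode;
            this.payload = payload;
        }
    }

    // Sending side: append every row of one source to the shared buffer.
    static void appendSource(List<TaggedRow> buffer, String sourceCode, List<String> rows) {
        for (String row : rows) {
            buffer.add(new TaggedRow(sourceCode, row));
        }
    }

    // Receiving side: split the shared buffer into one list per code value.
    static Map<String, List<String>> splitByCode(List<TaggedRow> buffer) {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        for (TaggedRow row : buffer) {
            bySource.computeIfAbsent(row.sourceCode, k -> new ArrayList<>()).add(row.payload);
        }
        return bySource;
    }

    public static void main(String[] args) {
        List<TaggedRow> buffer = new ArrayList<>();
        appendSource(buffer, "customers", Arrays.asList("1;Alice", "2;Bob"));
        appendSource(buffer, "orders", Arrays.asList("100;1;9.99"));
        System.out.println(splitByCode(buffer));
    }
}
```

In a job, the split would more likely be a filter on the code column than a map, but the trade-off is the same: one buffer is enough only while all datasets fit one schema.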

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Five Stars

Re: best practice to store multiple inputs for a subjob

Hi,

thanks for your answer.

I will try the temp file approach.

The reason I want the input in the parent job is that I use the same input again in every subjob.

There are 6 subjobs (stage 1 to 6). Via the MetaServlet in TAC we set a context variable and start the parent job, which runs the stage named in the context.

At the end we trigger the ESB, which does its processing and then calls the DI TAC again for the next stage.

I will report my solution.

regards

 

Lars

Nine Stars

Re: best practice to store multiple inputs for a subjob

We can use the tHash components, but they hold the data in memory. With 4 different sources you need 4 tHashOutput and 4 tHashInput components, 8 hash components in total, which takes up a lot of memory if the data volume is large. It is fine when the amount of data is small.

Can you please describe the 4 sources in a little more detail? Depending on the incoming sources, there are some options to reduce memory issues.

The best way is to create a folder on the server in all environments and set it as the temp data path in the Basic settings of the tMap that sits before the insert into the target. But this only works when the schema is the same.

tmap.PNG

Employee

Re: best practice to store multiple inputs for a subjob

@manishchokkaram 

 

That is why I mentioned both options: disk, in the form of files, and memory, in the form of hash components. He can choose the preferred method based on the processing requirements and the available memory.

 

Warm Regards,
Nikhil Thampi

Five Stars

Re: best practice to store multiple inputs for a subjob

Hi,

here is my solution: I create temp files before reading the input data.
For each input source the job creates a temp file:

grafik.png

and stores each path in the globalMap (tJava):

globalMap.put(((String)globalMap.get("tForeach_1_CURRENT_VALUE")),((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")));

To run the different subjobs I used a tRunJob component with "Use dynamic job" enabled. To run a specific stage I used the context.stage variable in the "Context job" field.
Because the subjobs have to know where the temp data is stored, I pass the globalMap variables holding the temp file paths as context parameters.

grafik.png
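The hand-off described above can be sketched in plain Java outside of Talend. This is only an illustration under assumptions: the class `TempFileHandoffSketch` and its methods are hypothetical, a `HashMap` stands in for Talend's globalMap, and the path is handed to the reading side as a plain string the way a context parameter would be.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TempFileHandoffSketch {

    // Parent side: write each source to its own temp file and remember
    // the file path under the source name.
    static Map<String, Object> stageSources(Map<String, List<String>> sources) throws IOException {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        for (Map.Entry<String, List<String>> source : sources.entrySet()) {
            Path tmp = Files.createTempFile(source.getKey() + "_", ".csv");
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.write(tmp, source.getValue(), StandardCharsets.UTF_8);
            globalMap.put(source.getKey(), tmp.toString());
        }
        return globalMap;
    }

    // Child side: receives only the path (a context parameter in the job)
    // and reads the staged rows back from disk.
    static List<String> readStaged(String path) throws IOException {
        return Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> sources = new HashMap<>();
        sources.put("customers", Arrays.asList("1;Alice", "2;Bob"));
        sources.put("orders", Arrays.asList("100;1;9.99"));

        Map<String, Object> globalMap = stageSources(sources);
        String customersPath = (String) globalMap.get("customers");
        System.out.println(readStaged(customersPath));
    }
}
```

Only the paths travel between the jobs, so the memory footprint stays small no matter how many sources are staged; the cost is the extra disk write and read per source.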

If the subjob is successful, the stage file is uploaded to an FTP server and the ESB gets a JMS message telling it where to find the file.

 

thanks for your help :-)

kind regards

 

Lars
