parallel executions

One Star

parallel executions

I need to read data from 4 tables, each hosted in a separate DB. I want this to happen in parallel.
1. Should I create 4 DB connections, one for each DB, and then run 4 separate jobs? Bad option!!
2. Does a context help me somewhere?

It would be good if I could have just one job and configure it to run against the different DB connections in parallel.
One Star

Re: parallel executions

Hi aviator,
do you have to merge data from these 4 tables, or are you going to make 4 different jobs?
bye
One Star

Re: parallel executions

There is no general answer to your question with only the information you provided.
If you want to process the data from the 4 sources in one flow, then there is no way to run the operation in parallel (but they will run semi-parallel, i.e. interleaved).
If you retrieve the data, cache it somewhere and process it afterwards, the data fetch can be done in parallel with 4 different unconnected flows in one job (if you enable the Multi thread execution option).
You need to keep in mind that access to global variables and sub-job execution is synchronized, which limits how much you can multi-thread and parallelize without running totally independent processes.
To understand what you are looking for, you need to provide some additional details on what you would like to process and what you expect from parallel execution.
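To make the "fetch in parallel, process afterwards" pattern concrete, here is a rough sketch in plain Java (not the code Talend generates; the JDBC URLs, credentials and query are placeholders):

import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

// Sketch: fetch from 4 sources in parallel, cache the rows, process them afterwards.
public class ParallelFetchSketch {
    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList(
                "jdbc:mysql://host1/db1", "jdbc:mysql://host2/db2",
                "jdbc:mysql://host3/db3", "jdbc:mysql://host4/db4");

        ExecutorService pool = Executors.newFixedThreadPool(urls.size());
        List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();
        for (final String url : urls) {
            // one unconnected "flow" per source, each running on its own thread
            futures.add(pool.submit(new Callable<List<String>>() {
                public List<String> call() throws Exception {
                    return fetch(url);
                }
            }));
        }

        List<String> cache = new ArrayList<String>();
        for (Future<List<String>> f : futures) {
            cache.addAll(f.get());        // wait for every fetch to finish
        }
        pool.shutdown();

        // ...process the cached data sequentially afterwards...
        System.out.println("cached rows: " + cache.size());
    }

    private static List<String> fetch(String url) throws SQLException {
        List<String> rows = new ArrayList<String>();
        Connection con = DriverManager.getConnection(url, "user", "pass");
        try {
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery("SELECT id FROM some_table");
            while (rs.next()) rows.add(rs.getString(1));
        } finally {
            con.close();
        }
        return rows;
    }
}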
Employee

Re: parallel executions

You can use the new 2.4 feature "iterate parallel" (see an example on my blog: Parallel executions on iterate links). Describe your database connections in a text file (1 connection = 1 line), then build, for example: tFileInputDelimited --row--> tFlowToIterate --iterate--> tOracleInput --row--> tFileOutputDelimited.
"You need to keep in mind that access to global variables and sub-job execution is synchronized, which limits how much you can multi-thread and parallelize without running totally independent processes."

It's not deeply documented yet, but what's true for Java is not true for Perl. As described in another post on my blog, multithreading for Perl jobs, Java uses threads for parallelization while Perl uses processes. As a consequence, Perl parallel subjobs share nothing; this avoids problems but also brings another kind of limitation.
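For the Java side, the idea behind the parallel iterate link can be sketched like this in plain Java (again, not the code Talend generates; "connections.txt" and handleConnection() are placeholder names):

import java.io.*;
import java.util.*;

// Sketch: one connection string per line in a text file, one worker thread per line.
public class IterateParallelSketch {
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader("connections.txt"));
        List<Thread> workers = new ArrayList<Thread>();
        String line;
        while ((line = reader.readLine()) != null) {
            final String connectionString = line;
            Thread t = new Thread(new Runnable() {
                public void run() {
                    handleConnection(connectionString);  // e.g. query the DB, write a file
                }
            });
            t.start();                                   // each iteration runs in parallel
            workers.add(t);
        }
        reader.close();
        for (Thread t : workers) {
            t.join();                                    // wait for all iterations
        }
    }

    private static void handleConnection(String connectionString) {
        System.out.println("processing " + connectionString);
    }
}

In Perl the same pattern runs one process per iteration instead of starting a thread, which is why the Perl subjobs share nothing.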
One Star

Re: parallel executions

How would you get the data streams together afterwards?
If I have to cache the results in a file, I think I can do it more simply with one component per connection.
Maybe Perl does much better with parallel iterate links. With Java I'm really struggling; I've already created some bug reports and got interesting feedback about the limitations.
Employee

Re: parallel executions

"How would you get the data streams together afterwards?"

You have to use a concurrency-compatible kind of output, a database for example, but not a single file. You often don't need to merge the streams: flows are most of the time independent when talking about iterate links (say you have 1000 files to load into a database table, or 1000 files to read, transform and output as 1000 corresponding files).
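As a sketch of what "concurrency compatible" means here (plain Java, with placeholder URL, table and credentials): each parallel worker opens its own connection to the target database and lets the database handle the concurrent inserts, so no merge step is needed.

import java.sql.*;

// Sketch: one connection per worker; the database handles the concurrent inserts.
public class ConcurrentOutputSketch implements Runnable {
    private final String value;

    public ConcurrentOutputSketch(String value) {
        this.value = value;
    }

    public void run() {
        try {
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://dwhost/dw", "user", "pass");
            try {
                PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO target_table (col1) VALUES (?)");
                ps.setString(1, value);
                ps.executeUpdate();
            } finally {
                con.close();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        // Writing all workers into one plain file instead would need explicit locking.
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            new Thread(new ConcurrentOutputSketch("row-" + i)).start();
        }
    }
}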
One Star

Re: parallel executions

It would be really nice to see some practice-relevant examples in a blog or an article (and not just about parallelism). I checked your link; it is very brief and straightforward. :)
I think TOS has lots of potential, but it is sometimes difficult to have the right idea to solve issues efficiently, especially when different aspects come together and the jobs get a bit more complicated.
For your example: if I have a list of files and need some additional information from the database, can I use an additional iterate link inside the parallelism?
If I do the transformation with a tMap inside the parallelism, will the lookup be loaded several times?
Would Java and Perl behave differently in these two cases?
I usually spend hours trying out several scenarios to answer the above questions, and I may not even be able to explore all possibilities because I just don't get the right idea of how to do it best. Browsing through some best practices would be extremely helpful, I think.
One Star

Re: parallel executions

My problem statement:
I have a job A which reads data from an input database D1, does some aggregation (in a subjob) and outputs the data into a table in Datawarehouse-1.
Now this job has to be performed for different input databases (D1, D2, D3, D4), and the aggregated data written to the same Datawarehouse-1.
Question:
Do I need to write the same job 4 times, configuring each job for one of the input DBs,
and then have 4 tRunJobs in a separate job, pointing to each of the 4 jobs created above, with the multithreading option enabled so that they can run in parallel?
(That's a pretty easy and obvious option.)
OR
Can I configure the same job for the 4 different input DBs and make the instances run in parallel?
Does that make sense?
One Star

Re: parallel executions

Also, if I have 4 tRunJobs like in the attached image,
with the multithreading option ON,
does it mean that they are running in parallel?
I hope that's not a stupid question :(
One Star

Re: parallel executions

The answer to your question depends on the language you choose for your jobs.
With Perl you can implement it the way plegall described and you would not need to handle the connections separately. It works in Perl because the different iterations run in separate processes. I've never done a job in Perl, but I think plegall knows very well how it works.
If you select Java, you can call the same child job from different threads, but they will not run in parallel because they are synchronized.
If you build 4 completely separate jobs, they will run in parallel (access to global variables is still synchronized between the jobs, therefore you cannot expect 4 times the performance).
The jobs in the picture from your other post will run in parallel (3 threads) if:
A: the language you use is Perl, or
B: your language is Java and all 6 jobs are different. If any job is shared, the execution of that job will be sequenced while the rest still run in parallel (with the exception of the synchronized sections like the global variable access).
The question is not stupid; I have spent hours trying to find out what works better and what does not work at all.
If you want maximum parallelism in Java: build your job for one data source, put the DB configuration into a context, and execute the java command with the different contexts 4 times. This should rarely be necessary, I think.
The choice is up to you. You have several options.
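To illustrate the Java point about the shared child job (plain illustrative Java, not code taken from Talend): if all four threads call the same synchronized entry point, they queue up on the same lock and run one after another; four completely separate jobs have no shared lock and really overlap.

// Sketch: four threads calling one synchronized "child job" serialize on its lock.
public class SharedChildJobSketch {
    // stands in for the single shared child job
    static synchronized void runChildJob(String context) throws InterruptedException {
        System.out.println("start " + context);
        Thread.sleep(1000);                      // pretend work
        System.out.println("end   " + context);
    }

    public static void main(String[] args) {
        String[] contexts = {"D1", "D2", "D3", "D4"};
        for (final String ctx : contexts) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        runChildJob(ctx);        // all four threads wait for the same lock
                    } catch (InterruptedException ignored) {
                    }
                }
            }).start();
        }
        // With 4 completely separate jobs there is no shared lock, so the threads really
        // run at the same time (apart from the synchronized global variable access).
    }
}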
One Star

Re: parallel executions

Thanks Vaiko!
Will try it and post other queries if they come up!
One Star

Re: parallel executions

Vaiko: "If you like to have maximum parallelism in Java: build you job for one data source. Put the db configuration into a context and execute the java cmd with the different contexts 4 times. This should rarely be necessary, I think."

How can I execute a job with different contexts 4 times? Using something like tContextLoad? But each context would be loaded one by one, and then the job would run sequentially for each context... wouldn't it?
Wouldn't it be better to create 4 jobs, each accessing a different DB (each table in the different DBs has a similar schema), and then make them run in parallel?

What do you say?
One Star

Re: parallel executions

Here is a link you can use as an entry point for other posts about this: 3864
You can export a job and run it from the command line. If you check the corresponding option, the export will automatically create the batch file. You need to edit the batch file to include: --context_param xxx=yyy
xxx is the context variable
yyy is the new value
Example:
java -Xms256M -Xmx1024M -cp ../lib/systemRoutines.jar;../lib/userRoutines.jar;.;myfirstjob.jar;../lib; test.myfirstjob.MyFirstJob --context=Default --context_param fileName1=adsfakf --context_param fileName2=sljdafslhf

If you defined your connection parameters as context variables, you can set them at runtime.
How you call the batch file multiple times in parallel depends on your operating system.
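If you want to start the four instances from Java instead of from the shell, a rough sketch with ProcessBuilder could look like this (the jar and class names follow the example command above; dbHost is a made-up context variable):

import java.util.*;

// Sketch: launch the exported job 4 times in parallel, each with a different context parameter.
public class LaunchParallelContexts {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"db1", "db2", "db3", "db4"};
        List<Process> processes = new ArrayList<Process>();
        for (String host : hosts) {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-Xms256M", "-Xmx1024M",
                    "-cp", "../lib/systemRoutines.jar;../lib/userRoutines.jar;.;myfirstjob.jar;../lib;",
                    "test.myfirstjob.MyFirstJob",
                    "--context=Default",
                    "--context_param", "dbHost=" + host);
            pb.inheritIO();                       // show each job's output in this console
            processes.add(pb.start());            // all four start without waiting
        }
        for (Process p : processes) {
            p.waitFor();                          // wait until every instance has finished
        }
    }
}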
One Star

Re: parallel executions

It's like this (image attached):
My tmysql_input 1 is what needs to be changed. I have 4 DBs from which input should be read in parallel...
All the rest of the MySQL inputs and outputs are the same. So the context is not specific to a job, but just to a DB... How can I do this, so that input can be taken from all the DBs and 4 instances of the same job run in parallel?
One Star

Re: parallel executions

Java or Perl?
One Star

Re: parallel executions

Java