One Star

Continuous run of talend jobs raising memory issues

Hi,
I have designed around 6 Talend jobs to process 6 different files; the file size is around 1 GB.
I built those jobs and placed them in a Java program so that they run every day at a specific time.
I packaged all of this as a jar file, deployed it on a server, and started it; everything runs fine.
The first time all the jobs ran, the Java process used only about 1.5 GB overall.
But as the jobs run once every day, memory usage keeps adding up. After 5 days the process is using around 7.5 GB, and because of this I am getting a heap space error.
My question is: if we build Talend jobs as Java jobs and run them daily inside one process, will the memory be released after each job has finished?
11 REPLIES
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

Talend DI jobs run in their own Java virtual machine, which is completely shut down at the end of the job. The only things that can be causing this are that the JVM is not shutting down (can you check your processes?) or that the files you are processing are getting larger for some reason.
Can you be sure of when the jobs start and finish?
Can you be sure that the workload of the jobs is remaining the same?
Rilhia Solutions
One Star

Re: Continuous run of talend jobs raising memory issues

Hi,
Thanks for your reply.
The JVM won't shut down, as it hosts a continuous thread (poller) that starts at 23:00 every day and runs all the jobs.
The workload of the jobs is the same. Input files of almost the same size are supplied daily.
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

Are you saying that you have a single Talend Job with a tLoop (or something similar) calling your other Talend Jobs periodically throughout the day? If so, that is not a good design and you need to refactor it. Talend Jobs need to start and end. If you want something that remains memory resident all day, I suggest you look at doing this with an ESB Route. Alternatively, is there any reason why the jobs cannot be scheduled to start periodically using another tool?
Rilhia Solutions
One Star

Re: Continuous run of talend jobs raising memory issues

Sorry, I didn't mean that the Talend jobs won't end.
The Talend jobs end once they have finished processing their files.
The thread that triggers the jobs won't end. It runs continuously and triggers the Talend jobs whenever the input files are ready.
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

OK. Is it that thread that is using the memory? If so, this isn't a Talend issue. The Talend jobs will have their own JVMs. If they start and stop then the JVMs will start and stop. When the JVM is stopped it will not be keeping any memory tied up. If you are seeing a lot of memory being used constantly, then that is not Talend.
Rilhia Solutions
One Star

Re: Continuous run of talend jobs raising memory issues

Correct me if I am wrong.
If Talend jobs have their own JVMs, then why does the process for this thread show 1.5 GB of memory consumption while the Talend jobs are running? The sample code below should explain it better:
public void startJobs() {
    Talendjob1 job1 = new Talendjob1();
    job1.runJobInTOS();
    Talendjob2 job2 = new Talendjob2();
    job2.runJobInTOS();
    Talendjob3 job3 = new Talendjob3();
    job3.runJobInTOS();
    Talendjob4 job4 = new Talendjob4();
    job4.runJobInTOS();
    Talendjob5 job5 = new Talendjob5();
    job5.runJobInTOS();
}

The above method is called every day at a specified time by a continuously running thread.
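For reference, a poller of this kind could be sketched roughly as below. This is only an assumption about how such a thread might be set up, not the code actually deployed: the class name DailyPoller and its methods are hypothetical, the 23:00 run time comes from the earlier reply, and daylight-saving shifts are ignored.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DailyPoller {

    /** Time to wait from 'now' until the next occurrence of 'runAt'. */
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            // Today's slot has already passed (or is exactly now): use tomorrow's.
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Runs 'startJobs' once a day at 'runAt' on a single long-lived thread. */
    static ScheduledExecutorService schedule(Runnable startJobs, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMs = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        // Note: if startJobs throws, later runs are suppressed by the executor.
        scheduler.scheduleAtFixedRate(
                startJobs, initialDelayMs, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With this layout, every job started from startJobs() lives in the same JVM as the scheduler thread, so anything the jobs leave reachable stays in that process's heap between runs.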
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

Ah, I see. I would strongly urge you to change this. This is never going to be memory efficient. A long-running JVM will not necessarily hand memory back after each job, and garbage collection may need some help if you keep doing this. Why has this approach been chosen? You are essentially forcing 5 jobs to share the same JVM with no benefits (unless there is a benefit that you have not specified). I would either use another scheduling tool or enable these jobs to be started with their own JVM.
Maybe something like below would do it....
Export your jobs as Standalone Jobs and then use the supplied batch file to start them. 
Runtime.getRuntime().exec("cmd /c start jobname_run.bat");

This way you are not going to tie up a load of memory.
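A slightly fuller sketch of the same idea, using ProcessBuilder so the calling thread can wait for the job and read its exit code. This is a sketch under assumptions, not a tested setup: the class StandaloneJobLauncher and its methods are hypothetical, the script names follow the jobname_run.bat / jobname_run.sh pattern of a standalone export, and the context parameter is assumed to be called file, as in the runJobInTOS call discussed in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class StandaloneJobLauncher {

    /** Builds the OS-specific command line for an exported standalone job. */
    static List<String> buildCommand(String jobName, String filePath, boolean windows) {
        List<String> command = new ArrayList<>();
        if (windows) {
            command.add("cmd");
            command.add("/c");
            command.add(jobName + "_run.bat");
        } else {
            command.add("sh");
            command.add(jobName + "_run.sh");
        }
        // Forward the input file as a context parameter (assumed to be named "file").
        command.add("--context_param");
        command.add("file=" + filePath);
        return command;
    }

    /** Starts the job in its own JVM, waits for it to end, and returns its exit code. */
    static int runJob(File jobDir, String jobName, String filePath)
            throws IOException, InterruptedException {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        ProcessBuilder builder = new ProcessBuilder(buildCommand(jobName, filePath, windows));
        builder.directory(jobDir); // folder that contains the exported run script
        builder.inheritIO();       // show the job's console output in the parent process
        Process process = builder.start();
        return process.waitFor();
    }
}
```

Because each call starts a separate process, the heap used by that job is returned to the operating system when the process exits, regardless of what the garbage collector in the polling JVM does.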
This problem is not a Talend problem, it is a Java *feature*. You may be able to get round it by manually calling the garbage collector after your jobs have finished, as below....
System.gc();

....but it is not an ideal solution.
Rilhia Solutions
One Star

Re: Continuous run of talend jobs raising memory issues

The reason for choosing that approach is that I pass the input files to the job externally, so I send the file path as input like this:
String[] context = new String[] {"--context_param file="+filepath};
int exitCode = job2.runJobInTOS(context);
I have a question about this phrase:
"You are essentially forcing 5 jobs to share the same JVM"
All the jobs are sequential: they execute one after the other, not at the same time. In this scenario only one job is running in the JVM at any time, isn't it? Or do you mean that the GC is not doing its job after one job finishes?
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

Each of the jobs is an instantiated object inside your Java application. They are not immediately destroyed once they have finished. So you have 5 jobs represented as objects in your Java app, present at the same time and holding memory. Regardless of that, they are all running in the same JVM whether they run together or at separate times.
Parameters can still be supplied to the job using the method I suggested. Take a look at the batch files that Talend produces for you when you export a standalone job. Alternatively, you could build the jobs so that they look up their context variables when they start running (check out tContextLoad).
Rilhia Solutions
One Star

Re: Continuous run of talend jobs raising memory issues

Hi,
Is there a way to destroy all the objects (Maps, Lists, input/output streams, etc.) that a particular job has acquired once that job finishes?
Fifteen Stars

Re: Continuous run of talend jobs raising memory issues

Basically, no. You can call the garbage collector, but that is not guaranteed to do anything; it is up to the JVM. You can try putting the garbage collector call in between your jobs to see what that does....
System.gc();
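To see what that call actually does, you could log the used heap around each job. The sketch below is only an illustration: the class HeapCheck and its method names are hypothetical, and the numbers it reports are approximate because the JVM treats System.gc() as a request.

```java
class HeapCheck {

    /** Heap currently in use by this JVM, in megabytes (approximate). */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** Runs one job and returns {used before, used after job, used after gc request}. */
    static long[] runAndMeasure(Runnable job) {
        long before = usedHeapMb();
        job.run();
        long afterJob = usedHeapMb();
        System.gc(); // a hint only; the JVM may ignore or delay it
        long afterGc = usedHeapMb();
        System.out.println("used MB: before=" + before
                + " afterJob=" + afterJob + " afterGc=" + afterGc);
        return new long[] {before, afterJob, afterGc};
    }
}
```

It could be called as HeapCheck.runAndMeasure(() -> new Talendjob1().runJobInTOS(new String[0])). If the afterGc figure keeps climbing from one day to the next, something is still holding references to the job data, and no amount of System.gc() calls will free it.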
Rilhia Solutions