Java heap space problem in Talend Open Studio 4.0.1

Hi there,
I'm a newbie to this forum and Talend in general. I run the above version of Talend in Windows 2007. I use a number of jobs which were written by some contractors we hired about 9 months ago. One of these jobs, Strength01, has been failing with the following messages:-
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at com.csvreader.CsvReader.readRecord(CsvReader.java:1036)
at core_extract.strengthsub02_1_0.StrengthSub02.tFileList_1Process(StrengthSub02.java:12235)
at core_extract.strengthsub02_1_0.StrengthSub02.tHashInput_tUnite_1Process(StrengthSub02.java:16739)
at core_extract.strengthsub02_1_0.StrengthSub02.runJobInTOS(StrengthSub02.java:17696)
at core_extract.strengthsub02_1_0.StrengthSub02.runJob(StrengthSub02.java:17577)
at core_extract.strength01_1_0.Strength01.tRunJob_1Process(Strength01.java:413)
at core_extract.strength01_1_0.Strength01.runJobInTOS(Strength01.java:652)
at core_extract.strength01_1_0.Strength01.main(Strength01.java:526)

From the information I've gleaned so far from looking this up on the Internet, it is suggested that I edit the file:-
TalendOpenStudio-win32-x86.ini
which contains the line
-vmargs -Xms64m -Xmx768m -XX:MaxPermSize=600m

and change the number following "MaxPermSize". I've tried a number of values, with 600 being the latest one, but the problem still occurs.
On the network where I run Talend we have two PCs that have Talend installed. I got the same error on the second PC. Is the problem related to the ini file on the PC(s) or could it be related to our server?
Am I missing anything?
Is there anything else I should do?
Thank you in anticipation,
Richard

Re: Java heap space problem in Talend Open Studio 4.0.1

Could you tell us how big the files you are loading are?
It seems to be failing when loading some CSVs.
*Note: I suppose you can always call those contractors and ask them to fix it.

Re: Java heap space problem in Talend Open Studio 4.0.1

The argument you want to adjust for the heap is -Xmx; try increasing it to 1024m. A MaxPermSize of 128m should be fine.
Use trial and error to see if you can find a value that will get your map to run.
It looks like there's a custom CSV reader class that may be reading the entire input file in memory. That's fine for a moderate-sized document (100-200k), but not if the file is large. Can the custom CSV class be replaced with a tFileInputDelimited? That way, the input is processed line-by-line and the overall memory doesn't need to exceed that required for a single row.

Re: Java heap space problem in Talend Open Studio 4.0.1

The files I'm loading are big - up to 45 megabytes.

Re: Java heap space problem in Talend Open Studio 4.0.1

The argument you want to adjust for the heap is -Xmx; try increasing it to 1024m. A MaxPermSize of 128m should be fine.
Use trial and error to see if you can find a value that will get your map to run.

I'll give this a try and let you know how I get on
It looks like there's a custom CSV reader class that may be reading the entire input file in memory. That's fine for a moderate-sized document (100-200k), but not if the file is large. Can the custom CSV class be replaced with a tFileInputDelimited? That way, the input is processed line-by-line and the overall memory doesn't need to exceed that required for a single row.

I'll check this out too.
Thanks very much walkerca, I'll let you know how I get on.

Re: Java heap space problem in Talend Open Studio 4.0.1

The -Xmx argument in the .ini file controls the memory available to the Studio itself, so its effect goes no further than building a job. It makes no difference to the actual running of the job. The memory allocated to a running job is set by default through Window > Preferences > Talend > Run/Debug, or for a specific job under JVM arguments on the left side of the Run tab.
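
One way to confirm which setting actually reaches the job is to print the heap limit from inside the job's own JVM. The sketch below is only an illustration: the class name HeapCheck is made up, and pasting the body into a tJava component at the start of the job is an assumption about where you would run it. The reported maximum can be slightly lower than the -Xmx value you set.

```java
// Reports the heap limits of whichever JVM executes this code.
class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the current JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):   " + maxHeapMb());
        System.out.println("total heap (MB): " + rt.totalMemory() / MB);
        System.out.println("free heap (MB):  " + rt.freeMemory() / MB);
    }
}
```

If the number printed by the job does not change when you edit the .ini file, that is consistent with the .ini setting applying to the Studio only.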

Re: Java heap space problem in Talend Open Studio 4.0.1

The job fails because a StringBuilder object is trying to expand its backing array to hold a very big record.
tFileInputDelimited with the CSV options uses the third-party "com.csvreader.CsvReader" under the hood, so it is possible that you are using it already, since it is present in the stack trace.
You should post the job and example data if you want more optimization insights.

Re: Java heap space problem in Talend Open Studio 4.0.1

The -Xmx argument in the .ini file controls the memory available to the Studio itself, so its effect goes no further than building a job. It makes no difference to the actual running of the job. The memory allocated to a running job is set by default through Window > Preferences > Talend > Run/Debug, or for a specific job under JVM arguments on the left side of the Run tab.

Thanks for this.
I've tried increasing the -Xmx by doing Window > Preferences > Talend > Run/Debug but when I increase Xmx from 1024 to 2048 I get the message:-
Could not create the Java virtual machine.
Error occurred during initialization of VM
Could not reserve enough space for object heap
Job Strength01 ended at 09:56 18/05/2011.
Can anyone suggest how I can find out the maximum value I can set -Xmx to?
Thanks

Re: Java heap space problem in Talend Open Studio 4.0.1

Could be you've described the data structure incorrectly.

Re: Java heap space problem in Talend Open Studio 4.0.1

The job fails because a StringBuilder object is trying to expand its backing array to hold a very big record.
tFileInputDelimited with the CSV options uses the third-party "com.csvreader.CsvReader" under the hood, so it is possible that you are using it already, since it is present in the stack trace.
You should post the job and example data if you want more optimization insights.

Thanks for your comments. I'm afraid I can't post sample data because it is sensitive, but I can post details of the job. Can you clarify which particular details, please?

Re: Java heap space problem in Talend Open Studio 4.0.1

Could be you've described the data structure incorrectly.

I don't think so because it's not a new job. It's been running quite happily for the last 9 months and has not been changed. Thanks for your input though anyway.

Re: Java heap space problem in Talend Open Studio 4.0.1

Has the data in the file changed? Is there more data or is the data missing a delimiter that would cause the entire file to be treated as a single row? Extra quotes maybe?
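
A rough way to answer those questions without opening the file in an editor is to scan it character by character and report the longest line and the first line with an odd number of quote characters. This is only a sketch: the class name LineScan is made up, the double quote as text enclosure is an assumption, and ISO-8859-1 is chosen just so that every byte decodes without an error. It never holds a whole line in memory, so it should cope even if the file turns out to be one giant row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class LineScan {
    long lineCount;          // number of physical lines seen
    long longestLineNumber;  // 1-based number of the longest line
    long longestLineLength;  // its length in characters, line break excluded
    long firstOddQuoteLine;  // first line with an odd number of quotes, 0 if none

    /** Streams the input once, keeping only counters in memory. */
    static LineScan scan(Reader in, char quote) throws IOException {
        LineScan result = new LineScan();
        BufferedReader reader = new BufferedReader(in);
        long length = 0;
        long quotes = 0;
        boolean pending = false; // true while the current line has uncounted content
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n') {
                result.endLine(length, quotes);
                length = 0;
                quotes = 0;
                pending = false;
            } else if (c != '\r') {
                length++;
                if (c == quote) {
                    quotes++;
                }
                pending = true;
            }
        }
        if (pending) {
            result.endLine(length, quotes); // last line had no trailing line break
        }
        return result;
    }

    private void endLine(long length, long quotes) {
        lineCount++;
        if (length > longestLineLength) {
            longestLineLength = length;
            longestLineNumber = lineCount;
        }
        if (quotes % 2 != 0 && firstOddQuoteLine == 0) {
            firstOddQuoteLine = lineCount;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the CSV file to inspect
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            LineScan r = scan(in, '"');
            System.out.println("lines: " + r.lineCount);
            System.out.println("longest line: #" + r.longestLineNumber
                    + " (" + r.longestLineLength + " chars)");
            System.out.println("first line with odd quote count: " + r.firstOddQuoteLine);
        }
    }
}
```

A longest line of many megabytes, or an odd quote count early in the file, would point at the data rather than at the heap settings.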

Re: Java heap space problem in Talend Open Studio 4.0.1

Anyway, here's the code:

/**
* start
*/

currentComponent="tRunJob_1";

java.util.List<String> paraList_tRunJob_1 = new java.util.ArrayList<String>();
paraList_tRunJob_1.add("--father_pid="+pid);
paraList_tRunJob_1.add("--root_pid="+rootPid);
paraList_tRunJob_1.add("--father_node=tRunJob_1");
paraList_tRunJob_1.add("--context=Default");
//for 10589
paraList_tRunJob_1.add("--stat_port=" + portStats);
if(resuming_logs_dir_path != null){
    paraList_tRunJob_1.add("--resuming_logs_dir_path=" + resuming_logs_dir_path);
}
String childResumePath_tRunJob_1 = ResumeUtil.getChildJobCheckPointPath(resuming_checkpoint_path);
String tRunJobName_tRunJob_1 = ResumeUtil.getRighttRunJob(resuming_checkpoint_path);
if("tRunJob_1".equals(tRunJobName_tRunJob_1) && childResumePath_tRunJob_1 != null){
    paraList_tRunJob_1.add("--resuming_checkpoint_path=" + ResumeUtil.getChildJobCheckPointPath(resuming_checkpoint_path));
}
paraList_tRunJob_1.add("--parent_part_launcher=JOB:" + jobName + "/NODE:tRunJob_1");
java.util.Map<String, Object> parentContextMap_tRunJob_1 = new java.util.HashMap<String, Object>();
core_extract.strengthsub01_1_0.StrengthSub01 childJob_tRunJob_1 = new core_extract.strengthsub01_1_0.StrengthSub01();
childJob_tRunJob_1.parentContextMap = parentContextMap_tRunJob_1;
String[][] childReturn_tRunJob_1 = childJob_tRunJob_1.runJob((String[]) paraList_tRunJob_1.toArray(new String[paraList_tRunJob_1.size()]));
errorCode = childJob_tRunJob_1.getErrorCode();

if(childJob_tRunJob_1.getErrorCode() == null){
    globalMap.put("tRunJob_1_CHILD_RETURN_CODE", childJob_tRunJob_1.getStatus() != null && ("failure").equals(childJob_tRunJob_1.getStatus()) ? 1 : 0);
}else{
    globalMap.put("tRunJob_1_CHILD_RETURN_CODE", childJob_tRunJob_1.getErrorCode());
}
globalMap.put("tRunJob_1_CHILD_EXCEPTION_STACKTRACE", childJob_tRunJob_1.getExceptionStackTrace());

if (childJob_tRunJob_1.getErrorCode() != null || ("failure").equals(childJob_tRunJob_1.getStatus())) {
    throw new RuntimeException("Child job running failed");
}
for (String[] item_tRunJob_1 : childReturn_tRunJob_1) {

    tos_count_tRunJob_1++;
/**
* stop
*/

Meanwhile I've now had a chance to try the other Talend jobs I've been running and they all ran OK.

Re: Java heap space problem in Talend Open Studio 4.0.1

How could this kind of snippet help? It is totally unrelated to the problem.

Try to run the job in debug-trace mode in order to see the data flow. Maybe the problem lies in the data.

Re: Java heap space problem in Talend Open Studio 4.0.1

How could this kind of snippet help? It is totally unrelated to the problem.

Try to run the job in debug-trace mode in order to see the data flow. Maybe the problem lies in the data.

OK I'll try that and come back if I get stuck.
Thanks.

Re: Java heap space problem in Talend Open Studio 4.0.1

If your input data is a delimited file, try opening it in Excel to see if there are any problems with the delimiter, e.g. if the delimiter is ; the text itself may contain that character.
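
If the file is too big or too sensitive to open in Excel, a similar check can be sketched in a few lines of Java: count the fields on each line and list the lines whose count differs from the first line. The class name FieldCountCheck is made up, and the ; delimiter and " text enclosure are assumptions to adjust to your file. It reads whole lines, so it assumes each line fits in memory, and it does not handle quoted fields that span several lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class FieldCountCheck {
    /** Counts the fields in one line, ignoring delimiters inside a quoted section. */
    static int countFields(String line, char delimiter, char quote) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                // An escaped quote written as two quotes toggles twice,
                // which leaves the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the delimited file to inspect
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            String line = reader.readLine();
            if (line == null) {
                System.out.println("file is empty");
                return;
            }
            int expected = countFields(line, ';', '"');
            long lineNumber = 1;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                int actual = countFields(line, ';', '"');
                if (actual != expected) {
                    System.out.println("line " + lineNumber + ": " + actual
                            + " fields, expected " + expected);
                }
            }
        }
    }
}
```

Lines with too many fields suggest an unquoted delimiter inside the text; lines with too few suggest a missing delimiter or an unclosed quote.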

Re: Java heap space problem in Talend Open Studio 4.0.1

Curiouser and Curiouser! As one of the contractors who wrote the job last year just happened to be around (they're working on another project here at the moment) he asked me to show him the job failing.
So I ran it again - and it worked! I did this a second time - same result.
So I'm going to leave it for now and see what happens when I need to run again next month.
Thank you all for your help.

Re: Java heap space problem in Talend Open Studio 4.0.1

Was the data EXACTLY the same?

Re: Java heap space problem in Talend Open Studio 4.0.1

Can anyone suggest how I can find out the maximum value I can set -Xmx to?

Trial and error only, really. Note that the maximum you can reserve for the job depends on how much has already been taken by other apps, including the Studio itself. So close as much as you can and reduce the -Xmx in the .ini file.
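
The trial and error can be automated, at least roughly. The sketch below starts a throwaway JVM with a given -Xmx and "-version", and binary-searches for the largest value that still starts. The class name XmxProbe and the 64 to 4096 MB search range are made up for illustration. It assumes that once a size fails every larger size fails too, and it must be run with the same Java installation that runs the job. On a 64-bit JVM almost any value will start, so this is mainly meaningful for a 32-bit JVM. A JVM that starts with a given -Xmx is also not guaranteed to obtain that much memory later, and the result changes with whatever else is running.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.IntPredicate;

class XmxProbe {
    /**
     * Largest value in [low, high] for which works.test(value) is true,
     * or -1 if none. Assumes that once a value fails, all larger ones fail.
     */
    static int largestWorking(int low, int high, IntPredicate works) {
        int best = -1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (works.test(mid)) {
                best = mid;
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return best;
    }

    /** Starts a throwaway JVM with the given -Xmx and reports whether it came up. */
    static boolean jvmStarts(int megabytes) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        try {
            Process p = new ProcessBuilder(java, "-Xmx" + megabytes + "m", "-version")
                    .redirectErrorStream(true)
                    .start();
            try (InputStream out = p.getInputStream()) {
                while (out.read() != -1) {
                    // discard the output so the child process cannot block
                }
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int best = largestWorking(64, 4096, XmxProbe::jvmStarts);
        System.out.println(best < 0
                ? "no value in the range started a JVM"
                : "largest -Xmx that started a JVM: " + best + "m");
    }
}
```

Treat the result as an upper bound and set the job's -Xmx somewhat below it.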