OutOfMemoryError: Java heap space

One Star

OutOfMemoryError: Java heap space

Hello,
While creating an XML file with tAdvancedFileOutputXML, we ran into a problem: java.lang.OutOfMemoryError: Java heap space.
We solved it by changing the generation mode parameter from "Fast memory-consuming goal - Dom4J" to "Slow with no memory Consumed".
Can you explain how this generation mode works?
Is there a maximum size of the XML file?
Thanks and Regards.
Moderator

Re: OutOfMemoryError: Java heap space

Hi,
Here is the component reference for tAdvancedFileOutputXML.
For the "OutOfMemoryError: Java heap space" issue, there are also two articles: "workaround outOfMemory" and "Allocating more memory to Talend Studio".
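After allocating more memory, it can help to confirm how much heap the JVM actually received. The following is a minimal sketch, assuming plain Java that could be pasted into a routine; the class and method names (HeapCheck, maxHeapMb, usedHeapMb) are hypothetical and not part of Talend. It only reads the standard Runtime counters, where maxMemory() reflects the -Xmx setting.

```java
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes (reflects -Xmx). */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently occupied by objects (live or not yet collected), in megabytes. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** One-line summary suitable for logging at the start or end of a job. */
    public static String report() {
        return "heap used " + usedHeapMb() + " MB of max " + maxHeapMb() + " MB";
    }
}
```

Calling System.out.println(HeapCheck.report()) before and after the XML step gives a rough idea of how close the job is to the limit.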
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: OutOfMemoryError: Java heap space

Is there a maximum size of the XML file?
I will have to deal with increasingly large files, and I would like to anticipate any blocking limit.
Best regards.
One Star

Re: OutOfMemoryError: Java heap space

Hi Noel,
In my experience, you will start having heap and memory problems such as "GC overhead limit exceeded" with XML files larger than 700 MB, even after tuning the JVM arguments and using SAX.
I had to deal with XML input files of 1.2 GB up to 2 GB, and the only way to do it was to split them into smaller files of about 6 MB each and use a tFileList to process them with the Dom4J parser.
User brazabr helped me with this useful code, which I added as a routine and called from a tJava component.
    // Imports needed at the top of the routine class:
    // import java.io.FileInputStream;
    // import java.io.FileOutputStream;
    // import java.io.PrintStream;
    // import java.util.Scanner;
    // import java.util.regex.Pattern;

    /**
     * Splits filename into files named <name>_partNNNN.xml, each holding roughly
     * maxpart characters of <tagname> elements wrapped in <roottag nsdeclaration>.
     * Returns true on success, false if an exception occurred.
     */
    public static boolean split_file(String filename, int maxpart, String tagname, String roottag, String nsdeclaration) {
        String closeTag = "</" + tagname + ">";
        String closeRoot = "</" + roottag + ">";
        String partfile = filename.replaceFirst("\\.xml$", "");
        PrintStream outstream = null;
        int part = 0;
        int partsize = 0;
        try (Scanner s = new Scanner(new FileInputStream(filename), "utf-8")) {
            // the delimiter is a regex, so quote the tag to match it literally
            s.useDelimiter(Pattern.quote(closeTag));
            while (s.hasNext()) {
                if (outstream == null) { // begin a new part file
                    String suffix = String.format("_part%04d.xml", part);
                    outstream = new PrintStream(new FileOutputStream(partfile + suffix), false, "utf-8");
                    if (part > 0) { // insert leading tags (part 0 keeps the original header)
                        outstream.println("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
                        outstream.println("<" + roottag + " " + nsdeclaration + ">");
                    }
                    partsize = 0;
                }
                // just append tokens
                String token = s.next();
                outstream.print(token);
                boolean lastChunk = token.contains(closeRoot);
                // if not last chunk, restore the closing tag consumed as delimiter
                if (!lastChunk) outstream.println(closeTag);
                partsize += token.length(); // counted in characters, not bytes
                if (lastChunk || partsize > maxpart) { // time to wrap it up
                    // the last chunk already carries the root closing tag
                    if (!lastChunk) outstream.println(closeRoot);
                    outstream.close(); // also closes the underlying FileOutputStream
                    outstream = null;
                    part++;
                }
            }
            return true;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        } finally {
            // close a part left open by an error or by input without a root closing tag
            if (outstream != null) outstream.close();
        }
    }
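Once the file is split, the parts have to be picked up in order, which is what the tFileList step does in the job. As a rough illustration in plain Java, the sketch below lists the part files produced by the naming scheme above. The class and method names (PartFiles, listParts) are hypothetical, and it assumes the base name contains no glob special characters such as * or [.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFiles {
    /**
     * Returns the files in dir named baseName_part*.xml, sorted by name.
     * The zero-padded counter (%04d) makes name order equal to part order.
     */
    public static List<Path> listParts(Path dir, String baseName) throws IOException {
        List<Path> parts = new ArrayList<>();
        // the second argument is a glob pattern matched against file names
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, baseName + "_part*.xml")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts);
        return parts;
    }
}
```

Each returned path can then be handed to whatever parses a single small file.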

Re: OutOfMemoryError: Java heap space

Hello,
I have seen a lot of people having issues handling huge XML files.
I have written a short post on working with large XML files in Talend. Please visit the link below for more details:
http://www.vikramtakkar.com/2013/09/handling-huge-xml-files-in-talend.html
Let me know if it helps.
Seventeen Stars

Re: OutOfMemoryError: Java heap space

hi all,
There is no fixed file size at which a job will run out of heap; what matters is the number of Java objects held in memory.
I have read a file of about 5 GB (with SAX), but extracted only a few elements from it.
So you can only find out empirically when a job will crash because of too many Java objects.
However, you do have to optimize your job: use the write-to-disk option for tMap, sort, etc.; avoid holding data in live memory with buffer and hash components; read the input in several passes; and increase the JVM parameters.
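To illustrate why reading only a few elements with SAX keeps the object count low, here is a minimal sketch using the standard javax.xml.parsers API. It is not what Talend generates; the class and method names (SaxExtract, textOf) are hypothetical. The handler keeps only the text of the requested element and discards everything else as it streams past.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxExtract {
    /** Collects the text content of every element named tag, in document order. */
    public static List<String> textOf(InputStream in, final String tag) throws Exception {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <tag>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one text node, so append
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tag) && current != null) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        // the default factory is not namespace-aware, so qName is the raw tag name
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return values;
    }
}
```

Memory use here grows with the number of extracted values, not with the size of the input file.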
my 2cents
regards
laurent