The argument you want to adjust for the heap is -Xmx; try increasing it to 1024m. A MaxPermSize of 128m should be fine.
Use trial and error to see if you can find a value that will get your map to run.
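For reference, the JVM arguments being discussed would look something like this (the values here are illustrative starting points, not recommendations for every job):

```text
-Xmx1024m
-XX:MaxPermSize=128m
```

-Xmx caps the heap, which is where row data lives; -XX:MaxPermSize caps the permanent generation (class metadata) on pre-Java-8 JVMs and is rarely the culprit for data-volume OutOfMemoryErrors.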
It looks like there's a custom CSV reader class that may be reading the entire input file into memory. That's fine for a moderate-sized file (100-200k), but not if the file is large. Can the custom CSV class be replaced with a tFileInputDelimited? That way the input is processed line by line, and overall memory never needs to exceed what a single row requires.
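To illustrate the difference, here is a minimal sketch of line-by-line processing in plain Java, which is effectively what tFileInputDelimited does (the sample data is inlined via a StringReader purely so the snippet is self-contained; a real job would wrap a FileReader):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LineByLineCsv {
    public static void main(String[] args) throws IOException {
        // Stand-in for the input file; swap StringReader for a FileReader in practice.
        String csv = "id;name\n1;alice\n2;bob\n";
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Only the current line is ever held in memory.
                String[] fields = line.split(";");
                rows += 1;
            }
        }
        System.out.println("rows=" + rows);
    }
}
```

A whole-file reader, by contrast, must hold every row at once, so its peak memory scales with file size rather than row size.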
The -Xmx argument in the .ini file controls the memory usage of the Studio itself, and therefore only affects building a job; it makes no difference to actually running one. The memory allocated to a running job is controlled by default through Window > Preferences > Talend > Run/Debug, or per job under JVM arguments on the left side of the Run tab.
The job fails because a StringBuilder object is trying to expand its backing array to accommodate a very large record...
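To see why that expansion hurts: when a StringBuilder runs out of room, it allocates a new backing array roughly twice the old size and copies into it. A tiny demonstration of the growth (capacity values here are JDK implementation details, not a guarantee):

```java
public class BuilderGrowth {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(); // default capacity: 16 chars
        int before = sb.capacity();
        // Append one character past the capacity: the backing array is
        // reallocated at roughly double its size. With a record hundreds of
        // megabytes long, that single doubling can blow past -Xmx.
        for (int i = 0; i < 17; i++) {
            sb.append('x');
        }
        int after = sb.capacity();
        System.out.println("before=" + before + " grew=" + (after > before));
    }
}
```

So the OutOfMemoryError can be triggered by one oversized record (for example, a row with a missing delimiter that swallows the rest of the file), even when the heap comfortably fits normal rows.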
tFileInputDelimited with the CSV options enabled uses the third-party "com.csvreader.CsvReader" under the hood, so it's possible you are already using it; it appears in the stack trace...
You should post the job and some example data if you want more optimization insights...
It could be that you've described the data structure incorrectly.
How could this kind of snippet help? It is totally unrelated to the problem.
Try running the job in debug/trace mode to see the data flow... maybe the problem lies in the data...
Can anyone suggest how I can find out the maximum value I can set -Xmx to?