Hi, I am doing a Talend job which performs a lookup on a big table (4,724,567 rows). I get the following output when I run the job: Exception in thread "main" java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: Java heap space ... Is there a way to run this job in batches so it does not run out of memory? Thanks, Dave.
Unfortunately there is one issue missing here! With databases like MySQL or PostgreSQL you have to check whether the input component uses a cursor, or, in the case of MySQL, streaming mode. If you miss these options, the JDBC driver collects all rows in memory before the first row is delivered to the application (here, your Talend job).
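To make the cursor/streaming distinction concrete, here is a hedged sketch in plain JDBC (outside Talend); the connection URLs, credentials, and table name are placeholders, and it needs a live database to actually run:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        // MySQL: streaming mode requires a forward-only, read-only statement
        // with fetch size Integer.MIN_VALUE -- otherwise Connector/J buffers
        // the entire result set in memory before returning the first row.
        Connection mysql = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password"); // placeholder URL
        Statement myStmt = mysql.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        myStmt.setFetchSize(Integer.MIN_VALUE); // MySQL-specific signal: stream row by row

        // PostgreSQL: cursor-based fetching requires autocommit OFF plus a
        // positive fetch size; the driver then pulls rows in batches.
        Connection pg = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password"); // placeholder URL
        pg.setAutoCommit(false);     // cursors only work inside a transaction
        Statement pgStmt = pg.createStatement();
        pgStmt.setFetchSize(1000);   // fetch 1000 rows at a time

        ResultSet rs = pgStmt.executeQuery("SELECT * FROM big_table"); // placeholder table
        while (rs.next()) {
            // process one row at a time; memory use stays bounded
        }
        rs.close();
    }
}
```

In Talend, the equivalent is the "Enable stream" / cursor checkbox in the advanced settings of the database input component.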
Writing to disk seemed to fix that instance, but I am still getting warnings: Warning: to avoid a Memory heap space error the buffer of the flow has been limited to a size of 281710, try to reduce the advanced parameter "Max buffer size" (~100000 or at least less than 281710), then if needed try to increase the JVM Xmx parameter. Can someone explain how to reduce the advanced parameter "Max buffer size"? Also, if I increase the JVM Xmx parameter, what do I increase it to? Thanks, Dave.
I'm a novice, but my understanding is that these settings limit the amount of memory Java can use at runtime (-Xms is the initial heap size, -Xmx the maximum). You can set -Xmx to whatever you want, even more than the physical memory on your machine, but that lets Java max out your system and leave little memory for anything else while the job is running. I ran into an issue reading an Excel file. My original settings were Xms256M and Xmx1024M. Rather than making incremental changes to find where it would work, I changed them to Xms256M and Xmx10G and my files processed just fine. I am using a 64-bit Windows 8.1 machine with 16 GB of RAM. I used 10G because I figured leaving 5G for everything else during the short period I'm running these jobs was good enough.
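A quick way to verify what -Xmx actually gave your job is to ask the JVM itself (a minimal sketch; the class name is made up):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling set by -Xmx (approximately), in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Run it with, for example, `java -Xmx10G HeapCheck` to confirm the setting took effect; in Talend you can print the same value at the start of a job with a tJava component.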
You are probably dealing with a very large Excel file. You could try the components tFileExcelSheetInput or tFileExcelSheetOutput together with tFileExcelWorkbookOpen. The latter can create a streaming Excel workbook (called memory-saving mode) which does not hold all records of the Excel file in memory, only a smaller number (default is 100). This way you can process very large Excel files with a small memory footprint. http://www.talendforge.org/exchange/index.php?eid=623&product=tos&action=view&nav=0,1,1 tFileExcelWorkbookOpen -> tFileExcelSheetInput -> ... process your flow ... I introduced the memory-saving mode because I ran into the same issues you currently have ;-) The disadvantage of memory-saving mode is that you cannot read randomly in the workbook; you have to process the rows as you receive them.
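For context, the same sliding-window idea exists in Apache POI's SXSSF streaming API (POI's streaming model covers the writing side; the sequential-only restriction is the same one described above). A sketch, assuming POI and its dependencies are on the classpath; the file name is a placeholder:

```java
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingExcelWrite {
    public static void main(String[] args) throws Exception {
        // Keep at most 100 rows in memory; older rows are flushed to temp files.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        Sheet sheet = wb.createSheet("data");
        for (int i = 0; i < 1_000_000; i++) {
            Row row = sheet.createRow(i);
            row.createCell(0).setCellValue("row " + i);
        }
        try (FileOutputStream out = new FileOutputStream("big.xlsx")) { // placeholder path
            wb.write(out);
        }
        wb.dispose(); // delete the temporary flush files
        wb.close();
    }
}
```

As with the memory-saving mode, rows outside the window are gone from memory once flushed, so you cannot go back and touch them randomly.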