Four Stars

Memory Issue while performing joins on data



I am joining two files, each with 40 fields and a few million rows, but the job fails with the error below:


java.lang.OutOfMemoryError: GC overhead limit exceeded


I tried storing temp data on disk at a temporary location through the tMap component, and also raised the JVM memory arguments, but the job is still not able to process the required data.
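The JVM arguments I mean are of this form (the exact values here are illustrative, not my actual settings):

```
-Xms1024M
-Xmx8192M
-XX:-UseGCOverheadLimit
```

The last flag only suppresses the "GC overhead limit exceeded" check rather than fixing the underlying memory shortage, so I have mainly been experimenting with the heap sizes.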


Please suggest.




Nine Stars

Re: Memory Issue while performing joins on data

This happened to me too. Get rid of all columns that are not necessary for the match condition, and join that additional data back after you've finished matching; things will speed up. Big string values (255 chars) in particular will drain performance!
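The idea sketched in Python (the column names and data here are invented for illustration): phase one matches on the join key alone, phase two attaches the wide payload columns only for rows that actually matched, so the big strings never travel through the matching step.

```python
# Illustrative rows: "big_text" stands in for the wide 255-char columns.
left = [
    {"id": 1, "big_text": "x" * 255, "amount": 10},
    {"id": 2, "big_text": "y" * 255, "amount": 20},
]
right = [
    {"id": 2, "status": "ok"},
    {"id": 3, "status": "late"},
]

# Phase 1: match on the key only.
right_by_id = {row["id"]: row for row in right}
matched_ids = [row["id"] for row in left if row["id"] in right_by_id]

# Phase 2: join the additional (wide) data back for matched keys only.
left_by_id = {row["id"]: row for row in left}
result = [{**left_by_id[i], **right_by_id[i]} for i in matched_ids]
```

In a Talend job the same split would be two joins: a slim tMap that carries only the keys, followed by a second lookup that re-attaches the remaining columns.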


Twelve Stars

Re: Memory Issue while performing joins on data

As already suggested, you could try to reduce the size of the joined data.


But in any case, first of all you must calculate how much memory you actually need to fit the data.

"A few million" is not a full description: a file with a few million rows and 40 columns can easily be 10, 20, or 100 GB, and you need a lot of memory to join them.
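A rough back-of-envelope estimate, sketched in Python (the row count, field length, and per-object overhead are assumptions; older JVMs store chars as 2 bytes and add roughly 40 bytes of object overhead per String):

```python
rows = 5_000_000   # "a few million" rows (assumed)
cols = 40
avg_chars = 30     # assumed average field length
overhead = 40      # rough per-String JVM object overhead, in bytes

bytes_per_cell = avg_chars * 2 + overhead        # 2 bytes per char + overhead
total_gb = rows * cols * bytes_per_cell / 1024**3
```

That lands near 19 GB for one file alone, before the second file and the join structures, so a default few-GB heap has no chance.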


An alternative solution: load both files into indexed database tables and do the JOIN in the database (even if you need the final result as a CSV file).

Databases are much better suited to working with huge data under limited memory.
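A minimal sketch of that approach with SQLite, which ships with Python (the table and column names are invented; a real job would bulk-load the two CSV files instead of inserting literals):

```python
import sqlite3

# Use a file path instead of ":memory:" so data bigger than RAM spills to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (id INTEGER, payload TEXT)")
cur.execute("CREATE TABLE b (id INTEGER, status TEXT)")
cur.executemany("INSERT INTO a VALUES (?, ?)", [(1, "p1"), (2, "p2")])
cur.executemany("INSERT INTO b VALUES (?, ?)", [(2, "ok"), (3, "late")])

# Index the join key so the database can join without holding everything in memory.
cur.execute("CREATE INDEX idx_b_id ON b(id)")

rows = cur.execute(
    "SELECT a.id, a.payload, b.status FROM a JOIN b ON a.id = b.id"
).fetchall()
```

The final SELECT can then be streamed row by row into a CSV writer, so the Java process never needs to hold the full join result in its heap.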