Four Stars

How to process MongoDB data (approx. 1.3 million records) in Talend in chunks

Hi,


We need to process MongoDB data (approx. 1.3 million records) in Talend. When we try to process all the records in one go, Talend fails with a "Heap space - out of memory" error. We have tried every way we know to increase the JVM memory size, but it has not helped, probably because the workflow contains complex logic.


So now we are looking to process the data in chunks, but we are not sure how that can be done in Talend. Currently we are pulling the MongoDB data with the tMongoDBInput component.


Could anyone please advise on this?


Thanks in advance!!


Regards,

Pragya


1 REPLY
Six Stars

Re: How to process MongoDB data (approx. 1.3 million records) in Talend in chunks

There are several possible solutions.


First, you can try to reduce the number of columns you pull into Talend in the tMongoDBInput component. Remove unused or unwanted columns if possible; a sketch of the underlying idea follows below.
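For illustration, here is a minimal sketch of that idea with the plain MongoDB Java driver (mongodb-driver-sync): a field projection so that only the needed columns ever leave the server and enter the JVM heap. The connection string, database, collection and field names are placeholders to adapt to your environment; in tMongoDBInput, trimming the schema to only the needed columns serves the same goal.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

import static com.mongodb.client.model.Projections.excludeId;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;

public class ProjectionExample {
    public static void main(String[] args) {
        // Hypothetical connection details; adjust to your setup.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("mydb").getCollection("orders");

            // Pull only the fields the job actually needs; everything else
            // stays on the server and never takes up heap space.
            try (MongoCursor<Document> cursor = coll.find()
                    .projection(fields(include("orderId", "amount", "status"), excludeId()))
                    .iterator()) {
                while (cursor.hasNext()) {
                    Document doc = cursor.next();
                    // ... process one record
                }
            }
        }
    }
}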


Second: increase the buffer size of the tMap. To do that, open the property settings in the top left-hand corner inside your tMap and increase the buffer size value.


Third: you can try storing the data on disk (by default it is held in memory). This performs worse, but it can work. To do that, go to the same top left-hand corner inside your tMap as in the previous point, enable "Store temp data", and choose a directory.
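Finally, to address the chunking part of your question directly: a common pattern is to page through the collection on the always-indexed _id field and process a bounded batch per pass (in a Talend job this kind of loop could be driven by a tLoop feeding the tMongoDBInput query via context variables). Below is a minimal sketch with the plain MongoDB Java driver; the chunk size, connection string, and database/collection names are assumptions to adapt.

import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.types.ObjectId;

import static com.mongodb.client.model.Filters.gt;
import static com.mongodb.client.model.Sorts.ascending;

public class ChunkedRead {
    public static void main(String[] args) {
        final int CHUNK_SIZE = 50_000; // tune to what fits comfortably in the heap

        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("mydb").getCollection("orders");

            ObjectId lastId = null;
            while (true) {
                // Range-scan on the indexed _id rather than skip(), so each
                // chunk costs the same no matter how deep into the data we are.
                FindIterable<Document> chunk = (lastId == null)
                        ? coll.find()
                        : coll.find(gt("_id", lastId));

                int count = 0;
                for (Document doc : chunk.sort(ascending("_id")).limit(CHUNK_SIZE)) {
                    // ... process one record
                    lastId = doc.getObjectId("_id");
                    count++;
                }
                if (count < CHUNK_SIZE) {
                    break; // last (possibly partial) chunk processed
                }
            }
        }
    }
}

Each pass holds at most one chunk's worth of documents, so the heap usage stays bounded no matter how large the collection grows.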