processing large JSON file error: OutOfMemory

One Star

processing large JSON file error: OutOfMemory

Hi, I am new to Talend Open Studio for Data Integration. I have managed to create a few test jobs successfully. Now I am running into an issue: when tFileInputJSON reads a big file (80 MB), the job runs out of memory. I have made some JVM changes to increase the heap size, but I still run into the same problem. To be honest, I am not surprised by that, since I can only increase the heap size so much while the potential input data is "unlimited".
I am just wondering how I can make the job read, say, 5000 rows and process that data before reading the next 5000 rows. What is the best practice for processing large files in Talend?
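Just to illustrate the kind of batching I have in mind, here is a rough plain-Java sketch (not Talend code; the file name and the process() step are made up, and it ignores that JSON is not line-oriented):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchRead {
    static final int BATCH_SIZE = 5000;

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("input_file"))) {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            String row;
            while ((row = in.readLine()) != null) {
                batch.add(row);
                if (batch.size() == BATCH_SIZE) {
                    process(batch);   // placeholder for the real transformation/output step
                    batch.clear();    // free the batch before reading the next 5000 rows
                }
            }
            if (!batch.isEmpty()) {
                process(batch);       // handle the final partial batch
            }
        }
    }

    static void process(List<String> rows) {
        System.out.println("processed " + rows.size() + " rows");
    }
}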
Moderator

Re: processing large JSON file error: OutOfMemory

Hi,
In Talend, input components such as tFileInputDelimited and tFileInputJSON read all rows, cache them in memory, and then iterate over them one by one. So it is not possible to process the first N rows and then the next N rows.
Do you have other components in the job, such as tMap or tFilterRow, that consume additional memory? For an OutOfMemory exception, you usually need to allocate more memory to the job; please see this KB article:
outOfMemory
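For example, in the Run view's Advanced settings you can enable "Use specific JVM arguments" and raise the heap. The values below are only an example and should be tuned to your machine:
-Xms1024M    (initial heap size; example value)
-Xmx4096M    (maximum heap size; example value)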
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: processing large JSON file error: OutOfMemory

Thank you, Sabrina! My JSON is pretty complicated and the file size will keep increasing, so I don't think allocating ever more memory is a good long-term solution for me. Is there a way to split a big file into multiple smaller files by the value of a primary key? For example, if I have a CSV with (device_id, billing_date, billing_amount), can I save all rows for device_id 111222 to file 111222.csv, and all rows for device_id 111333 to 111333.csv?
Moderator

Re: processing large JSON file error: OutOfMemory

Hi,
For a CSV file, the answer is yes: use tFileInputFullRow to read the source file row by row and generate several files with the 'Split the files into several files' option on tFileOutputDelimited.
The job looks like:
tFileInputFullRow--main--tFileOutputDelimited
See the attached pics for details.
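If you ever need to do the same split outside the Studio, here is a rough plain-Java sketch of the idea (the file name billing.csv and the column order are only taken from your example, and the helper class name is made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SplitByDeviceId {
    public static void main(String[] args) throws IOException {
        // one output writer per distinct device_id, e.g. 111222.csv, 111333.csv
        Map<String, PrintWriter> writers = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("billing.csv"))) {
            String row;
            while ((row = in.readLine()) != null) {
                // columns: device_id, billing_date, billing_amount
                String deviceId = row.split(",", 2)[0];
                PrintWriter out = writers.computeIfAbsent(deviceId, id -> {
                    try {
                        return new PrintWriter(new FileWriter(id + ".csv"));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                out.println(row);   // append the row to the file named after its device_id
            }
        } finally {
            writers.values().forEach(PrintWriter::close);
        }
    }
}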
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.