CSV file or Buffer memory, which is better to save mid data in the Job

Seven Stars


Hi,

 

Which is the better way to store intermediate data in a job: a CSV file or buffer memory (tHashOutput)?

In my scenario, I receive 4.4 million records from the source and need to perform some operations on them. I store the data partway through the job because the job contains multiple subjobs.

 

I am weighing several factors, such as performance and storage space, and I want to avoid memory issues.

Please suggest the best method to use.

 

Thanks in advance.


Accepted Solutions
Fifteen Stars TRF

Re: CSV file or Buffer memory, which is better to save mid data in the Job

Hi,

Given the number of records, splitting the data into multiple intermediate files may help if you can parallelize the operations you need to perform on these records.
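To illustrate the idea of working on several intermediate files at once, here is a minimal sketch in plain Java (not Talend component code). The class name `ParallelChunks`, the method names, and the per-file operation (a simple line count) are assumptions made for the example; the real operation would be whatever the subjobs need to do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class ParallelChunks {

    // Hypothetical per-file operation: counts the lines in one intermediate file.
    public static long processChunk(Path chunk) {
        try (Stream<String> lines = Files.lines(chunk)) {
            return lines.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Submits one task per file to a fixed-size pool and sums the results.
    public static long processAll(List<Path> chunks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Path chunk : chunks) {
                Callable<Long> task = () -> processChunk(chunk);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // waits for that file's task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the intermediate file paths as command-line arguments.
        List<Path> chunks = new ArrayList<>();
        for (String arg : args) {
            chunks.add(Paths.get(arg));
        }
        System.out.println("Total lines: " + processAll(chunks, 4));
    }
}
```

This only pays off when the files can be processed independently of each other; if one operation needs all the records together, splitting does not help.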

Otherwise, holding all the records in memory can cause memory issues, but that depends more on the total data size than on the number of records (are the records long or short?) and, of course, on the physical memory available.
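As a rough way to reason about this, the sketch below multiplies the record count by an average record size and compares the result with the JVM's maximum heap. Only the 4.4 million figure comes from the question; the 200-byte average record size, the overhead factor of 3, and the 50% headroom are assumptions for illustration and should be replaced with measured values.

```java
public class MemoryEstimate {

    // Rough estimate: record count x average record size x an overhead factor
    // (Java objects usually take more space in memory than their raw text form).
    public static long estimateBytes(long recordCount, long avgRecordBytes, double overheadFactor) {
        return (long) (recordCount * avgRecordBytes * overheadFactor);
    }

    // True if the estimate fits within the given fraction of the maximum heap.
    public static boolean fitsInHeap(long estimatedBytes, long maxHeapBytes, double safetyFraction) {
        return estimatedBytes <= (long) (maxHeapBytes * safetyFraction);
    }

    public static void main(String[] args) {
        long records = 4_400_000L; // from the question
        long avgBytes = 200L;      // assumption: average record size in bytes
        double overhead = 3.0;     // assumption: in-memory overhead multiplier

        long estimate = estimateBytes(records, avgBytes, overhead);
        long maxHeap = Runtime.getRuntime().maxMemory(); // heap limit of this JVM

        System.out.println("Estimated bytes: " + estimate);
        System.out.println("Max heap bytes:  " + maxHeap);
        System.out.println("Fits with 50% headroom: " + fitsInHeap(estimate, maxHeap, 0.5));
    }
}
```

With short records the estimate may fit comfortably in memory; with long records the same 4.4 million rows can exceed the heap, which is why the record length matters more than the count.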

Also, text (or CSV) files are processed very quickly by the standard tFileInputDelimited or tFileInputFullRow components, so you don't "really" have to worry about response time when using them (in my opinion, unless you want to gain a few seconds, but I don't think that is the main concern in your case).
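One reason a file is a safe intermediate store is that it can be read back row by row, so memory use stays flat however many rows it holds. The plain-Java sketch below shows that pattern; it is not how the Talend components are implemented, and the class name `CsvStreamCount` and its method are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CsvStreamCount {

    // Reads the file one line at a time; only the current line is held in memory.
    public static long countRows(Path csvFile, boolean hasHeader) throws IOException {
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                boolean isHeader = firstLine && hasHeader;
                firstLine = false;
                if (isHeader || line.isEmpty()) {
                    continue; // skip the header row and blank lines
                }
                rows++; // a real job would process the row here
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the intermediate CSV file as the first argument.
        System.out.println("Data rows: " + countRows(Paths.get(args[0]), true));
    }
}
```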

Hope this helps.


TRF

