CSV file or buffer memory: which is better for storing intermediate data in a job?

Hi,

Which is the better way to store intermediate data in a job: a CSV file or buffer memory (tHashOutput)?

In my scenario, I get 4.4 million records from the source and need to perform some operations on them. Because my job contains multiple subjobs, I store the data partway through the job.

I am weighing several factors, such as performance, storage space, and avoiding any memory issues.

Please suggest the best method to use.

Thanks in advance.

Accepted solution

Hi,

Given the number of records, splitting the data into multiple intermediate files may help if the operations you need to perform on these records can be parallelized.
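A minimal Java sketch of the multiple-file idea, outside Talend and only to illustrate it: values are split round-robin into several chunk files, each chunk is streamed and processed on its own thread, and the partial results are combined. The class name, the chunk file names, the one-integer-per-line layout and the summing step are all hypothetical stand-ins for your real data and operations.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunks {

    // Writes the values 1..count round-robin into n chunk files (hypothetical layout: one value per line).
    static List<Path> writeChunks(Path dir, int count, int n) throws IOException {
        List<Path> files = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            for (int k = 0; k < n; k++) {
                Path p = dir.resolve("chunk_" + k + ".csv");
                files.add(p);
                writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
            }
            for (int i = 1; i <= count; i++) {
                BufferedWriter w = writers.get(i % n);
                w.write(Integer.toString(i));
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return files;
    }

    // Streams one chunk file and sums its values; only one line is in memory at a time.
    static long sumFile(Path file) throws IOException {
        long sum = 0;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                sum += Long.parseLong(line.trim());
            }
        }
        return sum;
    }

    // Processes each chunk on its own thread (n must be >= 1) and combines the partial sums.
    static long parallelSum(int count, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            Path dir = Files.createTempDirectory("mid_chunks");
            List<Path> files = writeChunks(dir, count, n);
            List<Future<Long>> parts = new ArrayList<>();
            for (Path f : files) {
                parts.add(pool.submit(() -> sumFile(f)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            for (Path f : files) {
                Files.deleteIfExists(f);
            }
            Files.deleteIfExists(dir);
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("total: " + parallelSum(100_000, threads));
    }
}
```

The gain only materializes if the per-record work is independent, so that each chunk can be handled without looking at the others.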

Otherwise, holding all the records in memory can cause memory issues, but that depends more on the total data size than on the number of records (are the records long or short?) and, of course, on the available physical memory.
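To get a feel for whether 4.4 million records would fit in memory, a rough back-of-the-envelope check can be sketched in Java. The average record sizes and the OVERHEAD_FACTOR below are assumptions, not measured Talend figures: JVM objects (Strings, boxed fields, references) usually take noticeably more room than the raw CSV bytes, so measure your own job before relying on the numbers.

```java
public class MemoryEstimate {

    // Hypothetical multiplier for object headers, String objects and references.
    // Not a Talend figure; replace it with a value measured on your own job.
    static final double OVERHEAD_FACTOR = 3.0;

    // Raw data size in bytes: record count times average record length.
    static long rawBytes(long records, long avgBytesPerRecord) {
        return records * avgBytesPerRecord;
    }

    // Estimated in-heap size after applying the assumed overhead factor.
    static long estimatedHeapBytes(long records, long avgBytesPerRecord) {
        return (long) (rawBytes(records, avgBytesPerRecord) * OVERHEAD_FACTOR);
    }

    // True if the estimate stays below the given fraction of the maximum heap.
    static boolean fitsInHeap(long estimatedBytes, long maxHeapBytes, double usableFraction) {
        return estimatedBytes < (long) (maxHeapBytes * usableFraction);
    }

    public static void main(String[] args) {
        long records = 4_400_000L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx heap limit
        for (long avg : new long[] {50, 200, 1000}) {
            long est = estimatedHeapBytes(records, avg);
            System.out.printf("avg %4d B/record -> ~%,d MB in heap, fits: %b%n",
                    avg, est / (1024 * 1024), fitsInHeap(est, maxHeap, 0.5));
        }
    }
}
```

Keeping only half of the heap as usable (the 0.5 in the example call) is an arbitrary safety margin, since the rest of the job also needs memory.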

Also, text (or CSV) files are processed very quickly by the standard tFileInputDelimited and tFileInputFullRow components, so you don't "really" have to worry about response time when using them (in my opinion, unless you want to save a few seconds, and I don't think that is your main concern here).
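A file-based intermediate step rarely causes memory problems because the file can be read as a stream. A minimal Java sketch, assuming a semicolon delimiter, no quoted fields and a made-up three-column layout (id;name;qty): only the current line is held in memory, whatever the file size.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamIntermediateCsv {

    // Writes count synthetic rows "id;name;qty", one record per line (hypothetical layout).
    static void writeRows(Path file, int count) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int i = 1; i <= count; i++) {
                w.write(i + ";item" + i + ";" + (i % 10));
                w.newLine();
            }
        }
    }

    // Streams the file line by line and sums the third column.
    static long sumThirdColumn(Path file) throws IOException {
        long sum = 0;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] fields = line.split(";", -1); // assumes no quoted or escaped delimiters
                sum += Long.parseLong(fields[2]);
            }
        }
        return sum;
    }

    // Writes an intermediate file, reads it back as a stream, then deletes it.
    static long roundTrip(int count) {
        try {
            Path tmp = Files.createTempFile("mid_data", ".csv");
            try {
                writeRows(tmp, count);
                return sumThirdColumn(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sum of column 3: " + roundTrip(1_000));
    }
}
```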

Hope this helps.


TRF