
Split a CSV file into multiple files depending on the number of rows it has

Hi,

I am passing the output of tFileInputDelimited to a tDataPrep component. When the file has more than 70000 records, the job goes into an infinite loop and does not produce any output. How can I pass ranges of records to tDataPrep in iterations, rather than passing the whole file in a single shot? I have tried the tSampleRow component, which selects rows within a given range, and I succeeded in doing this in the following way:

tFileInputDelimited -> tSampleRow (with Range value "1,,1000") -> tDataPrep

and then the same again with a different range:

tFileInputDelimited -> tSampleRow (with Range value "1001,,2000") -> tDataPrep

I need an algorithm that works out the range parameters at run time and passes these values to tSampleRow on each iteration.

 

I also added the code below in a tJava component.

 

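// Number of rows per chunk
int splitSize = 10000;
// Total row count published by tFileRowCount_1
int inputLimit = ((Integer) globalMap.get("tFileRowCount_1_COUNT"));
int startPoint = 1;
int endPoint = 0;
int splitCount = splitSize;

// Print every full chunk that ends before the last row
while (splitCount < inputLimit) {
    startPoint = endPoint + 1;
    endPoint += splitSize;
    System.out.println(startPoint + " " + endPoint);
    splitCount += splitSize;
}
// Print the final chunk, which may be shorter than splitSize
if (endPoint < inputLimit) {
    startPoint = endPoint + 1;
    endPoint = inputLimit;
    System.out.println(startPoint + " " + endPoint);
}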

 

For 32000 rows, this sample gives the following output:

1 10000
10001 20000
20001 30000
30001 32000
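
For reference, the same range calculation can be written as a standalone method that returns the ranges instead of printing them. This is only a sketch outside Talend: the class name RangeSplitter and the method ranges are made up for illustration, and the row count is passed in as a plain argument rather than read from globalMap.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
    // Hypothetical helper: returns inclusive, 1-based {start, end} pairs
    // that cover rows 1..totalRows in chunks of at most splitSize rows.
    public static List<int[]> ranges(int totalRows, int splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += splitSize) {
            // The last chunk is cut off at totalRows
            int end = Math.min(start + splitSize - 1, totalRows);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // 32000 rows in chunks of 10000, as in the sample above
        for (int[] r : ranges(32000, 10000)) {
            System.out.println(r[0] + " " + r[1]);
        }
    }
}
```

Each pair could then be turned into whatever range string tSampleRow expects.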

 

Can anyone help me work out how to iterate in the tJava component and pass these values to tSampleRow on each iteration?

 

1 REPLY
Community Manager

Re: Split csv file in multiple files depending on number of rows it has

Hello
You can choose a range of data for each iteration by setting the Header and Limit parameters, for example:
tFileRowCount
-onsubjobok-
tLoop--iterate--tFileInputDelimited--main--tDataPrep

On tLoop, set the From field to 0, the To field to ((Integer)globalMap.get("tFileRowCount_1_COUNT")),
and the Step field to 1000.

On tFileInputDelimited, set Header to ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and Limit to 1000.
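
To see which rows each iteration would read with these settings, here is a small standalone sketch in plain Java. It is not Talend-generated code: the class LoopWindows and the method rowsRead are hypothetical, the row count of 3200 is an example value standing in for tFileRowCount_1_COUNT, and it assumes that the loop includes the To value and that the file has no separate column-title line.

```java
public class LoopWindows {
    // Hypothetical helper: number of rows read when `header` rows are
    // skipped and at most `limit` rows are taken from a file of rowCount rows.
    public static int rowsRead(int rowCount, int header, int limit) {
        return Math.max(0, Math.min(limit, rowCount - header));
    }

    public static void main(String[] args) {
        int rowCount = 3200; // example stand-in for tFileRowCount_1_COUNT
        int step = 1000;     // used as both the loop Step and the Limit

        // Assumption: the loop runs while the current value is <= To
        for (int header = 0; header <= rowCount; header += step) {
            int n = rowsRead(rowCount, header, step);
            System.out.println("Header=" + header + " Limit=" + step + " rows read=" + n);
        }
    }
}
```

Under these assumptions the last iteration simply reads fewer rows than the Limit (200 in this example), and when the row count is an exact multiple of the Step, a final iteration would read no rows at all.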

Regards
Shong