Split csv file in multiple files depending on number of rows it has

Hi,

I am passing the output of tFileInputDelimited to a tDataPrep component. When the file has more than 70000 records, the job goes into an infinite loop and produces no output. How can I pass a range of records to tDataPrep on each iteration, rather than passing the whole file in a single shot? I have tried the tSampleRow component, which selects the rows within a given range, and it worked as follows:

tFileInputDelimited -> tSampleRow (with Range value "1,,1000") -> tDataPrep

and again with a different range:

tFileInputDelimited -> tSampleRow (with Range value "1001,,2000") -> tDataPrep

I need an algorithm that works out the range parameters at run time and passes these values to tSampleRow on each iteration.

 

I also added the code below to a tJava component:

 

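// Rows per chunk, and total number of rows in the input file
int splitSize = 10000;
int inputLimit = ((Integer) globalMap.get("tFileRowCount_1_COUNT"));
int startPoint = 1;
int endPoint = 0;
int splitCount = splitSize;

// Print every full-size range that ends before the last row
while (splitCount < inputLimit) {
    startPoint = endPoint + 1;
    endPoint += splitSize;
    System.out.println(startPoint + " " + endPoint);
    splitCount += splitSize;
}
// Print the final range, which holds at most splitSize rows
if (endPoint < inputLimit) {
    startPoint = endPoint + 1;
    endPoint = inputLimit;
    System.out.println(startPoint + " " + endPoint);
}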

 

For 32000 rows, this sample prints:

1 10000
10001 20000
20001 30000
30001 32000
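
The same ranges can be collected into a list instead of printed, so that something downstream can iterate over them. This is only a minimal sketch in plain Java, outside Talend: the class name RangeSplitter, the method buildRanges and the separator parameter are made up for illustration, and the ",," separator is simply the one used in the Range values above.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplitter {

    // Builds range strings "start<separator>end" that cover rows 1..totalRows
    // in chunks of at most splitSize rows. All names here are hypothetical.
    static List<String> buildRanges(int totalRows, int splitSize, String separator) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<String> ranges = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += splitSize) {
            // The last chunk is cut off at totalRows
            int end = Math.min(start + splitSize - 1, totalRows);
            ranges.add(start + separator + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same figures as in the sample above: 32000 rows, 10000 per chunk
        for (String range : buildRanges(32000, 10000, ",,")) {
            System.out.println(range);
        }
    }
}
```

A single loop with Math.min replaces the separate while and if blocks, and an empty file gives an empty list.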

 

Can anyone help me work out how to iterate in the tJava component and pass these values to tSampleRow on each iteration?

 

Community Manager

Re: Split csv file in multiple files depending on number of rows it has

Hello,
You can select a range of rows for each iteration by setting the Header and Limit parameters, for example:
tFileRowCount
-onsubjobok-
tLoop--iterate--tFileInputDelimited--main--tDataPrep

On tLoop, set the From field to 0, the To field to ((Integer)globalMap.get("tFileRowCount_1_COUNT")),
and the Step field to 1000.

On tFileInputDelimited, set Header to ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and Limit to 1000, so that each iteration skips the rows already read and processes the next 1000.
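
To see which rows each iteration would cover, here is a rough plain-Java simulation of the Header/Limit settings above. It is not the code Talend generates: the class ChunkPlan and the method plan are hypothetical names, and the sketch assumes the file has no column-title line (otherwise the Header value would need to be one higher). It stops as soon as the offset reaches the row count, so it never produces an empty final pass; whether tLoop itself includes the To value is worth checking in your Studio version.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkPlan {

    // Returns {header, rows} pairs: skip `header` rows, then read `rows` rows.
    // `rows` is the number actually available, never more than `step`.
    static List<int[]> plan(int rowCount, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        for (int header = 0; header < rowCount; header += step) {
            int rows = Math.min(step, rowCount - header);
            chunks.add(new int[] {header, rows});
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Example: a 3200-row file read 1000 rows at a time
        for (int[] chunk : plan(3200, 1000)) {
            System.out.println("skip " + chunk[0] + " rows, then read " + chunk[1]);
        }
    }
}
```

With a fixed Limit of 1000 the last pass simply reads whatever rows are left, which is what the Math.min call models here.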

Regards
Shong