Six Stars

How to load multiple files into multiple tables, loading only the file with the latest file date

Hi,

I have to load 5 files with filenames such as indi_20101010052121.csv, can_20101010052121.csv, and so on. I have created a job: tFileInputDelimited ---> tMap ---> tMysqlOutput

 

Can you please suggest how to load only the latest files?

 

Thank you

 

 

14 REPLIES
Ten Stars

It looks like your filenames contain a timestamp (yyyyMMddHHmmss). As such, you can order the files by filename with a tSortRow (assuming that the file prefix indicates a different type and you want to retrieve the latest for each type). If you want to filter the files down to one file (the latest) per type, use a tAggregateRow component, group by type (you will need to extract that using Java String utils) and return the First function for all of the columns (assuming you have sorted the data by type and then by date, latest first).
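The "latest file per type" idea above can be sketched in plain Java, outside Talend. This is only an illustration, not the job itself: it keeps the newest name per prefix with a map merge rather than a tSortRow plus tAggregateRow, and the class and method names (LatestPerType, typeOf, stampOf) as well as the second indi file in main are made up for the example. It assumes every name looks like prefix_yyyyMMddHHmmss.ext, so the fixed-width timestamps can be compared as strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LatestPerType {

    // Everything before the last underscore, e.g. "indi" for "indi_20101010052121.csv".
    static String typeOf(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // The yyyyMMddHHmmss stamp between the last underscore and the extension.
    static String stampOf(String fileName) {
        return fileName.substring(fileName.lastIndexOf('_') + 1, fileName.lastIndexOf('.'));
    }

    // Keeps, for each type, the file name carrying the greatest timestamp.
    static Map<String, String> latestPerType(List<String> fileNames) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String name : fileNames) {
            // 'a' is the name already stored for this type, 'b' is the new candidate.
            latest.merge(typeOf(name), name,
                    (a, b) -> stampOf(a).compareTo(stampOf(b)) >= 0 ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        // The second indi file is a hypothetical newer file, added for illustration.
        List<String> files = List.of(
                "indi_20101010052121.csv",
                "can_20101010052121.csv",
                "indi_20101011052121.csv");
        latestPerType(files).forEach((type, file) -> System.out.println(type + " -> " + file));
    }
}
```

Inside a Talend job the same two string operations (prefix and timestamp extraction) would typically sit in a tMap or tJavaRow expression feeding the sort and aggregate steps.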

Rilhia Solutions
Six Stars

tSortRow can be used for columns, but I need tFileInputDelimited to load the latest file into the database based on the date in the filename.
Six Stars

Hi,

Please follow the steps below; this might help.

1. Use a tFileList component and set the "Order by" and "Order action" parameters so that the latest file comes last.

2. Connect an Iterate link from tFileList to a tJava component.

3. Create a context variable for the file name. Inside tJava, set the context variable to the Talend global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"));

4. Connect an OnSubjobOk trigger from tFileList to your existing tFileInputDelimited, and use the context variable for the file name.
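To see why steps 2 and 3 leave the latest file in the variable, here is a small standalone simulation. It is not Talend code: globalMap is replaced by an ordinary HashMap, the context variable is a plain static field with the made-up name fileName, and the tFileList ordering is imitated by sorting the names ascending. In a real tJava the assignment would presumably read context.fileName = ..., with whatever name you gave the context variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LastFileWins {

    // Stand-in for Talend's globalMap, which tFileList updates on every iteration.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Stand-in for the context variable (the name fileName is hypothetical).
    static String fileName;

    // Imitates tFileList (ordered ascending by file name) iterating into a tJava.
    static String run(List<String> files) {
        List<String> ordered = new ArrayList<>(files);
        ordered.sort(String::compareTo);
        for (String current : ordered) {
            globalMap.put("tFileList_1_CURRENT_FILE", current);
            // Equivalent of the tJava body: overwrite the variable on every iteration.
            fileName = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        }
        // What the subjob started by OnSubjobOk would see: the last file iterated.
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
                "indi_20101011052121.csv",
                "indi_20101010052121.csv",
                "indi_20101012052121.csv")));
    }
}
```

Because the variable is overwritten on each iteration, exactly one file name survives, so this pattern picks a single file for the whole list.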

 

Cheers!

Gatha

 

Six Stars

I don't understand steps 3 and 4:

3. Create context variable for the file name. Inside tJava set the context variable to the Talend variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"));
4. Connect an OnSubjobOk trigger from tFileList to your existing tFileInputDelimited, and use the context variable for the file name.

Can you please help me with this?
Six Stars

Here are screenshots showing how you can do it.

Full_Job.jpg

In the tFileInputDelimited component, please configure as below. Instead of F:/Gatha, please use your own file path.

Fileinputdelimted.jpg

Configure the tJava component as below.

 

jav_sTep.jpg

Ten Stars

Did you actually test this before accepting it @k526? Given your description, it won't work. This will ONLY run the latest file (singular). Given your example files....

indi_20101010052121.csv

can_20101010052121.csv

 

.....only one file would be loaded, yet according to the timestamp they should potentially both be loaded. I assume that the time in the filename is what you want to sort by and not the actual date of the file. 

 

The solution I suggested assumed an understanding that you would have to use the tFileList and plug it into a tIterateToFlow. This would then allow you to follow the rest of what I suggested.

Rilhia Solutions
Six Stars

Thanks for your suggestion. I tried what you suggested and it worked, but I found one problem: all 5 files are loaded into a single table, and likewise the 5 files are loaded 5 times into every table. I need a solution like this, for example:

The can file has to load only into the can table,

the ind file has to load only into the ind table.

 

 

Six Stars

Hi,

 

Can you please post a screenshot of your present program?

Six Stars

new.PNG

Six Stars

Hi @k526,

 

I am not sure how the above screenshot works.

With my solution (a tFileList, a tJava and your old job):

1. In tFileList you can specify the file mask. Please set the file mask to indi_*.csv

2. Please use a separate tFileList and tJava pair for each file type you have, each with its own file mask to filter the respective files.
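As a rough illustration of what a per-type mask such as indi_*.csv selects, the snippet below filters a list of names with the JDK's own glob matcher. This is an assumption-laden stand-in: tFileList does its own mask handling, which may differ in details from java.nio globbing, and the class and method names (MaskFilter, matching) are made up.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

final class MaskFilter {

    // Returns the file names that match a glob-style mask such as "indi_*.csv".
    static List<String> matching(String mask, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of("indi_20101010052121.csv", "can_20101010052121.csv");
        // One mask per file type, mirroring one tFileList per type.
        System.out.println(matching("indi_*.csv", files));
        System.out.println(matching("can_*.csv", files));
    }
}
```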

 

Cheers!

Gatha

 

Ten Stars

@k526 you need to use a tAggregateRow after the tSortRow and group by the filename prefix (before the datestamp) then select the FIRST function for all of the columns. Then add a tMap after the tAggregateRow and send the data for one filetype (by file prefix) in one direction and the other filetype in the other direction.

 

After your tMap you should connect to a tFlowToIterate, then connect that to your file. If your DB has the same schema for all files, you could parameterise the table name in the DB component, and then every file that is iterated through would send its data to a different DB table.
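A minimal sketch of the "parameterise the table name" idea: derive the table name from the file prefix, so each iterated file maps to its own table. The rule used here (table name equals the prefix) is an assumption for illustration, since the thread does not give the real table names, and TableRouting and tableFor are hypothetical names. The check on the prefix is there because the result would end up inside a SQL statement.

```java
final class TableRouting {

    // Maps "can_20101010052121.csv" to the table name "can" (naming rule assumed).
    static String tableFor(String fileName) {
        int underscore = fileName.lastIndexOf('_');
        if (underscore <= 0) {
            throw new IllegalArgumentException("No type prefix in: " + fileName);
        }
        String prefix = fileName.substring(0, underscore);
        // Only allow simple identifiers, since the value is used as a table name.
        if (!prefix.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unsafe table name: " + prefix);
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(tableFor("can_20101010052121.csv"));
        System.out.println(tableFor("indi_20101010052121.csv"));
    }
}
```

In the job, an expression like this would feed the Table field of the DB output component, for example through a context or global variable set per iteration.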

Rilhia Solutions
Six Stars

Thank you so much, Gatha. This is working fine.
Five Stars

Hey,

Same scenario; the only difference is that my files are on AWS S3. I need to load files from S3 into a table, ordering them by file timestamp, i.e. the file with the oldest timestamp is loaded first, then the next one is loaded into the same table.

My concern is that tS3List has no option for ordering, and I cannot use tFileList because it only works on premise.

Please help.

Moderator

Hello poulami15

 

There is no recursive function in the tS3List component.
Have you tried checking the "List all buckets objects" option and entering the prefix of the files to be listed? This way, tS3List will list all the files on the S3 server.
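Once the keys have been listed, one possible way to get the oldest-first order asked for is to sort the key names by their embedded timestamp in plain Java. This is only a sketch under assumptions: it does not call S3 or any Talend component, the keys in main (including the data/ prefix) are invented, and OldestFirst, stampOf and oldestFirst are hypothetical names. It assumes every key ends in _yyyyMMddHHmmss followed by an extension.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OldestFirst {

    // Captures the 14-digit stamp right before the extension at the end of the key.
    private static final Pattern STAMP = Pattern.compile("_(\\d{14})\\.[^./]+$");

    static String stampOf(String key) {
        Matcher m = STAMP.matcher(key);
        if (!m.find()) {
            throw new IllegalArgumentException("No timestamp in key: " + key);
        }
        return m.group(1);
    }

    // Returns a new list ordered by embedded timestamp, oldest first.
    static List<String> oldestFirst(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(OldestFirst::stampOf));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical keys, as they might come back from a listing step.
        oldestFirst(List.of(
                "data/indi_20101012052121.csv",
                "data/indi_20101010052121.csv",
                "data/indi_20101011052121.csv")).forEach(System.out::println);
    }
}
```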

Best regards

Sabrina
