How to load multiple files into multiple tables, loading only the file with the latest file date

Six Stars

Hi,

I have to load 5 files with names such as indi_20101010052121.csv and can_20101010052121.csv. I have created a job: tFileInputDelimited ---> tMap ---> tMysqlOutput.

Can you please suggest how to load only the latest files?

Thank you



All Replies
Community Manager

It looks like your file names contain a timestamp (yyyyMMddHHmmss). As such, you can order the files by file name with a tSortRow (assuming that the file prefix indicates a different type and you want to retrieve the latest file for each type). If you want to filter the files down to one file (the latest) per type, use a tAggregateRow component: group by type (you will need to extract that from the file name using Java String utilities) and apply the First function to all of the columns (assuming you have sorted the data by type and by date, with the latest date first).
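
To make the sort-then-take-first idea concrete, here is a minimal standalone Java sketch (plain Java, not a Talend job). It assumes every file name has the form prefix_yyyyMMddHHmmss.csv; the class and method names are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestFilePerType {

    // Hypothetical helper: the "type" is everything before the last underscore,
    // e.g. "indi" for "indi_20101010052121.csv". Assumes the name contains an underscore.
    static String typeOf(String fileName) {
        return fileName.substring(0, fileName.lastIndexOf('_'));
    }

    // Sort the names in descending order (the role tSortRow plays), then keep the first
    // name seen for each type (the role of tAggregateRow with the First function).
    static Map<String, String> latestPerType(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.reverseOrder());
        Map<String, String> latest = new LinkedHashMap<>();
        for (String name : sorted) {
            latest.putIfAbsent(typeOf(name), name);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "indi_20101010052121.csv",
                "indi_20101011052121.csv",
                "can_20101010052121.csv",
                "can_20101009052121.csv");
        // One entry per type, each holding the newest file name for that type.
        System.out.println(latestPerType(files));
    }
}
```

Because the timestamp is fixed-width, comparing the names as plain strings orders files of the same type by date, which is why no date parsing is needed here.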

Six Stars

tSortRow can be used on columns, but I need tFileInputDelimited to load the latest file into the database based on the date in the file name.

Seven Stars

Hi,

Please follow the steps below; this might help.

1. Add a tFileList component and set the Order by and Order action parameters so that the latest file comes last.

2. Take an Iterate link from tFileList to a tJava component.

3. Create a context variable for the file name. Inside tJava, set the context variable to the Talend variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"));

4. Take an OnSubjobOk link from tFileList to your existing tFileInputDelimited, and use the context variable as the file name.

 

Cheers!

Gatha
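
For anyone unsure why steps 1 to 3 end up with the latest file, here is a rough standalone sketch of what happens during the iteration. It is plain Java that only imitates the job: the Context class and the fileName variable are hypothetical stand-ins for your own context variable, and the map is filled by hand where tFileList would normally do it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LastIteratedFile {

    // Stand-in for the job's context; "fileName" is a hypothetical context variable.
    static class Context {
        String fileName;
    }

    // Imitates tFileList -> (Iterate) -> tJava: every iteration overwrites the variable,
    // so once the loop has finished it holds the last file in the list.
    static String lastIterated(List<String> orderedFiles) {
        Map<String, Object> globalMap = new HashMap<>();
        Context context = new Context();
        for (String file : orderedFiles) {
            // In a real job tFileList sets this entry on each iteration.
            globalMap.put("tFileList_1_CURRENT_FILE", file);
            // This is the line that would go inside tJava.
            context.fileName = ((String) globalMap.get("tFileList_1_CURRENT_FILE"));
        }
        return context.fileName;
    }

    public static void main(String[] args) {
        // Already ordered so that the latest file comes last (step 1).
        List<String> ordered = List.of(
                "indi_20101009052121.csv",
                "indi_20101010052121.csv");
        System.out.println(lastIterated(ordered));
    }
}
```

The OnSubjobOk link in step 4 only fires after the whole iteration has finished, so tFileInputDelimited reads the variable once, with the last (latest) file name in it.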

 

Six Stars

I don't understand these two steps:

3. Create a context variable for the file name. Inside tJava, set the context variable to the Talend variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"));
4. Take an OnSubjobOk link from tFileList to your existing tFileInputDelimited, and use the context variable as the file name.

Can you please help me with this?

Seven Stars

Here are screenshots showing how you can do it.

Full_Job.jpg

In the tFileInputDelimited component, please configure it as below. Instead of F:/Gatha, please use your own file path.

Fileinputdelimted.jpg

Configure the tJava component as below.

jav_sTep.jpg

Community Manager

Did you actually test this before accepting it, @k526? Given your description, it won't work. This will ONLY run the latest file (singular). Given your example files:

indi_20101010052121.csv

can_20101010052121.csv

only one file would be loaded, yet according to the timestamps both should potentially be loaded. I assume that the time in the file name is what you want to sort by, and not the actual date of the file.

 

The solution I suggested assumed an understanding that you would have to use the tFileList and plug it into a tIterateToFlow. This would then allow you to follow the rest of what I suggested.

Six Stars

Thanks for your suggestion. I tried what you suggested and it worked, but I found one problem: all 5 files are loaded into a single table, and likewise all 5 files are loaded into every table. I need a solution where, for example:

the can file is loaded only into the can table,

the ind file is loaded only into the ind table.

Seven Stars

Hi,

Can you please post a screenshot of your current job?

Six Stars

new.PNG

Seven Stars

Hi @k526,

 

I am not sure how the job in the above screenshot works.

Following my solution (a tFileList, a tJava and your existing job):

1. In tFileList you can specify the file mask. Please set the file mask to indi_*.csv

2. Please use a separate tFileList and tJava pair for each file type, each with its own file mask, to filter the respective files.

 

Cheers!

Gatha
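
If it helps to see what a file mask does, here is a small standalone Java sketch that filters a list of names with a glob such as indi_*.csv and then takes the latest match. It is plain Java rather than tFileList itself, the class and method names are hypothetical, and it assumes the file names carry a fixed-width yyyyMMddHHmmss timestamp.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MaskFilter {

    // Keeps only the names matching the glob mask, then returns the greatest one.
    // With a fixed-width timestamp in the name, the greatest name is the latest file.
    static Optional<String> latestForMask(List<String> fileNames, String mask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "indi_20101009052121.csv",
                "indi_20101010052121.csv",
                "can_20101011052121.csv");
        // One call per file type, mirroring one tFileList + tJava pair per mask.
        System.out.println(latestForMask(files, "indi_*.csv"));
        System.out.println(latestForMask(files, "can_*.csv"));
    }
}
```

Each mask plays the role of one tFileList, so the can files never reach the flow that loads the indi table.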

 

Community Manager

@k526 you need to use a tAggregateRow after the tSortRow, group by the file name prefix (the part before the datestamp), and then select the First function for all of the columns. Then add a tMap after the tAggregateRow and send the data for one file type (by file prefix) in one direction and the other file type in the other direction.

After your tMap you should connect to a tFlowToIterate, then connect that to your file. If your DB has the same schema for all files, you could parameterise the table name in the DB component; then, for every file that is iterated through, you would be sending its data to a different DB table.
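
As a rough illustration of parameterising the table name, the sketch below derives a table name from the file name prefix. It is standalone Java with hypothetical names, and it assumes the table is simply named after the prefix; if your tables are named differently (for example an ind table for indi_ files), a lookup map would be needed instead.

```java
import java.util.Locale;

class TableNameFromFile {

    // Returns the part of the file name before the first underscore, in lower case.
    // Assumption: the target table has the same name as the file prefix.
    static String tableNameFor(String fileName) {
        int underscore = fileName.indexOf('_');
        if (underscore <= 0) {
            throw new IllegalArgumentException("No prefix found in file name: " + fileName);
        }
        return fileName.substring(0, underscore).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(tableNameFor("indi_20101010052121.csv"));
        System.out.println(tableNameFor("can_20101010052121.csv"));
    }
}
```

In a job, the result could be stored in a context variable and that variable used in the Table field of the DB output component, so each iterated file goes to its own table.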

Six Stars

Thank you so much, Gatha. This is working fine.

Five Stars

Hey,

Same scenario; the only difference is that my files are on AWS S3. I need to load files from S3 into a table, ordered by file timestamp, i.e. the file with the oldest timestamp is loaded first, then the next one is loaded into the same table.

My concern is that tS3List has no option for ordering, and I cannot use tFileList because it only works with on-premise (local) files.

Please help.

Moderator

Hello poulami15

 

There is no recursive function in the tS3List component.
Have you tried checking the "List all buckets objects" option and entering the prefix of the files to be listed? That way, tS3List will list all the files on the S3 server.

Best regards

Sabrina
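
On the ordering part of the question: since the list component itself does not sort, one possible workaround is to collect the listed keys first and sort them in Java before loading. The sketch below is standalone Java under assumptions, not tested against S3: the keys are plain strings ending in _yyyyMMddHHmmss.csv, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OldestFirst {

    // Matches the 14-digit timestamp just before the .csv extension.
    private static final Pattern STAMP = Pattern.compile("_(\\d{14})\\.csv$");

    // Extracts the timestamp text from a key such as "data/indi_20101010052121.csv".
    static String stampOf(String key) {
        Matcher m = STAMP.matcher(key);
        if (!m.find()) {
            throw new IllegalArgumentException("No timestamp in key: " + key);
        }
        return m.group(1);
    }

    // Returns a new list ordered by timestamp, oldest first.
    static List<String> oldestFirst(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(OldestFirst::stampOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "data/can_20101011052121.csv",
                "data/indi_20101009052121.csv",
                "data/can_20101010052121.csv");
        for (String key : oldestFirst(keys)) {
            System.out.println(key); // load each file in this order
        }
    }
}
```

Sorting on the extracted timestamp rather than the whole key keeps the order correct even when keys have different prefixes or folders.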

Four Stars

@gatha_vdm  Thanks for the quick response. I will try the given steps to get the data in multiple output tables.
