How to get multiple excel files after extracting most recent to tflowtoiterate

Four Stars

How to get multiple excel files after extracting most recent to tflowtoiterate

I have googled to death on this and am not sure I can do this with talend, or if I am not asking about the correct components.  My talend job goes to a folder with numerous dated excel files and pulls the most recent two.  I was hoping I could pull them both into the same postgres table via tpostgresqloutput.  However, the tfileinputexcel component I am using, I am using ((String)globalMap.get("MostRecent")) and therefore pulls only one.  Is there a way to pull both files?  currently here are components:

tpostgresqlconnection trigger ok to tfilelist1 to folder with multiple excel iterate to titeratetoflow_2 main to tbufferoutput_1 main to tlogrow_2. tbufferinput_1 also triggered by tfilelist_1 main textractregexfeilds_1 main tsortrow_1 main tlogrow3 main tsamplerow_1 with "1..2" main tlogrow-4 main tflowtoiterate_1 with basic settings as key "MostRecent"  value = CURRENT_FILEPATH.  Second row in customize box is key = "LastFileName" and value = "CURRENT_FILE".  I am sure this code can be changed, but I'm not a java person so really have no clue.  tried to take snapshot of talend job but could not paste in.

 

Thank you for your help.

 

 

Sixteen Stars TRF
Sixteen Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Can you share a capture of the job design?
Not sure to understand your case. Of course you can load 2 (or more) Excel file in your db, one at a time.
So you need to iterate over the files (using tFileList) then read the content (using tFileInputExcel) and finally push the content into the db (using tPostgresqlOutput).
If you're able to get the name of the 2 files you want to load, then use 2 global variable to store these names and as filter in tFileList.

TRF
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

I have attached a word document that shows the entire talend job, and then the code in tflowtoiterate and tfileinputexcel. I think one or both of these need to have the code changed so this will pull the two files flowed to the tflowtoiterate_1 component. All the previous components are working properly to pull the two most recent files. The tfileList_1 just has the beginning file name "smartgfe*" and then extracts the two most current. I really appreciate your help.


Community Manager

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

What @TRF has said is a good solution. 

Using the tFileList you can order the files returned by "modified date".  This enables you to (with a little experimentation of the feature) retrieve a list of filenames in date order. Connect the tFileList to a tJava component with the following code....

 

int count = 0;

//Checks to see if "count" exists.
if(globalMap.get("count")!=null){
     //Adds 1 to the current "count" value
     count = ((Integer)globalMap.get("count")).intValue() +1;
}else{
    //"creates the first "count" value
    count = 1;
}

//Update or set the globalMap value
globalMap.put("count", count);    

Now connect your tJava to the tFileInputExcel using a RunIf link. In the RunIf expression use the following logic....

 

((Integer)globalMap.get("count")).intValue()<=2

Then (if the rest of your job is configured OK) you should be able to limit the file read to the most recent two files. 

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

This sounds excellent with the java. However, I am not sure how to work in the tfilelist, being that a human will be saving and naming these files, the modified date may be the same for files that are not of the same month. Therefore, I would like to stick to my textractregexfields (see attached). This works, pulling the two current files I want. Would I connect the tjava component to tbufferoutput? Thanks for the excellent code, btw.


Community Manager

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

OK, it sounds like you may have a different method that doesn't require my method....but I cannot see your attachment I'm afraid. You should be able to insert an image of your job by clicking on the "photos" button here and selecting yoru screenshot

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Thanks to rhall for telling me how to upload pix.  You are probably thinking by now I should be looking for a new career.  here is the job in total, and then the two components I was thinking needed different code.talend job to extract multiple excel files starr cruise.jpgtfileinputexcel.jpgtflowtoiterate component.jpg

Community Manager

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Ah, I see. If you have already narrowed the jobs down to the two you want by the time you get to the tFlowToIterate, you simply need to connect the tFlowToIterate to the Excel component using an iterate link. This will allow you to run the last subjob one time for every version of the file (in this case, 2 versions) 

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

I made what I thought was correct change, but though it runs, it only captures last file.  What did I do wrong?  do I need to change the code in the tfileinputexcel?  Thanks for your patience.  This is really important as I have other talend jobs like this I need to do the same.revised talend job.jpg

Ten Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

The filename field of your Excel input should be the filename variable from tFlowToIterate. (Expand the entry for tFlowToIterate_1 in the Outline area in the bottom left and you can drag and drop the value directly into the component settings.)
Community Manager

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

As @cterenzi suggests, it is likely the variable you are using for the Excel component. Can you show us how your Excel component is configured?

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Hi cterenzi,

Thanks for the new education on the outline.  I am so happy to have learned that.  However, I did drag in the "current_filepath which produced the exact code I had originally and only pulled in one file.  For the heck of it, I drug in current_file, AND Current iteration of line just to see if you meant those.  They error out.  here is screen shot with current_filepath drug over which works but is exactly what I had and only pulls onetinputexcelfile file name.stream by drag and drop.jpg

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

you are correct.  It replicated what I was already using.  I just sent it before you replied.

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Hi 

 

 I am new to Talend , I  have similar kind of requirements to read excel files from SFTP server ,

I use this workflow to get the server files ,It is getting all files from sftp server successfully . 

 

   tFTPconnection -- on component-OK -- tFTPfilelist_1 --tIterateToFlow_1-- tFileoutputExcel---tLogRow

                                                                      |

                                                        tjava_1 -- tFileInputExcel_1--tfilterrow_1 --tmap_1---PostgresqlOutPut

 

But I am getting this error while I am ready from excel file .

 

 tFileInputExel_1 Filename has ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) 

   Exception in component tFileInputExcel_1 (Kite_sftp_connection)
java.io.FileNotFoundException: RE_check_register.xlsx (The system cannot find the file specified)

 

(I have tried *FILEPATH)  but showing same error message .

Can you please suggest , how to solve this issue . 

 

Thanks you.

 

 

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog