Four Stars

How to get multiple excel files after extracting most recent to tflowtoiterate

I have googled to death on this and am not sure I can do this with talend, or if I am not asking about the correct components.  My talend job goes to a folder with numerous dated excel files and pulls the most recent two.  I was hoping I could pull them both into the same postgres table via tpostgresqloutput.  However, the tfileinputexcel component I am using, I am using ((String)globalMap.get("MostRecent")) and therefore pulls only one.  Is there a way to pull both files?  currently here are components:

tpostgresqlconnection trigger ok to tfilelist1 to folder with multiple excel iterate to titeratetoflow_2 main to tbufferoutput_1 main to tlogrow_2. tbufferinput_1 also triggered by tfilelist_1 main textractregexfeilds_1 main tsortrow_1 main tlogrow3 main tsamplerow_1 with "1..2" main tlogrow-4 main tflowtoiterate_1 with basic settings as key "MostRecent"  value = CURRENT_FILEPATH.  Second row in customize box is key = "LastFileName" and value = "CURRENT_FILE".  I am sure this code can be changed, but I'm not a java person so really have no clue.  tried to take snapshot of talend job but could not paste in.

 

Thank you for your help.

 

 

  • Data Integration
12 REPLIES
Eleven Stars TRF
Eleven Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Can you share a capture of the job design?
Not sure to understand your case. Of course you can load 2 (or more) Excel file in your db, one at a time.
So you need to iterate over the files (using tFileList) then read the content (using tFileInputExcel) and finally push the content into the db (using tPostgresqlOutput).
If you're able to get the name of the 2 files you want to load, then use 2 global variable to store these names and as filter in tFileList.

TRF
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

I have attached a word document that shows the entire talend job, and then the code in tflowtoiterate and tfileinputexcel. I think one or both of these need to have the code changed so this will pull the two files flowed to the tflowtoiterate_1 component. All the previous components are working properly to pull the two most recent files. The tfileList_1 just has the beginning file name "smartgfe*" and then extracts the two most current. I really appreciate your help.


Eleven Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

What @TRF has said is a good solution. 

Using the tFileList you can order the files returned by "modified date".  This enables you to (with a little experimentation of the feature) retrieve a list of filenames in date order. Connect the tFileList to a tJava component with the following code....

 

int count = 0;

//Checks to see if "count" exists.
if(globalMap.get("count")!=null){
     //Adds 1 to the current "count" value
     count = ((Integer)globalMap.get("count")).intValue() +1;
}else{
    //"creates the first "count" value
    count = 1;
}

//Update or set the globalMap value
globalMap.put("count", count);    

Now connect your tJava to the tFileInputExcel using a RunIf link. In the RunIf expression use the following logic....

 

((Integer)globalMap.get("count")).intValue()<=2

Then (if the rest of your job is configured OK) you should be able to limit the file read to the most recent two files. 

Rilhia Solutions
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

This sounds excellent with the java. However, I am not sure how to work in the tfilelist, being that a human will be saving and naming these files, the modified date may be the same for files that are not of the same month. Therefore, I would like to stick to my textractregexfields (see attached). This works, pulling the two current files I want. Would I connect the tjava component to tbufferoutput? Thanks for the excellent code, btw.


Eleven Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

OK, it sounds like you may have a different method that doesn't require my method....but I cannot see your attachment I'm afraid. You should be able to insert an image of your job by clicking on the "photos" button here and selecting yoru screenshot

Rilhia Solutions
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Thanks to rhall for telling me how to upload pix.  You are probably thinking by now I should be looking for a new career.  here is the job in total, and then the two components I was thinking needed different code.talend job to extract multiple excel files starr cruise.jpgtfileinputexcel.jpgtflowtoiterate component.jpg

Eleven Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Ah, I see. If you have already narrowed the jobs down to the two you want by the time you get to the tFlowToIterate, you simply need to connect the tFlowToIterate to the Excel component using an iterate link. This will allow you to run the last subjob one time for every version of the file (in this case, 2 versions) 

Rilhia Solutions
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

I made what I thought was correct change, but though it runs, it only captures last file.  What did I do wrong?  do I need to change the code in the tfileinputexcel?  Thanks for your patience.  This is really important as I have other talend jobs like this I need to do the same.revised talend job.jpg

Ten Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

The filename field of your Excel input should be the filename variable from tFlowToIterate. (Expand the entry for tFlowToIterate_1 in the Outline area in the bottom left and you can drag and drop the value directly into the component settings.)
Eleven Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

As @cterenzi suggests, it is likely the variable you are using for the Excel component. Can you show us how your Excel component is configured?

Rilhia Solutions
Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

Hi cterenzi,

Thanks for the new education on the outline.  I am so happy to have learned that.  However, I did drag in the "current_filepath which produced the exact code I had originally and only pulled in one file.  For the heck of it, I drug in current_file, AND Current iteration of line just to see if you meant those.  They error out.  here is screen shot with current_filepath drug over which works but is exactly what I had and only pulls onetinputexcelfile file name.stream by drag and drop.jpg

Four Stars

Re: How to get multiple excel files after extracting most recent to tflowtoiterate

you are correct.  It replicated what I was already using.  I just sent it before you replied.