[resolved] Using tFileList and tWaitForFile both to power the same process...

One Star

I am just learning Talend, and am a little confused about how to go about this...
I have 2 directories:
FilesToProcess
ProcessedFiles
When Talend first starts up, I want to use tFileList to kick off... so we process any files that have arrived while the job wasn't running.
Then we switch to tWaitForFile for files that arrive while we are up and running...
I have thought of a few hypothetical ways to do this, but they all seem to have problems...
For example:
Use a 'MasterJob'. This MasterJob kicks off childJob_tFileList, then kicks off childJob_tWaitForFile. Both of these in turn call childJob_ProcessFile.
The problem with this idea is that if a new file appears during or after childJob_tFileList but before childJob_tWaitForFile, it will not be processed by anyone... until I restart the job, that is.
If I could run both tFileList and tWaitForFile in parallel, that would solve this problem, but I'm not sure how to do that.
Another solution would be to periodically run tFileList as a 'clean-up' routine, and process as needed.
There are a million hypothetical solutions; what I'm most interested in is what an experienced Talend developer would do.
Points that I'm most interested in (confused about):
1. When in this process are sub-jobs recommended?
2. If I want to feed 2 connections (main/iterate) into a component that only allows one input, is a subjob a good solution?
3. Is it possible / safe to run two Iterate components at the same time? (For example, the global fileName variable may be written to and read from at the same time by 2 different threads.)
I've searched many pages of documentation, but have found nothing about how Talend handles concurrency, or about accessing variables in a parent job. There is also very little in the way of recommendations as to when to use iterate vs. main links.
I'd appreciate both pointers on my specific use case and pointers to any documentation that I may have missed to help clarify these points.
Thanks for your help,
-Eric

Accepted Solutions
Community Manager

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Hello Eric
If I could run both tFileList and tWaitForFile in parallel, that would solve this problem,
3. Is it possible / safe to run two Iterate components at the same time? (For example, the global fileName variable may be written to and read from at the same time by 2 different threads.)

Yes, running the two subjobs in parallel could be a solution, but some files may then be processed twice, once by each subjob, so we first need to check whether a file has already been processed. As my screenshots show, I use a tJava to do that first.
code on tJava_1:
// Allow processing only if the other branch has not already recorded this file.
context.condtion1=false;
String fileName=(String)globalMap.get("tWaitForFile_1_FILENAME");
if(globalMap.get(fileName)==null){
    // First time this file is seen: record it in globalMap and let it through.
    context.condtion1=true;
    globalMap.put(fileName,fileName);
}

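code on tJava_2:
// Same check for the tFileList branch, keyed on the file currently being iterated.
context.condtion2=false;
String fileName=((String)globalMap.get("tFileList_1_CURRENT_FILE"));
if(globalMap.get(fileName)==null){
    // First time this file is seen: record it in globalMap and let it through.
    context.condtion2=true;
    globalMap.put(fileName,fileName);
}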
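
The check in the two tJava snippets is done in two steps (a get followed by a put). If both branches really run on separate threads, doing the check and the marking in one atomic step is safer. Below is a minimal standalone sketch of that idea in plain Java. The class name FileClaims, the method claim and the example file name are hypothetical and not part of Talend; it only illustrates the technique and is not taken from the job in the screenshots.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Talend class): remembers which files were already claimed.
final class FileClaims {
    // Thread-safe set; add() checks and inserts in one atomic step.
    private static final Set<String> CLAIMED = ConcurrentHashMap.newKeySet();

    private FileClaims() {}

    /** Returns true only for the first caller that claims this file name. */
    static boolean claim(String fileName) {
        return CLAIMED.add(fileName);
    }

    public static void main(String[] args) throws InterruptedException {
        String file = "FilesToProcess/example.csv"; // assumed example name
        Runnable worker = () -> {
            // Both branches try to claim the same file; only one of them succeeds.
            if (claim(file)) {
                System.out.println(Thread.currentThread().getName() + " processes " + file);
            }
        };
        Thread a = new Thread(worker, "tFileList-branch");
        Thread b = new Thread(worker, "tWaitForFile-branch");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Exactly one of the two threads prints a line; which one depends on scheduling. Assuming such a helper were made available to the job (for example as a user routine, which is an assumption here), each tJava could set its condition variable from the return value of claim(fileName) instead of the separate get and put.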

There is also very little in the way of recommendations as to when to use iterate vs. main links.

In short, a main link is a flow: it transfers records to the next component. An iterate link fires the next subjob once per item, for example:
tFileList--iterate--tJava
If there are 3 files in the directory read by tFileList, tJava will be fired 3 times; that is, tJava runs 3 times.
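
As a rough analogy in plain Java (a sketch only, not the code Talend generates), an iterate link behaves like a loop that runs the downstream step once per item. The class name IterateLinkSketch, the method iterate and the file names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of "tFileList--iterate--tJava"; not generated Talend code.
final class IterateLinkSketch {
    private IterateLinkSketch() {}

    /** Runs the downstream step once per file and returns how many times it ran. */
    static int iterate(List<String> files, Consumer<String> downstream) {
        int runs = 0;
        for (String file : files) {
            downstream.accept(file); // one full run of the downstream step per file
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "b.csv", "c.csv"); // assumed names
        int runs = iterate(files, file -> System.out.println("tJava run for " + file));
        System.out.println("downstream runs: " + runs); // prints "downstream runs: 3"
    }
}
```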
Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business

All Replies
One Star

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Shong,
Thank you very much. Your reply was very, very helpful.
There are two things here I did not see in the docs:
1. Multi-Threaded execution
2. Condition on (if) link between components.
Are these things documented somewhere? I understand them now, but I think there may be many more features that I have not found the docs for... I think there is a whole section I am missing.
Thanks,
-Eric
Community Manager

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Hello
but I think there may be many more features that I have not found the docs for... I think there is a whole section I am missing.

Yes, the documentation always lags behind development, as there are many more developers than writers. If you join our forum, you will always pick up some useful hints and skills. ;)
Best regards

shong
One Star

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Hello,
I have a scenario in which I take multiple files from a folder, load them into a table in one shot, and record logs at the end of the job. I want to change it so that one file is picked up from the folder, loaded, and logged on its own, and then the same is done for the next file. That way, by looking at the log table, I should be able to see which files were loaded properly.
Any suggestions?
Four Stars

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Hi
Please open a new topic for your issue. When you log in to your account, go to "configuration,usage and feed back" under your home page; there you will find the "post new topic" option.

Thanks
Anil Kumar Burri
One Star

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Sorry, posted in the wrong place.
One Star

Re: [resolved] Using tFileList and tWaitForFile both to power the same process...

Hi shong,
Regarding the job you created: if I create a main job that uses the tRunJob component twice and calls your job as a child job both times, the two child jobs will be monitoring the same folder. Won't they read the same files and process them twice? How can I check whether a file has already been processed in this scenario? Is there any solution for such a file check?