
tWaitForFile Issue

Hello everyone,
We have a job on Windows that polls a network folder for new and updated files.
tWaitForFile is used to trigger the job (see the attached screenshots for the configuration). We have an issue related to the tWaitForFile component.
Typically a number of files arrive in the folder within a relatively short period of time, e.g. 9 files within 5 minutes. Talend starts processing the first file, and while it is processing, more files appear in the folder. In the end tWaitForFile is executed the right number of times (execution count = file update count), but some files are processed twice and others are not processed at all.

Somehow Talend mixes up which files it has processed and which it has not. Do we have something wrong in our configuration, or what else could be the issue?
Many thanks in advance.


Re: tWaitForFile Issue

Could you please clarify a few things?
- Is the folder empty initially?

If yes, that means the files are newly created.

If no, that means the files are being updated each time. Could any file be updated twice?


You can try moving (archiving) each executed file to a separate folder, using "((String)globalMap.get("tWaitForFile_1_FILENAME"))" to get its name, so that executed and unexecuted files are not confused.

If any file is updated twice, the second time it will be considered new, since the already executed files have been moved.
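A minimal Java sketch of this archiving step, assuming it runs once per tWaitForFile iteration (for example from a routine or a tJava after the processing subjob, with the name taken from ((String)globalMap.get("tWaitForFile_1_FILENAME"))). The class, method and folder names are hypothetical. We have not checked whether that variable holds a bare file name or a full path; Path.resolve returns its argument unchanged when the argument is already absolute, so the helper accepts either.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for the archiving step; names and folders are assumptions.
final class ArchiveProcessedFile {

    private ArchiveProcessedFile() {}

    /**
     * Moves the file just processed from watchedDir into archiveDir, so the next
     * poll can no longer see it. fileName may be a bare name or an absolute path,
     * because Path.resolve returns an absolute argument unchanged.
     * REPLACE_EXISTING lets a later update of the same name be archived as well.
     */
    static Path archive(Path watchedDir, String fileName, Path archiveDir) throws IOException {
        Path source = watchedDir.resolve(fileName);
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(source.getFileName());
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Once the file is gone from the watched folder, a second update with the same name shows up as a brand-new file, which is the behaviour described above.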


Let me know if this solves your issue.  


Re: tWaitForFile Issue

File-level triggers (especially over a network) are always a source of issues.

A small example, the output of a series of ls -l commands:

[Screenshot (2018-02-14): successive ls -l listings of the same file, its size still increasing]

As you can see, the file is already there, but it keeps growing.

What happens if the job starts reading it while its size is still 0? The file system caches file operations, and the network adds one more layer on top of that.
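One common consumer-side mitigation, sketched in Java under our own assumptions: treat a file as ready only when its size is non-zero and neither its size nor its last-modified time changes over a short wait. The class name and the wait time are hypothetical, and because of the caching layers mentioned above this is a heuristic, not a guarantee.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical heuristic: a file counts as "complete" if it is non-empty and
// unchanged (size and last-modified time) across two checks waitMillis apart.
// Caching on the file system or network share can still hide a pending write.
final class StableFileCheck {

    private StableFileCheck() {}

    static boolean isStable(Path file, long waitMillis)
            throws IOException, InterruptedException {
        long sizeBefore = Files.size(file);
        FileTime timeBefore = Files.getLastModifiedTime(file);
        Thread.sleep(waitMillis);
        long sizeAfter = Files.size(file);
        FileTime timeAfter = Files.getLastModifiedTime(file);
        return sizeBefore > 0 && sizeBefore == sizeAfter && timeBefore.equals(timeAfter);
    }
}
```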

A better way: run the Talend job on the same server as the files (still not a 100% guarantee against all collisions), and change your original process so that it writes each file to another folder and renames it into the watched folder at the end. As long as both folders are on the same volume, a rename does not transfer any data, it only changes the link to the file, so it is much faster and reduces collisions.
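The producer side of this write-then-rename approach might look like the following Java sketch. The class, method and folder names are hypothetical, and the original process could be written in any language. StandardCopyOption.ATOMIC_MOVE makes Files.move throw AtomicMoveNotSupportedException instead of silently copying when an atomic rename is impossible (for example across volumes); per the Files.move documentation, whether an existing target with the same name is replaced is implementation specific.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical producer: write the data outside the watched folder, then
// rename the finished file into it in one step.
// stagingDir must be on the same volume as watchedDir for the rename to be atomic.
final class PublishByRename {

    private PublishByRename() {}

    static Path publish(byte[] content, String fileName, Path stagingDir, Path watchedDir)
            throws IOException {
        Files.createDirectories(stagingDir);
        Path staged = stagingDir.resolve(fileName);
        // The slow part (writing the data) happens where the trigger cannot see it.
        Files.write(staged, content);
        // The rename only relinks the complete file into the watched folder.
        return Files.move(staged, watchedDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }
}
```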

The best way: send the files through a message queue. There is a big choice, but these are supported by all Talend Studios:

- Kafka

- ActiveMQ

It guarantees delivery with a mechanism similar to database transactions.


Re: tWaitForFile Issue

Thanks for the quick reply.

I have been thinking about the period when a file is already visible in the folder but not yet complete. I agree that this can definitely be an issue, but I'm not fully convinced it is the issue here. There are basically two reasons:

  • The number of executions is correct (it equals the file count). If Talend detected the same file twice (first while still incomplete and again when complete), the number of executions would probably be higher than the file count.
  • Our polling interval is 30 seconds, and our data files range from a few KB to a few MB. We see this issue every night once the files are generated. If it were caused by the file-writing period, which is short compared to the 30-second interval, I think it is unlikely we would hit it every time.

Regarding Ajinkya_Gonnade's message: we have tried both ways, clearing the folder before adding files and just overwriting the existing files.

A quick fix would be to build a custom loop that picks any file found in the folder, processes it, moves it to an archive folder afterwards, and then starts over. In our case, where files are created infrequently, the loop could just pick any file without worrying that new incoming files would prevent some file from ever being processed.
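The loop described above could be sketched in plain Java roughly as follows, assuming a single worker polls the folder and nothing else moves files out of it. DrainFolder, pickAny and drain are hypothetical names, and the processing step is passed in as a Consumer. A file stays in the watched folder until it has been processed, so the folder itself is the record of what is still to do; this alone does not address the partially written files raised earlier in the thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical sketch of the custom loop: take any file, process it,
// archive it, look again, until the watched folder holds no regular files.
final class DrainFolder {

    private DrainFolder() {}

    /** Returns one regular file from dir, or empty if there is none. */
    static Optional<Path> pickAny(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, Files::isRegularFile)) {
            Iterator<Path> it = files.iterator();
            return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
        }
    }

    /** Processes files until none is left; returns their names in handling order. */
    static List<String> drain(Path watchedDir, Path archiveDir, Consumer<Path> process)
            throws IOException {
        Files.createDirectories(archiveDir);
        List<String> handled = new ArrayList<>();
        Optional<Path> next;
        while ((next = pickAny(watchedDir)).isPresent()) {
            Path file = next.get();
            process.accept(file);
            // Moving the file out marks it as done; a file that arrives or is
            // rewritten later is simply picked up on a later pass.
            Files.move(file, archiveDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            handled.add(file.getFileName().toString());
        }
        return handled;
    }
}
```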

But again, I would of course prefer to use the standard components instead.


Re: tWaitForFile Issue

That's exactly what I meant by this, and a loop is the obvious choice for continuous iteration:

"You can try moving (archiving) each executed file to a separate folder, using "((String)globalMap.get("tWaitForFile_1_FILENAME"))" to get its name, so that executed and unexecuted files are not confused.

If any file is updated twice, the second time it will be considered new, since the already executed files have been moved."

Also, printing the file name of the current execution will let you know which files are executed and how many times.
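As a hypothetical diagnostic in Java: record every file name the trigger hands over and print a running count, so that duplicate and never-executed files show up directly. In a Talend job the name would come from ((String)globalMap.get("tWaitForFile_1_FILENAME")), and the ExecutionLog instance would have to survive across iterations (for example by storing it in globalMap); both the class and that wiring are assumptions, not part of any Talend API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical execution log: counts how often each file name was executed.
final class ExecutionLog {

    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Records one execution for fileName, prints it, and returns the count so far. */
    int record(String fileName) {
        int n = counts.merge(fileName, 1, Integer::sum);
        System.out.println("Executing " + fileName + " (run #" + n + ")");
        return n;
    }

    /** File names that were executed more than once, with their counts. */
    Map<String, Integer> duplicates() {
        Map<String, Integer> result = new LinkedHashMap<>();
        counts.forEach((name, n) -> {
            if (n > 1) {
                result.put(name, n);
            }
        });
        return result;
    }

    /** Expected file names that were never executed. */
    List<String> missing(List<String> expected) {
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (!counts.containsKey(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Comparing missing(...) against the night's expected file list would show exactly the pattern reported in the question: some names with a count of 2 and others absent.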

Regards,

Ajinkya