Five Stars

files lock

Hi,

I have several jobs which process files from a shared directory.

What's the best practice for locking files so that no more than one job/task/thread reads the same file?

Thanks for your help.

Eleven Stars

Re: files lock

There are several ways to achieve this, and the right one depends very much on how the jobs use the files. For example, do all of the jobs need to use every file in sequence? Is it a first-come, first-served system where only one job can process a file? Do some files need to be processed by some of the Talend jobs, but not all of them? If you can answer these questions, it will be easier to make suggestions.

Rilhia Solutions
Five Stars

Re: files lock

Hi

We are in the scenario where each file needs to be handled exactly once.

In fact, the jobs are identical: they are several instances of the same job, run just to parallelize the processing.

Thanks

Eleven Stars

Re: files lock

One relatively easy solution to this might be to set the job up to move the file to another location to process it. This location should use a context variable which can be set differently per job. Once the file is moved for processing, no other job will be able to pick it up.
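A minimal sketch of the "move to claim" idea in plain Java, for example for use in a custom code component. The class name `FileClaimer`, the method `tryClaim`, and the `workDir` parameter (standing in for the per-job context variable) are hypothetical names chosen for illustration. It assumes the work directory is on the same file system as the shared directory, so the move is a rename rather than a copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

class FileClaimer {

    /**
     * Tries to claim a file by moving it into this job's private work directory.
     * Returns the new path on success, or empty if the file was already gone,
     * which is what happens when another job moved it first.
     */
    static Optional<Path> tryClaim(Path source, Path workDir) {
        Path target = workDir.resolve(source.getFileName());
        try {
            // ATOMIC_MOVE asks for an all-or-nothing rename; it fails with an
            // exception instead of silently falling back to copy-and-delete.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return Optional.of(target);
        } catch (NoSuchFileException e) {
            // Another job claimed the file between listing and moving.
            return Optional.empty();
        } catch (IOException e) {
            // Anything else (permissions, move across file systems) is a real error.
            throw new UncheckedIOException(e);
        }
    }
}
```

A job would call `tryClaim` for each file it finds and simply skip the file when the result is empty. Because a rename on a single file system does not copy the data, the cost should not depend on the file size under that assumption.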

Rilhia Solutions
Five Stars

Re: files lock

So while the file is being moved, no other job can use it?

Is there any other solution? The files are large.
Eleven Stars

Re: files lock

That was just a very simple solution, and it probably wouldn't be that effective for large files which might take a while to move (for example, when the processing location is on a different volume and the data has to be copied). Can you tell me how you are starting the jobs that consume these files? What mechanism are you using?

I have a similar situation here, but I use Talend ESB to handle the requirement. However, that solution is quite complex and has to handle all sorts of situations.

Rilhia Solutions
Five Stars

Re: files lock

The jobs start by loading the context, then run a tFileList, then check whether the file has already been processed, and then process the file (some transformations and data loading into a database).
Eleven Stars

Re: files lock

Sorry, what I meant was: are you running this using the Talend Administration Center, or are you just starting the jobs on the command line or inside several Studios?

As an example, if you are running the jobs on the command line at the same time, why not create a job with a tFileList which finds the files only once, and then use that job to start the other jobs and pass them the file details? That is, remove the tFileList from the other jobs and have it running in only one.
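The "list once, then hand out" idea can be sketched in plain Java as follows. This is only an illustration under assumptions: `FileDispatcher`, `listFiles`, and `partition` are hypothetical names, the round-robin split is one possible way to share out the work, and how each batch is passed to a child job (for example as a context parameter) is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileDispatcher {

    /** Lists the regular files in the shared directory once, in a stable order. */
    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    /**
     * Splits the items round-robin into one batch per worker, so every item
     * lands in exactly one batch and no two workers see the same file.
     */
    static <T> List<List<T>> partition(List<T> items, int workers) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            batches.get(i % workers).add(items.get(i));
        }
        return batches;
    }
}
```

Since only the dispatcher reads the directory, the worker jobs never compete for a file; each one only processes the paths it was given.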

Rilhia Solutions
Five Stars

Re: files lock

Hi,

We are using Talend Administration Center.

Is it possible to parallelize the solution that you've proposed?

Kind regards

Eleven Stars

Re: files lock

It is, but it requires a reasonably complicated bit of work if you are not familiar with web services. I use a solution very similar to this. What you can do is use the Talend Metaservlet to create and run tasks (Talend jobs) from another job. So you could set up one job to go through the files and have it call the metaservlet to start other Talend jobs in the TAC. The metaservlet documentation can be found here (https://help.talend.com/reader/rJGzSCBb8MvnaZHhs978KQ/PMoHeNdt5qac07VehVViDA). This is incredibly powerful and really adds another level to what you can do with Talend. However, it takes a fair amount of getting used to and is not the most straightforward of technologies to use.

I have a solution working where, when a file arrives in a folder, I look at the filename and the folder path, identify the job that needs to be run from that (by looking it up in a database), and then either start the appropriate job (if it is already configured in the TAC) or create the task in the TAC and then start it. It deals with load balancing and dynamic updating of jobs (from Nexus), and allows us to introduce new functionality really quickly. However, it took a while to perfect, and we use the Talend ESB here to further support this.

Give it a try by setting up a job to call a service which will fire the "RunTask" command. If you can get to grips with that pretty quickly, the rest is fairly logical.
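As a starting point, here is a rough sketch of how such a request might be built in Java. Everything specific in it is an assumption to be checked against the metaservlet documentation linked above: the `/metaServlet?` path, the Base64-encoded JSON query, the parameter names `actionName`, `authUser`, `authPass`, `taskId` and `mode`, and the spelling `runTask` for the action. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MetaServletRequest {

    /**
     * Builds the JSON body for a "run task" call.
     * Parameter names are assumptions; values are not JSON-escaped,
     * so this sketch only suits simple user names and passwords.
     */
    static String runTaskJson(String user, String password, int taskId) {
        return "{\"actionName\":\"runTask\","
             + "\"authUser\":\"" + user + "\","
             + "\"authPass\":\"" + password + "\","
             + "\"taskId\":" + taskId + ","
             + "\"mode\":\"asynchronous\"}";
    }

    /**
     * Appends the Base64-encoded JSON to the (assumed) metaservlet endpoint
     * under the given TAC base URL.
     */
    static String buildUrl(String tacBaseUrl, String json) {
        String encoded = Base64.getEncoder()
                               .encodeToString(json.getBytes(StandardCharsets.UTF_8));
        return tacBaseUrl + "/metaServlet?" + encoded;
    }
}
```

A calling job would look up the task id for the file it found, build the URL, and issue an HTTP GET against it. Starting tasks asynchronously, if the installed version supports that mode, would let the dispatching job move on to the next file while the started tasks run in parallel.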

Rilhia Solutions
Five Stars

Re: files lock

Thanks a lot for your help