How to relaunch the job without re-reading the file?

Seven Stars

Hello

 

My title isn't very clear, so let me explain.

I have a job that consists of:

1) Reading a file in a directory

2) Feeding the database

3) Several tMap components that check the business rules ("règles de gestion" in French), with:

              - If the data are correct, they are accepted and loaded into the database

              - If the data contain an error, they are rejected and loaded into the database with an error message

 

Currently, when there is an error, we must correct it directly in the file, delete the corresponding lines in the database, and then relaunch the job.

 

But now I have a new requirement:

- Errors should no longer be corrected in the file, but directly in the database.

The problem is that when I relaunch the job, Talend starts all over again: it rereads the file, reloads the database, etc.

This makes the correction made in the database useless.

 

I don't know how to correct the error in the database and then resume the job from where it stopped rather than from the beginning.

 

(PS: I am not a Talend expert... not yet.)

 

Thank you for your help!

 

Fifteen Stars TRF

Re: How to relaunch the job without re-reading the file?

If you want to skip the lines that were already loaded by a previous run, store the current line number in a context variable (see tContextDump / tContextLoad) and use it in the tFileInputDelimited "header lines" parameter.
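The idea can be sketched outside Talend as follows. This is only a minimal illustration in plain Python, not Talend code: the checkpoint file stands in for the saved context variable, and the names `load_checkpoint`, `save_checkpoint`, `process_file` and `handle_line` are hypothetical. It assumes the input file does not change between runs.

```python
def load_checkpoint(path):
    """Return how many lines a previous run already processed (0 if none)."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_checkpoint(path, line_count):
    """Persist the number of the last line that was processed successfully."""
    with open(path, "w") as f:
        f.write(str(line_count))


def process_file(data_path, checkpoint_path, handle_line):
    """Process only the lines a previous run has not handled yet.

    If handle_line raises, the checkpoint keeps the last successful line,
    so the next run resumes at the line that failed.
    """
    done = load_checkpoint(checkpoint_path)
    with open(data_path) as f:
        for number, line in enumerate(f, start=1):
            if number <= done:
                continue  # already loaded by a previous run, skip it
            handle_line(line.rstrip("\n"))  # may raise on a bad line
            save_checkpoint(checkpoint_path, number)
            done = number
    return done
```

In a Talend job, the saved number would play the role of the value passed to the "header lines" parameter, so the rows already loaded are skipped as if they were header rows.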

TRF
Seven Stars

Re: How to relaunch the job without re-reading the file?

Thank you, I didn't know these components.

Community Manager

Re: How to relaunch the job without re-reading the file?

To do this, you need to keep a logging table that records each step. For example, the first part of the process is loading the data from a file. Once that has completed successfully for that filename (I assume the filename is unique; it more or less has to be), you can move on to the second step, which I gather is validating the data. Once that is done, you can load the data into your success table. So there are 3 steps. If you keep a table that logs the success or failure of each step (each step perhaps handled by its own job), then when your process fails, you can restart from the job (process) that did not finish successfully.

 

For example, your logging table might look something like this....

 

File       Batch  Process  Start  End    Success/Failure
File.xml   1      1        10:31  10:32  Success
File.xml   1      2        10:32  10:33  Failure
File.xml   1      2        10:34  10:35  Success
File.xml   1      3        10:35  10:36  Success
File2.xml  2      1        11:00  11:01  Success
File2.xml  2      2        11:01  11:02  Success
File2.xml  2      3        11:02  11:03  Failure
File2.xml  2      3        11:03  11:04  Success

 

You would drive the process from this table and have conditional logic in your job (assuming a main job with 3 child jobs, one for each process). When you restart, if the file has previously failed, the job restarts the process that failed and continues from there.
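The restart logic driven by such a table might look roughly like this. It is a minimal sketch in plain Python with SQLite, not a Talend job: the table name `process_log`, its column names, and the functions `create_log`, `log_step` and `next_step` are all assumptions made for illustration, and the step numbering (1 = load the file, 2 = validate, 3 = load the success table) is inferred from the description above.

```python
import sqlite3

# Assumed numbering: 1 = load file, 2 = validate data, 3 = load success table.
STEPS = [1, 2, 3]


def create_log(conn):
    """Create the (hypothetical) logging table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS process_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " file TEXT, batch INTEGER, process INTEGER,"
        " start_time TEXT, end_time TEXT, status TEXT)"
    )


def log_step(conn, file, batch, process, start, end, status):
    """Record the outcome ('Success' or 'Failure') of one step for one file."""
    conn.execute(
        "INSERT INTO process_log"
        " (file, batch, process, start_time, end_time, status)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (file, batch, process, start, end, status),
    )


def next_step(conn, file):
    """Return the first step with no 'Success' row for this file, or None if all are done."""
    rows = conn.execute(
        "SELECT DISTINCT process FROM process_log"
        " WHERE file = ? AND status = 'Success'",
        (file,),
    ).fetchall()
    done = {row[0] for row in rows}
    for step in STEPS:
        if step not in done:
            return step
    return None
```

A main job would call something like `next_step` for the file at startup and launch only the child job for that step and the ones after it, logging a row after each attempt. A file that has never been seen simply starts at step 1.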

 

I use this sort of framework a lot.

 

Seven Stars

Re: How to relaunch the job without re-reading the file?

Thank you, your explanations are very clear. I will try to create a new child job.
