One Star

To filter one row from a set of rows based on a condition


 

I have a set of rows in a table X as

job1   20170506 Sat   Not Run
job1   20170507 Sun   Completed
job1   20170508 Mon   Not Run
job1   20170513 Sat   Completed
job1   20170514 Sun   Failed
job1   20170515 Mon   Not Run
job1   20170520 Sat   Not Run
job1   20170521 Sun   Not run
job1   20170522 Mon   Not Run
job2   20170506 Sat   Failed
job2   20170507 Sun   Not run
job2   20170508 Mon   Completed
job2   20170513 Sat   Completed
job2   20170514 Sun   Completed
job2   20170515 Mon   Completed
job2   20170520 Sat   Not Run
job2   20170521 Sun   Not run
job2   20170522 Mon   Not Run
 and so on.

I need to take weekend (Sat, Sun, Mon) as a single entity and should get a resulting table as follows:

job1   20170507 Sun   Completed
job1   20170513 Sat   Completed
job1   20170522 Mon   Not Run
job2   20170508 Mon   Completed
job2   20170515 Mon   Completed
job2   20170522 Mon   Not Run

If the job has an entry completed in any of these three days, it is considered as completed. Otherwise if all are Not run, then latest entry with Not Run is to be taken. How is it possible in talend.

I can create a table containing Sat to Mon interval dates as follows:

Table Y

20170506 Sat  20170508 Mon
20170513 Sat  20170515 Mon
20170520 Sat  20170522 Mon

Please give me an idea, should I use tJava, or tIntervalMatch components, how is it possible.

  • Data Integration
1 REPLY
Seven Stars TRF
Seven Stars

Re: To filter one row from a set of rows based on a condition

Here is my proposal:

Capture.PNG

1st subjob is here to add sequence to the input based on the weekday and to separate Completed and NotCompleted jobs.
For this purpose I use the following tMap_23 in which both flows have the same shema:

Capture.PNG

The new field jobNumber contains the value of the jobname + a sequence number separated by a dash (job-1 for example).
As the sequence is based on the weekday, every lines associated to the same jobname have the same value for jobNumber.
Result for both flows is memorized into a dedicated tHashOutput (here called Completed and NotCompleted).

After that, a 2nd subjob is started to exclude NotCompleted records associated to a Completed one. This the role of tMap_24:

Capture.PNG

This is a classic inner join based on jobNumber. Inner join rejected records are catched, then sorted on jobNumber asc and execDate desc (tSortRow_1) and finaly records with the same jobNumber are eliminated except the 1st one (tUniqRow_6) and the result is appended at the end of the tHashOutput called Completed in the 1st subjob (here RawResult task).

Last subjob read the tHashInput called RawResult, remove the extra field jobNumber, sort all the record based on job before to display the result on the console:

Capture.PNG

Hope this helps.

PS. If you like this proposal, don't forget to like it too on stackoverflow Smiley Wink

 


TRF