Four Stars

how to filter the updated rows

hello everybody,

i'm trying to setup a talend job, this is my current situation:

I have an Oracle table A, let's say it is something like this:

 

ID NAME SURNAME AGE

P56 John    Smith          36

P67 Micheal Douglas    21

P78 Will     Samuelson  56

 

I also have a file B, which contains the entire table A data, updated to the latest version.

 

ID NAME SURNAME AGE

P56 John    Smithson    35        <-- updated row

P67 Micheal Douglas    21

P78 Will     Samuelson  56

 

I need to produce a new Oracle table C, containing only the rows that need to be updated (i dont need to consider additional rows or deleted rows).

 

So in this example, i need a table C containing only

 

ID NAME SURNAME AGE

P56 John    Smithson    35 

 

How can i do this? I'm currently trying in the following way:

a tMap which takes 2 inputs, the file and the table A, and goes out to the new table C, i joined the 2 inputs via inner join, using the ID as a key.

I tried to add a filter to table A, in order to select only the rows where some data has changed, but i'm still moving my first steps in Talend and i cannot understand how to do that, any hint?

 

thanks in advance

 

 

 

 

5 REPLIES
Forteen Stars TRF
Forteen Stars

Re: how to filter the updated rows

In tMap, use a inner join between the table and the file based on all the fields.

Select the "Catch lookup inner join reject" option on the tMap output flow.

You'll get a the rows which are in the table and not in the file (new or modified).

If you want to identify news and deleted rows, you need to start with 2 tMap with join based on the Id field (1st with db as main and 2nd with file as main).


TRF
Forteen Stars TRF
Forteen Stars

Re: how to filter the updated rows

Does this help you?


TRF
Six Stars

Re: how to filter the updated rows

Hi rekotc,

 

here is another suggestion:

In tMap join tableA as main input with fileB as lookup on the ID field as you have done already.

In the Variable Panel, create a boolean variable "name_changed" which compares the column Name from tableA to the column Name of your fileB, returning true when they are different.

Repeat the last step for Surname and Age accordingly.

Create a boolean variable called "update_flag" which is true if any of the previous comparison results are true.

Use this update_flag in the filter section of your output, so only those rows with a change in some column(s) are sent to tableC.

 

Best regards,

 

Thomas

 

Four Stars

Re: how to filter the updated rows

I had the same question and used your approach.

In my case I only have one primary key (ID#) and one datafield "Data", so I added in the tmap a variable : 

Initial3.Data.equals(After3.Data)

and in the most right pane I add the expression filter :

Var.data_changed==false. I work with a dummy files where the original table is "BEFORE" and the changed table (from which I want to know what lines are updates is "AFTER". AFTER is my main and BEFORE is my lookup. The values are:

BEFORE:

--------------

1,A
2,B
3,C
5,E
6,F
7,K
AFTER :

------------

1,A
3,B
4,D
5,K

 

I notice the first update (where for primary key 3, the data value was changed from C to B) is detected correctly, but when he wants to compare the next line (4,D which is not present in the BEFORE file) then I get an error "Exception in component tMap_3 java.lang.NullPointerException" exactly at the line where I put "Initial3.Data.equals(After3.Data)".

Any idea on what is causing this??

 

Thanks!

Six Stars

Re: how to filter the updated rows

Hello JB000000001,

 

I suppose you are using a left outer join? In "BEFORE", the value "4,D" is not found and that is why "Initial3.Data" is null and you cannot call "equals()" on a null value. Add a condition to check "Inital3.Data" for null, e.g. like this:

Initial3.Data == null ? false : Initial3.Data.equals(After3.Data)

 

I hope that was helpful.

 

Best regards,

 

Thomas