Four Stars

Question - rationalizing our database, tmap or something else?

Hi guys

 

This is perhaps a very basic question, so basic that I'm unable to find a similar one on here: I'm on a project to redesign the database of our legacy system, which is pretty old and has grown organically over time, not always in line with best practice. For example, as the business has required new data to be stored, we've often simply tagged new attributes onto existing tables, leaving them large and not particularly coherent. I'm an analyst rather than a developer, and I'm trying to prove that these kinds of restructurings can be performed in Talend, but I've not been able to get it quite right.

 

Let's say I'm trying to split a legacy 'person' table, which contains up to 3 name fields, into a 1:* table structure of 'Person_New' and 'Person_Credentials', to allow the flexibility of >3 names if we need it in the future. E.g.

 

Legacy Person table:

User_ID ¦ Forename ¦ Surname ¦ Middlename
123     ¦ John     ¦ Smith   ¦ Thomas

 

Would become:

 

Person_New table:

User_ID

123

'Person_Credentials' table:

User_ID ¦ Name_Type  ¦ Name
123     ¦ Forename   ¦ John
123     ¦ Surname    ¦ Smith
123     ¦ Middlename ¦ Thomas

 

I tried doing this within a single tMap, with 3 separate tOracleOutput components all pointing to the new 'person_credentials' table, and a rule on each output mapping saying "only if the source field is not null", so that a new entry is created whenever the respective legacy field has a value. This worked to an extent, and records were indeed created correctly in the target, but it didn't pick up everything: it picked up all of the forenames, but only some of the surnames, and even fewer middle names. There isn't any dependency between the names, so I don't want it to skip looking for a surname when the forename is null, for example.
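To be clear about what I'm after: the semantics I intended are sketched below in plain Java (the language Talend generates under the hood). This is just an illustration with made-up names, not Talend code: each "output" should be gated only on its own source field, independently of the others.

```java
// Sketch of the intended tMap semantics: three independent outputs,
// each gated only on its own source field being non-null.
public class TmapSketch {

    // Hypothetical holder for one legacy person row.
    static class Person {
        String userId, forename, surname, middlename;
        Person(String u, String f, String s, String m) {
            userId = u; forename = f; surname = s; middlename = m;
        }
    }

    // Each "output" applies its own condition, independent of the others,
    // so a null forename must not suppress the surname row.
    public static int emittedRowCount(Person p) {
        int count = 0;
        if (p.forename != null) count++;   // output 1: Forename row
        if (p.surname != null) count++;    // output 2: Surname row
        if (p.middlename != null) count++; // output 3: Middlename row
        return count;
    }

    public static void main(String[] args) {
        Person p = new Person("456", null, "La Menace", null);
        // Surname row is still emitted even though forename is null
        System.out.println(emittedRowCount(p));
    }
}
```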

 

Where am I going wrong; is tMap not the right component for this task?

 

thanks in advance

 

1 ACCEPTED SOLUTION

Accepted Solutions
Fourteen Stars TRF
Fourteen Stars

Re: Question - rationalizing our database, tmap or something else?

Hi,

In my opinion, tSplitRow is a better choice for this kind of situation.

Here is what it should look like in your case:

 

Capture.PNG

At the top of the capture is the schema defined for the tSplitRow, with the 3 output columns User_ID, Name_Type and Name.

At the bottom, you have the logic for the column mappings, where 1 row in the array corresponds to 1 output row for the component. Each column contains the logic to populate the corresponding field from the input row.

Here, we have to split 1 input row into 3 output rows, where User_ID is populated from the input row, Name_Type is hard-coded with the desired values, and Name is populated from the input column associated with the Name_Type.

After that, you just have to use a tFilterRow to keep only rows where the Name length is greater than 0.
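Not a Talend job itself, but here is the split-then-filter logic sketched in plain Java so you can see what the two components do together. The class and method names (`NameSplitter`, `splitNames`) are illustrative, not Talend API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the tSplitRow + tFilterRow logic: one wide input row
// becomes up to three narrow rows, and empty names are dropped.
public class NameSplitter {

    // Unpivot one legacy person row into (User_ID, Name_Type, Name) triples.
    public static List<String[]> splitNames(String userId, String forename,
                                            String surname, String middlename) {
        // tSplitRow equivalent: Name_Type is hard-coded per output row,
        // Name takes the matching input column.
        String[][] candidates = {
            {userId, "Forename", forename},
            {userId, "Surname", surname},
            {userId, "Middlename", middlename}
        };
        List<String[]> out = new ArrayList<>();
        for (String[] row : candidates) {
            // tFilterRow equivalent: keep only rows where Name length > 0.
            if (row[2] != null && row[2].length() > 0) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Second sample row: no middle name, so only 2 rows survive the filter.
        for (String[] r : splitNames("456", "Max", "La Menace", null)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```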

 

Here is my sample data:

"123";"John";"Smith";"Thomas"
"456";"Max";"La Menace";

And the result:

Starting job test at 20:13 27/01/2018.

[statistics] connecting to socket on port 3984
[statistics] connected
.-------+----------+-----------.
|          tLogRow_30          |
|=------+----------+----------=|
|User_ID|Name_Type |Name       |
|=------+----------+----------=|
|"123"  |Forename  |"John"     |
|"123"  |Surname   |"Smith"    |
|"123"  |Middlename|"Thomas"   |
|"456"  |Forename  |"Max"      |
|"456"  |Surname   |"La Menace"|
'-------+----------+-----------'

[statistics] disconnected
Job test ended at 20:13 27/01/2018. [exit code=0]

Hope this helps.


TRF
3 REPLIES
Four Stars

Re: Question - rationalizing our database, tmap or something else?

Something I've just realised that might be relevant to my question: I had a row count on my input table to filter only the first 3000 rows, as a subset for testing. I assumed this shouldn't matter, but I'll remove it and process the full 100k entries in my source table for the next test and see what that does. I'd still be interested in views on whether the approach I'm taking is the best one.

thanks

Four Stars

Re: Question - rationalizing our database, tmap or something else?

Great, thanks for your help. I had looked into tSplitRow but didn't appreciate its usage, so your example really helped; especially pairing it with the tFilterRow made it clear how it can be used.

 

I *think* I had it working with a tMap, but even if it did work, the tMap was a mess. I guess this is a more efficient solution!

 

Thanks