Multi Dimensional Modeling with Talend: retrieving surrogate keys

One Star

Multi Dimensional Modeling with Talend: retrieving surrogate keys

Hi,
I am currently using Talend to built a multidimensional schema for Reporting purposes.
related posts to my problem:
1448
6465
My problem:
Consider a source table from a production system, from which I need to load a time-dimension table, and load the linked fact table.
using SCD components, I am able to build the SCD dimension and generate the surrogate key. Now, I need to retrieve this surrogate key and load it in the corresponding lines of my fact table, in order for each line in my fact table to be directed to the appropriate date in my dimensional table. For the moment, the only way I am able to do so is by doing a lookup, for each line of my fact table, of the date and retrieve the appropriate key in the dimensional table. I see two constraints with this:
1) I will need two identical fields in my fact table and dimensional table, in order to do the lookup. In my case, fields in my dimensional table have undergone many transformations (format, parsing, etc), so i can not do such a lookup except by re-introducing in my dimensional table the fact table format-like field
2) memory and time consuming.
is there any other way to retrieve automatically these surrogate keys? that would be a great help for dimensional modeling.
thanks
Pierre
One Star

Re: Multi Dimensional Modeling with Talend: retrieving surrogate keys

Hi,
I really don't understand why it is difficult to join both tables, you probably have a date field in your source data and in your dimension table, so you just need to make a join on the date field, retrieve the surrogate key from the dimension table and insert it in your fact table.
If it's not that simple, maybe could you show us the structure of your dimension and fact tables ?
Regards
One Star

Re: Multi Dimensional Modeling with Talend: retrieving surrogate keys

thanks for your reply.
lets us focus on payments types example. please consider the below lines populating my fact table:
|Fact Table |
----------------------------------------------------------
Paiement Carte 190509 Relay 3243 Paris 15 -
Paiement Carte 180509 Eke 7eme S Paris 7 -
Plt Ae Carte France Dd N.075931932nne101869 -
Plt Ae Carte France Dd N.075931932nne101869 -
from those, using a java code and a tJavaRow component, I extract the substring just before the first occurence of a number to get the payment types, then I fill my dimension table. which gives from my fact table the following payment types:
ID |Payment Type Dimension|
------------------------------
1 | Paiement Carte
2 | Plt Ae Carte France
=>
1)when i will want to do my join between these two tables to get the surrogate key (ID) for my fact table, I will need to use this java code again.
2) moreover, since I have special cases to treat, I can not use a TMap component which does not support if/then clauses
so that might be possible with additional steps and components, but I would like to ensure that there is no other nicer/more efficient way, such as a component that i might have missed. that would really be helpful for dimensional modeling.
thanks
Pierre
One Star

Re: Multi Dimensional Modeling with Talend: retrieving surrogate keys

Hi Pierre,
it's quite weird to populate a dimension table from a fact table but let say that you have no choice...
If you don't want to parse twice the source data, you can:
- parse it once
- connect the output to a tReplicate
- from the tReplicate you can 1/ connect a flow to populate the dimension table 2/ connect a second flow to store the parsed data into a temporary file
In a second step you can read the temporary file and join it with the dimension table using a tMap to retrieve your surrogate key.
The tMap does not allow a if/then statement, but you can usually use the " ? : " ternary operator, for example instead writing:
if (my_key == null) -1 else my_key
you can say:
my_key == null ? -1 : my_key
Regards,
One Star

Re: Multi Dimensional Modeling with Talend: retrieving surrogate keys

" it's quite weird to populate a dimension table from a fact table but let say that you have no choice... "
=> that was actually my transactional (source system) table, as my example can show. thanks for pointing up the typo.
I finally managed to build this flow using a Java routine which performs all the parsing, then calling it from the TMap component during the join process. But your suggestions looks very practical too -did not know the tReplicate component. will use it next time.
thanks again for your help.