Slowly changing dimensions: Only update field when another changes.

One Star mpa
One Star

Slowly changing dimensions: Only update field when another changes.

Hi,
I'm having the following problem in one of my jobs at the moment.
For my customer table I have been using slowly changing dimensions for a while now.
Recently I have been asked to add a loadno to my table. (to keep track of when a customer was uploaded and in what file)
What I want is for the loadno to change if something changes in the record. For example if a customer name changes I want the LoadNo to change to the current Loadno. And if nothing has changed to the file I want the LoadNo to be unchanged.
What I'm doing right now:
- I'm reading in customer files and i'm doing a couple of lookups to see if all the data is correct. (existing Sales Representative etc.)
Job Looks Like: tFileinputdelimited --> tMap --> tMssqlSCD
- I add the Loadno in the tmap component using a context variable. Because of this the LoadNo is always different then what I have in my table.
I have tried following methods, but both unsuccesfull:
? When I put my LoadNo column into my type 1 fields: All customers that were included in today's file have their LoadNo changed to todays Loadno. (even though most of them were unchanged)
? When I leave the LoadNo column into unused: The record gets updates if a type 1 field was changed. But the LoadNo doesn't get updated.
Is there a way to do this using the tMssqlSCD settings? Or do I have to change my tMap component and add my LoadNo on another way so SCD doesn't think the record is changed?
I hope I have been clear. If something isn't clear feel free too ask and i'll try to be more clear.
Thanks in advance,
Mario De Pauw

Re: Slowly changing dimensions: Only update field when another changes.

put your LoadNo in as type 0
One Star mpa
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Thanks for your quick response.
I have tried it but the LoadNo remains unchanged (Still gives the old LoadNo) where the Customer name has changed (this is a type 1 field).

Re: Slowly changing dimensions: Only update field when another changes.

Dosnt look like the SCD components support this type of update. Since you're doing type 1 updates, if your dimension table is not very large, you can replace your SCD component with a tMap that accomplishes the same logic (two output tables-- one for updates one for inserts) by joining your input data to the dim table.
One Star mpa
One Star

Re: Slowly changing dimensions: Only update field when another changes.

How can I achieve this? By using the additional columns in the tmssqloutput and adding the LoadNo column there?
I tried using the field options and disabling Loadno from updating but that doesn't seem to be the way to do it.
Or do I have to do this in tMap?
One Star

Re: Slowly changing dimensions: Only update field when another changes.

I have the same problem - just with the difference that I work with a load_date instead of a load_no. Obviously, the SCD components are not able to handle this situation. It is not a proper solution but work-around when you try to achieve your goal with a tMap instead of a SCD component. I'll open a bug report.
One Star mpa
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Is it for example possible to know all the rows that were updated and then change the loadno after they were updated? (when using a tmssqloutput because i'm having the same problem for my product groups who don't use SCD)
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Try to work with the versioning attribute of type "version" in the SCD Editor:
1) Create a attribute called LOAD_NO (Integer) in your input schema.
2) In the SCD Editor: Write LOAD_NO into the name field of the versioning attribute of type "version" and check the check box on the left hand side (see attached screenshot)
One Star mpa
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Is this going to take the value I give it in my tMap component?
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Yes
One Star mpa
One Star

Re: Slowly changing dimensions: Only update field when another changes.

It apparantly doesn't do what I want. The LoadNo doesn't get updated when one of my type 1 fields get updated. They only get updated when a level 2 field changes.
Do I have to do anything else or doesn't it support what i'm asking?
One Star

Re: Slowly changing dimensions: Only update field when another changes.

Sorry, you are right - my suggestion only works for SCD2. I had overseen that you are also dealing with SCD1 changes. As a mentioned at the beginning I have that same problem - not with a load_no but with a load_date instead. In the meantime I opened a bugtracker issue (No. 0022309).
As a work-around for myself I implemented the following:
- load_date as SCD1 attribute
- Do a MINUS set operation on the input data source: <select all entries> MINUS <select existing entries from the affected dimension>
This has the only inconvienence that in case of an SCD2 change the whole history (all records) of a dimensional business key is updated.

Re: Slowly changing dimensions: Only update field when another changes.

Hi,
I am new to talend environment,please if any one knows let me know how to implement scd2 logic without using
scd component.............
Four Stars

Re: Slowly changing dimensions: Only update field when another changes.

Hi nagamani,
Why you don't want to use SCD component for implementing SCD2? any specific reason.
Can you elaborate ?
Whether existing SCD component is not satisfying your need ?
Vaibhav
Moderator

Re: Slowly changing dimensions: Only update field when another changes.

Hi,
I am new to talend environment,please if any one knows let me know how to implement scd2 logic without using
scd component.............

Do you have any specific requirement on scd? Please give us more description about your job design.
Slowly Changing Dimensions (SCDs) are dimensions that have data that slowly changes. The SCD editor offers the simplest method of building the data flow for the SCD outputs. In the SCD editor, you can map columns, select surrogate key columns, and set column change attributes through combining SCD types.
Please take a look at component reference TalendHelpCenter:tMysqlSCD.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

Re: Slowly changing dimensions: Only update field when another changes.

Hi ,
Thanks for fast reply,
Initially i did it with tverticascd component but it was not generating surrogate keys.That field was null for all records.
So i thought i will implement logic seperately using tmap..
My source table contains
Id, mgr,location
100,abc,hyd
200,bcd,banglore
my target table
sk,id,manager,scd_start,scd_end,scd_active
so please let me know how can i implement without using scd component
Thanks,
nagamani.

Re: Slowly changing dimensions: Only update field when another changes.

Hi Vaibhav,
Thanks for reply
I wanted implement scd type2 logic with out using scd component for vertica database.
actually i tried with scd component surrogate key is not populating ..........for all records..

but iam interested to know how we can implement this logic in general way please tell if you know....

Thanks.

Re: Slowly changing dimensions: Only update field when another changes.

Hi,
i have tried the scd type2 logic without scd component ,my data base is Hp vertica DB
It has got three flows
1.Fresh insert
2.updating alredy existing record and making it as inactive
3.inserting new updated record.
if i update and run its giving following error can any one please help me with this..

Exception in component tVerticaOutput_3
java.sql.SQLException: (5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out I locking Table:tjob.issuer_dim. X held by . Your current transaction isolation level is READ COMMITTED
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readCopyStartResponse(Unknown Source)
Job scd_1tmap ended at 17:20 12/03/2014.

Thanks,
nagamani.
Four Stars

Re: Slowly changing dimensions: Only update field when another changes.

Hi Nagamani,
1) You can implement SCD2 without using talend, by making use of pure sql for all the conditions which you understand.
2) If you want to use talend, make different queries for all the scd scenarios and update database columns respectively. All these queries will have different flows and work one after other
3) If your scd component for vertica is failing, this is not a good reason to bypass scd component in talend. Better approach would be put a screenshot along for your design and error. And get help to resolve it.
This will help to resolve issue if any in the scd component in talend and help you to save your time in thinking and designing logic for which people have already invested their time.
Thanks
Vaibhav

Re: Slowly changing dimensions: Only update field when another changes.

Hi vaibhav,
Thanks for your reply ,,but i implemented this logic using tmap component...
thanks
nagamani
Four Stars

Re: Slowly changing dimensions: Only update field when another changes.

I've implemented SCD logic in Talend using the tMap in cases where I needed to pass certain fields through the SCD component without checking their status i.e. untouched.
Though this is a different example (for updating a fact table), the logic is same - using a flag to indicate what records should be updated or updated. The only thing I don't show in this example is the changing of the SCD flags (active, start and end dates)... http://www.talendforge.org/forum/viewtopic.php?pid=128716#p128716
One Star

Re: Slowly changing dimensions: Only update field when another changes.

can any one help me to solve the following scenario ?
There is an Employee dimension with Emp id, first name, last name, manager name and department. You should have 2 schema or database in Postgres. First will be your source, 2nd will be your target (DWH). So initially you will have let’s say 10 rows in the source. You will load those 10 rows. Now let’s say there is a change in the source and 2 employee’s manager is changed. My target table should be able to capture these change, along with the time it was changed. So basically it is called SCD2 implementation. At the end my Source will have 10 rows and target 12 rows.