Five Stars

Possible SCD bugs found in tMSSQLSCD and tMYSQLSCD

Good day guys

 

I might have uncovered potential bugs in the tMSSQLSCD and tMYSQLSCD components. I am rather new to this so your input and double checking will be appreciated. I will explain my jobs below:

 

(MSSQLSCD is very bad, MYSQLSCD is much better, but there is one bug).

 

Job 1: Testing a POC using tMSSQLSCD (Microsoft SQL Server)

I pull client related data from various sources into a single client table. 

The table has the following columns:

SQLServerTable.PNG

 

The column in red is the client number, and the columns in yellow is what I use for the SCD Type 2. Here is my SCD setup for my SCD job that handles changes in client phone numbers (for the POC we only care about phone numbers):

mssqlscd.PNG

As you can see, I only really care about the phone numbers for my POC, and I also specify the start and end data, as well as the "is_active" column that Talend should set to 0 or 1. 

 

But here is the problem:

SQLServerData.PNG

 

As you can see, even though the phone numbers stay the same, the MSSQLSCD component creates a new surrogate key. Also, it never sets the end_date to the proper date, and it also never sets the is_active to 0.

 

But, I did find the update code in the Talend job but it never gets executed (I ran SQL Server Profiler, but there are always only insert jobs, never update jobs):MSSQLCode.PNG

 

Job 2: Testing a POC using tMYSQLSCD (MYSQL)

The MySQL version of the jobs look exactly the same, only the SCD component was changed to the tMYSQLSCD connector.

 

It works A LOT better than the SQL Server job, see below:

MySQLData.PNG

 

As you can see, only when the phone numbers change, will it create a new record (which is 100% correct and the way it should work. Even if I run the job 5 times with the same phone numbers, it will only insert a record once, not like SQL Server where it inserts a record every time even when the data did not change).

The start_date and end_date columns are also working very well, but the is_active column is never populated with a 0 or a 1.

Here is my MySQL SCD:

MySQLSCD.PNG

 

Here is also the code I could find in the Talend job that set the is_active column to 0:

MySQLCode.PNG

 

 

Now I am thinking that the data type of my is_active column (which currently is varchar) should be something else. The reason I am saying this is because all the other columns update properly, except the is_active column.

I ran the above update in MySQL on the is_active column (is_active = b'0') and it also didn't update the value, so I think that my data type is wrong perhaps.

 

Please can you have a look at the above. I am more interested in getting the SQL Server portion working because we want to use SQL Server going forward, but MySQL will be fine for the POC.

 

Thank you in advance!

 

Regards

 

Leigh

 

  • Data Integration
4 REPLIES
Moderator

Re: Possible SCD bugs found in tMSSQLSCD and tMYSQLSCD

Hello,

Could you please indicate on which build version you got this issue?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: Possible SCD bugs found in tMSSQLSCD and tMYSQLSCD

Hi Sabrina

 

Thanks for the quick reply.

 

 I am using Talend Open Studio for Big Data version 6.3.1. There is no build ID, it is just blank.

 

Regards

 

Leigh

Moderator

Re: Possible SCD bugs found in tMSSQLSCD and tMYSQLSCD

Hi,

Does this issue still repro on V 6.4?

Would you mind sending us your job by .zip file? So that we can make a testing on this issue.

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: Possible SCD bugs found in tMSSQLSCD and tMYSQLSCD

Hi Sabrina

 

Thank you for the reply.

 

Please see attached 2 jobs. One for MySQL and one for SQL Server. You will also find attached a text file with my test data.

 

Here is a brief description of the jobs and also the create table statements:

 

The first time I ran the job, I ran it with the data below (attached) just to populate my table.

 

InitialData.PNG

Then the second time I ran the job, I changed the portions in yellow to CHANGE, CHANGE and ran my job again. It updated the MySQL table's date fields correctly and also added the new record, but it didn't update the is_active field:

mySQLchange.PNG

 

 

For the MSSQL job, I did exactly the same thing, the SCD just happens on Microsoft SQL Server. As you can see in the image below, I imported the data, and then changed the fields that say CHANGE to CHANGE2, but the dates and is_active fields never update.

SQLServer.PNG

 

Here are the create table statements for MySQL and SQL Server:

MYSQL:

CREATE TABLE `dm_merchant2` (
`MRCH_ACCT_NO` varchar(100) DEFAULT NULL,
`company` varchar(100) DEFAULT NULL,
`STAT_CDE` varchar(100) DEFAULT NULL,
`open_date` varchar(100) DEFAULT NULL,
`close_date` varchar(100) DEFAULT NULL,
`segment` varchar(100) DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`sub_brand` varchar(100) DEFAULT NULL,
`surrogate_key` int(11) NOT NULL AUTO_INCREMENT,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`is_active` varchar(10) DEFAULT NULL,
PRIMARY KEY (`surrogate_key`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

 

SQL Server:

CREATE TABLE [dbo].[dm_merchant2](
[MRCH_ACCT_NO] [nchar](100) NULL,
[company] [nchar](100) NULL,
[STAT_CDE] [nchar](100) NULL,
[open_date] [nchar](100) NULL,
[close_date] [nchar](100) NULL,
[segment] [nchar](100) NULL,
[brand] [nchar](100) NULL,
[sub_brand] [nchar](100) NULL,
[surrogate_key] [int] IDENTITY(1,1) NOT NULL,
[start_date] [datetime] NULL,
[end_date] [datetime] NULL,
[is_active] [char](1) NULL,
PRIMARY KEY CLUSTERED
(
[surrogate_key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

 

Let me know if you need more information.

 

Regards

 

Leigh