Good day guys
I might have uncovered potential bugs in the tMSSQLSCD and tMYSQLSCD components. I am rather new to this so your input and double checking will be appreciated. I will explain my jobs below:
(MSSQLSCD is very bad, MYSQLSCD is much better, but there is one bug).
Job 1: Testing a POC using tMSSQLSCD (Microsoft SQL Server)
I pull client related data from various sources into a single client table.
The table has the following columns:
The column in red is the client number, and the columns in yellow is what I use for the SCD Type 2. Here is my SCD setup for my SCD job that handles changes in client phone numbers (for the POC we only care about phone numbers):
As you can see, I only really care about the phone numbers for my POC, and I also specify the start and end data, as well as the "is_active" column that Talend should set to 0 or 1.
But here is the problem:
As you can see, even though the phone numbers stay the same, the MSSQLSCD component creates a new surrogate key. Also, it never sets the end_date to the proper date, and it also never sets the is_active to 0.
But, I did find the update code in the Talend job but it never gets executed (I ran SQL Server Profiler, but there are always only insert jobs, never update jobs):
Job 2: Testing a POC using tMYSQLSCD (MYSQL)
The MySQL version of the jobs look exactly the same, only the SCD component was changed to the tMYSQLSCD connector.
It works A LOT better than the SQL Server job, see below:
As you can see, only when the phone numbers change, will it create a new record (which is 100% correct and the way it should work. Even if I run the job 5 times with the same phone numbers, it will only insert a record once, not like SQL Server where it inserts a record every time even when the data did not change).
The start_date and end_date columns are also working very well, but the is_active column is never populated with a 0 or a 1.
Here is my MySQL SCD:
Here is also the code I could find in the Talend job that set the is_active column to 0:
Now I am thinking that the data type of my is_active column (which currently is varchar) should be something else. The reason I am saying this is because all the other columns update properly, except the is_active column.
I ran the above update in MySQL on the is_active column (is_active = b'0') and it also didn't update the value, so I think that my data type is wrong perhaps.
Please can you have a look at the above. I am more interested in getting the SQL Server portion working because we want to use SQL Server going forward, but MySQL will be fine for the POC.
Thank you in advance!
Could you please indicate on which build version you got this issue?
Thanks for the quick reply.
I am using Talend Open Studio for Big Data version 6.3.1. There is no build ID, it is just blank.
Does this issue still repro on V 6.4?
Would you mind sending us your job by .zip file? So that we can make a testing on this issue.
Thank you for the reply.
Please see attached 2 jobs. One for MySQL and one for SQL Server. You will also find attached a text file with my test data.
Here is a brief description of the jobs and also the create table statements:
The first time I ran the job, I ran it with the data below (attached) just to populate my table.
Then the second time I ran the job, I changed the portions in yellow to CHANGE, CHANGE and ran my job again. It updated the MySQL table's date fields correctly and also added the new record, but it didn't update the is_active field:
For the MSSQL job, I did exactly the same thing, the SCD just happens on Microsoft SQL Server. As you can see in the image below, I imported the data, and then changed the fields that say CHANGE to CHANGE2, but the dates and is_active fields never update.
Here are the create table statements for MySQL and SQL Server:
CREATE TABLE `dm_merchant2` (
`MRCH_ACCT_NO` varchar(100) DEFAULT NULL,
`company` varchar(100) DEFAULT NULL,
`STAT_CDE` varchar(100) DEFAULT NULL,
`open_date` varchar(100) DEFAULT NULL,
`close_date` varchar(100) DEFAULT NULL,
`segment` varchar(100) DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`sub_brand` varchar(100) DEFAULT NULL,
`surrogate_key` int(11) NOT NULL AUTO_INCREMENT,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`is_active` varchar(10) DEFAULT NULL,
PRIMARY KEY (`surrogate_key`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE [dbo].[dm_merchant2](
[MRCH_ACCT_NO] [nchar](100) NULL,
[company] [nchar](100) NULL,
[STAT_CDE] [nchar](100) NULL,
[open_date] [nchar](100) NULL,
[close_date] [nchar](100) NULL,
[segment] [nchar](100) NULL,
[brand] [nchar](100) NULL,
[sub_brand] [nchar](100) NULL,
[surrogate_key] [int] IDENTITY(1,1) NOT NULL,
[start_date] [datetime] NULL,
[end_date] [datetime] NULL,
[is_active] [char](1) NULL,
PRIMARY KEY CLUSTERED
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Let me know if you need more information.
Watch the recorded webinar!
Accelerate your data lake projects with an agile approach
Create systems and workflow to manage clean data ingestion and data transformation.
Introduction to Talend Open Studio for Data Integration.