Talend Open Studio tMysqlSCD - special characters cause new version always

Highlighted
Five Stars

Talend Open Studio tMysqlSCD - special characters cause new version always

I'm having an issue with the tMysqlSCD component.  I ran a load, and then ran another load without changing the input data.  So my expectation would be that there would be no new Type2 SCD records generated as nothing changed.  Surprise! I have some new SCD versions!  What?  HOW?

 

It appears that if the column being checked for changes contains special characters (in this case linefeed LF) there is some flaw in the SCD type 2 detection that will always create a new version.

 

This is a huge pain as a lot of our text fields can have special characters.  Any way to get around this?  I'm on 6.5.1 should that matter.

 

thanks, Bryan

 


Accepted Solutions
Five Stars

Re: Talend Open Studio tMysqlSCD - special characters cause new version always

Solution:

in MySqlConnection, additional jdbc parameters, add utf8, for example:

noDatetimeStringSync=true&characterEncoding=utf8

 

Then the UTF characters are loaded to MySQL properly, and thus the SCD doesn't see differences.


All Replies
Five Stars

Re: Talend Open Studio tMysqlSCD - special characters cause new version always

Actually - a correction.  It isn't a LF character causing the issue (as trim() doesn't remove it).  A dump of an offending string finds extended ASCII characters, in this case characters 0xC2 and 0xAC.

 

Looking at the source system the data is "ul Podbipięty 27/1"  Note the special ę  character.  I think this is causing the issue.  The target table is MySql utf8mb4 --- but in MySQL workbench the character no longer looks like ę but instead is showing as a ?

 

My hunch here is that somehow the extended character is being modified by the time it lands in MySQL, and then the SCD is finding it to be 'different' (well it is), and then tries to do a type 2 update, which just repeats the problem.

Five Stars

Re: Talend Open Studio tMysqlSCD - special characters cause new version always

Solution:

in MySqlConnection, additional jdbc parameters, add utf8, for example:

noDatetimeStringSync=true&characterEncoding=utf8

 

Then the UTF characters are loaded to MySQL properly, and thus the SCD doesn't see differences.

Tutorial

Introduction to Talend Open Studio for Data Integration.

Definitive Guide to Data Integration

Practical steps to developing your data integration strategy.

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.