Talend Open Studio tMysqlSCD - special characters cause new version always


I'm having an issue with the tMysqlSCD component. I ran a load, then ran a second load without changing the input data. Since nothing changed, I expected no new Type 2 SCD records. Surprise: I have some new SCD versions! How?

 

It appears that if the column being checked for changes contains special characters (in this case a linefeed, LF), some flaw in the SCD Type 2 change detection always creates a new version.

 

This is a huge pain, as a lot of our text fields can contain special characters. Is there any way to get around this? I'm on 6.5.1, in case that matters.

 

thanks, Bryan

 


Accepted Solutions

Re: Talend Open Studio tMysqlSCD - special characters cause new version always

Solution:

In MySqlConnection, under additional JDBC parameters, set the character encoding to UTF-8, for example:

noDatetimeStringSync=true&characterEncoding=utf8

 

The UTF-8 characters are then loaded into MySQL properly, so the SCD no longer sees differences.
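For anyone unsure where those parameters end up: they are appended to the JDBC URL as a query string. Below is a minimal sketch of that, not Talend's actual generated code. The class name, the `build` helper, and the host/port/database values are all hypothetical; only the two parameters come from the solution above.

```java
// Sketch only: shows how additional JDBC parameters become part of the URL.
// MysqlJdbcUrl and build() are hypothetical names, not Talend APIs.
final class MysqlJdbcUrl {

    // Joins host, port and database into a MySQL JDBC URL and appends the
    // additional parameters (if any) after a '?'.
    static String build(String host, int port, String database, String additionalParams) {
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (additionalParams == null || additionalParams.isEmpty()) {
            return base;
        }
        return base + "?" + additionalParams;
    }

    public static void main(String[] args) {
        // Host, port and database name are placeholders.
        String url = build("localhost", 3306, "dwh",
                "noDatetimeStringSync=true&characterEncoding=utf8");
        System.out.println(url);
    }
}
```

The point is simply that `characterEncoding=utf8` has to be on the connection the SCD component uses, so both the write and the later comparison read go through the same encoding.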


All Replies

Re: Talend Open Studio tMysqlSCD - special characters cause new version always

Actually, a correction. It isn't an LF character causing the issue (trim() doesn't remove it). A dump of an offending string shows extended (non-ASCII) characters, in this case bytes 0xC2 and 0xAC.
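In case it helps anyone do the same kind of dump, here is a rough sketch of printing a string's UTF-8 bytes as hex, for example from a tJava component. The `HexDump` class and `toHex` method are hypothetical names for illustration, not anything Talend provides.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: dumps the UTF-8 bytes of a string as space-separated hex pairs.
final class HexDump {

    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Any byte above 7F in the output marks a non-ASCII character.
        System.out.println(toHex("ul Podbipi\u0119ty 27/1"));
    }
}
```

Comparing the dump of the source value with the dump of the value read back from the target table shows exactly which bytes changed on the way in.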

 

Looking at the source system, the data is "ul Podbipięty 27/1". Note the special ę character. I think this is causing the issue. The target table is MySQL utf8mb4, but in MySQL Workbench the character no longer looks like ę and instead shows as a ?

 

My hunch is that the extended character is somehow being modified by the time it lands in MySQL, so the SCD finds it 'different' (well, it is) and does a Type 2 update, which just repeats the problem.
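The hunch can be illustrated outside Talend. This is a sketch under an assumption: that somewhere on the write path the string passes through a charset that cannot represent ę. ISO-8859-1 is used here purely as an example; the thread does not establish which encoding the connection actually used. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: simulates a lossy encoding round trip and the resulting
// "always changed" comparison. ISO-8859-1 is an assumed example charset.
final class EncodingRoundTrip {

    // Encodes the string to bytes and decodes it again with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // (for ISO-8859-1, '?') for characters it cannot map.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String source = "ul Podbipi\u0119ty 27/1";

        String storedLossy = roundTrip(source, StandardCharsets.ISO_8859_1);
        String storedUtf8 = roundTrip(source, StandardCharsets.UTF_8);

        // The lossy copy never equals the source, so a change check based on
        // string equality reports a difference on every run.
        System.out.println("lossy equals source: " + source.equals(storedLossy));
        System.out.println("utf8 equals source:  " + source.equals(storedUtf8));
    }
}
```

Because the stored value is damaged again on every write, each load compares an intact source value against a damaged target value, which matches the repeating Type 2 versions described above.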

