Handling Special Java characters in SQL Server

Four Stars

Handling Special Java characters in SQL Server

Hi all

I'm designing a framework for migrating large datasets to cloud. As part of the process, I had to design an engine to handle data quality issues, particularly removing unwanted characters, such as "Esc" character, carriage return and also certain unicode characters that are reserved for private use (e.g. \u0092). 

Currently I'm able to remove such characters simply by using something like:


currentRow = currentRow.replace("\n", "");
currentRow = currentRow.replace("\r", "");
currentRow = currentRow.replace("\u0092", "");


What I'm willing to do is to store these in a SQL database. Storing values such as \n, \r or \u0092 in a SQL table won't do the trick. Can you please describe how I can store these characters in the database and retrieve them properly so Talend can interpret them correctly? 



Any advise would be appreciated. 

Four Stars

Re: Handling Special Java characters in SQL Server

I found out that I can store \u0092 in SQL by storing what gets returned as a result of SELECT CHAR(146). However, this doesn't seem to work when I'm feeding that into the same "replace" command. Any thoughts? 

Ten Stars

Re: Handling Special Java characters in SQL Server

The replaceAll() method can take a regular expression and replace all instances that match that expression in a String.  The regex is usually just provided as a string:



If you want to maintain the set of characters in a database, you can store their string values and build a process that takes the list of characters returned by a query and turns it into your regex string.  As a rough example, you could connect your database input component to a FlowToIterate to a tJava component.  The tJava component would add the string for the current iteration to a globalMap variable:

globalMap.put("regexVar",((String)globalMap.get("regexVar")) + row1.invalidChars)

You'll want to initialize the globalMap variable to avoid null pointer exceptions, probably with "[" to start off the regex set, and add a "]" after all iterations to close the set.  Once the regex string is built, you would use it in a later component as:



Talend named a Leader.

Get your copy


Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables


How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration


Why Companies Move to the Cloud: 7 Success Stories

Learn how and why companies are moving to the Cloud

Read Now