By default, the pattern frequency table indicator, available in column analysis, only supports Latin characters.
Some characters, such as East Asian characters, are not handled by this indicator. Since version 6.1, the Studio has included an indicator called East Asia Pattern Frequency Table, which extracts patterns from Japanese, Chinese and Korean characters. This indicator works only with the Java engine.
A more efficient solution is to use custom SQL statements in the User Defined Indicator editor, but not all databases provide functions to work with regular expressions. This article shows how to create such an indicator for PostgreSQL databases.
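To make the limitation concrete, here is a minimal Python sketch of a Latin-only pattern frequency table. It assumes the usual convention of mapping lowercase letters to `a`, uppercase letters to `A` and digits to `9`, and leaves every other character unchanged. The function names and sample values are hypothetical and serve only as an illustration, not as the Studio's implementation.

```python
from collections import Counter


def latin_pattern(value: str) -> str:
    """Map ASCII letters and digits to pattern symbols; keep everything else."""
    out = []
    for ch in value:
        if "a" <= ch <= "z":
            out.append("a")
        elif "A" <= ch <= "Z":
            out.append("A")
        elif "0" <= ch <= "9":
            out.append("9")
        else:
            # Characters outside the Latin ranges (e.g. Japanese) pass through as-is.
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how many values share each pattern, most frequent first."""
    return Counter(latin_pattern(v) for v in values).most_common()


if __name__ == "__main__":
    sample = ["Tokyo1", "Osaka2", "東京", "大阪"]
    for pattern, count in pattern_frequency(sample):
        print(pattern, count)
```

With this mapping, the two Latin values collapse into a single pattern, `Aaaaa9`, while each Japanese value stays its own "pattern", so the table gives no useful summary of East Asian data.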
This procedure applies to version 6.1.1 and later of the Studio.
Suppose that you want to support pattern mapping of Japanese characters with the SQL engine on a PostgreSQL database, and use this indicator in a column analysis.
PostgreSQL natively supports regular expression replacement through its regexp_replace function. All you need to do is create a UDI (User Defined Indicator) in the DQ Repository tree view and write an appropriate SQL template for it.
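The following Python sketch emulates what a chain of PostgreSQL `regexp_replace(..., 'g')` calls does to each value. The Unicode ranges are the standard Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF) and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks. The replacement letters `H`, `K` and `C` are hypothetical choices for this example, not necessarily the symbols used by the Studio's East Asia Pattern Frequency Table.

```python
import re

# Ordered (regex, replacement) pairs, applied one after another like nested
# regexp_replace(..., 'g') calls. The Latin rules run first so that the
# uppercase letters introduced by the East Asian rules are not rewritten to "A".
# The symbols H, K and C are hypothetical choices for this sketch.
PATTERN_RULES = [
    (r"[a-z]", "a"),
    (r"[A-Z]", "A"),
    (r"[0-9]", "9"),
    (r"[\u3040-\u309F]", "H"),  # Hiragana
    (r"[\u30A0-\u30FF]", "K"),  # Katakana
    (r"[\u4E00-\u9FFF]", "C"),  # CJK Unified Ideographs (Kanji)
]


def east_asian_pattern(value: str) -> str:
    """Return the pattern of value by applying every rule globally, in order."""
    for regex, symbol in PATTERN_RULES:
        # re.sub replaces all matches, like the 'g' flag in PostgreSQL.
        value = re.sub(regex, symbol, value)
    return value


if __name__ == "__main__":
    for v in ["東京タワー", "すし2", "Tokyo"]:
        print(v, "->", east_asian_pattern(v))
```

Rule order is the main design point: if the Kanji rule ran before the `[A-Z]` rule, every `C` it produced would then be turned into `A`.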
Click the Full SQL Template tab in the editor and enter the following SQL template.
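The original template is not reproduced here. As a hedged illustration only, the Python sketch below assembles a PostgreSQL query of the general shape such a template could take: nested `regexp_replace(source, pattern, replacement, 'g')` calls, then `COUNT(*)` grouped by the resulting pattern. The column and table names are plain parameters in this sketch; in the Studio they would be replaced by the template's own placeholders, whose exact syntax you should copy from an existing indicator's template. The `\uXXXX` escapes inside the bracket expressions rely on PostgreSQL's advanced regular expression syntax and assume `standard_conforming_strings` is on (the default since PostgreSQL 9.1). The pattern letters `H`, `K` and `C` are hypothetical.

```python
# Hypothetical mapping rules: (PostgreSQL regex, replacement symbol).
# Latin rules come first so the uppercase symbols added later are not remapped.
RULES = [
    ("[a-z]", "a"),
    ("[A-Z]", "A"),
    ("[0-9]", "9"),
    (r"[\u3040-\u309F]", "H"),  # Hiragana
    (r"[\u30A0-\u30FF]", "K"),  # Katakana
    (r"[\u4E00-\u9FFF]", "C"),  # CJK Unified Ideographs
]


def build_pattern_query(column: str, table: str) -> str:
    """Build a pattern frequency query by nesting one regexp_replace per rule.

    column and table are interpolated verbatim, so pass trusted identifiers only.
    """
    expr = column
    for regex, symbol in RULES:
        # The 'g' flag replaces every match, not only the first one.
        expr = f"regexp_replace({expr}, '{regex}', '{symbol}', 'g')"
    return (
        f"SELECT {expr} AS pattern, COUNT(*) AS frequency\n"
        f"FROM {table}\n"
        # Group by position to avoid clashing with a table column named "pattern".
        f"GROUP BY 1\n"
        f"ORDER BY frequency DESC"
    )


if __name__ == "__main__":
    print(build_pattern_query("city_name", "customers"))
```

Each rule wraps the previous expression, so the first rule in the list is applied first by the database, exactly as in a hand-written chain of nested calls.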
For databases such as Oracle, SQL Server and MySQL, which, depending on the version, may not support regular expression functions such as REGEXP_REPLACE, or Unicode characters written with their hexadecimal value in the form \xxxx, you need to consider other solutions.
Related issue: TDQ-11088 - Create a UDI of category pattern frequency count which supports Eastern Asian chars replacement for SQL engine (closed)