The problem is with using the Russian alphabet in patterns for Analyses.
1. I created a pattern with Russian characters.
FIRST_TWO_RUS_CHARS = '^(?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?
This pattern (an analog, but for Russian characters) doesn't work, and I don't know why.
2. I created the analysis DOC_ANALYSIS for the column DOC in my table.
3. I selected the FIRST_TWO_RUS_CHARS pattern for analyzing the column.
4. I ran the profiling. The result looked fine to me.
5. I closed TDQ.
6. I looked in the file workspace\PROFILING\TDQ_Data Profiling\Analyses\ ... .ana
The Russian characters were intact in this file.
7. I opened TDQ again.
8. I looked in the file workspace\PROFILING\TDQ_Data Profiling\Analyses\ ... .ana
The Russian characters were now corrupted in this file! The encoding was broken!
But when I create a pattern such as '^(?|?|???|???|?|?|???|???)$' (Russian gender), TDQ works OK.
Why doesn't the pattern work?
Why were the Russian characters corrupted in the .ana file when I opened TDQ?
What do you think about this?
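One possible cause of this kind of corruption, offered only as an assumption, is that the file is written or read with a different charset than the one it was created in. Here is a minimal Python sketch of that effect. The sample string and the choice of UTF-8 and Latin-1 are illustrative and are not taken from TDQ:

```python
# Hypothetical sample value; any Cyrillic text behaves the same way.
text = "мужской"

# Case 1: the bytes are valid UTF-8 but are read back as Latin-1.
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")  # unreadable, but no data is lost

# This damage is reversible as long as the wrong text is not saved again.
recovered = mojibake.encode("latin-1").decode("utf-8")

# Case 2: the text is saved in a charset that has no Cyrillic letters.
# Every character becomes "?", and the original cannot be restored.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

print(recovered == text)
print(lossy)
```

If the .ana file shows literal question marks after reopening, that would be consistent with the second case. Unreadable accented symbols would point to the first.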
SELECT '?' REGEXP '^.$' AS OK;
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
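To illustrate what "byte-wise" means here, the sketch below uses Python's `re` module on a `str` and then on the UTF-8 bytes of the same character. It is only an analogy for the behavior described above, not MySQL's actual implementation, and the sample character is an arbitrary choice:

```python
import re

ch = "я"                   # one Cyrillic character (arbitrary sample)
utf8 = ch.encode("utf-8")  # two bytes in UTF-8

# Character-wise matching: "." consumes the whole character.
char_match = re.fullmatch(r".", ch) is not None

# Byte-wise matching: "." consumes a single byte, so the pattern
# for "exactly one character" fails on a two-byte character.
byte_match = re.fullmatch(rb".", utf8) is not None

# Two "." are needed to cover both bytes.
two_byte_match = re.fullmatch(rb"..", utf8) is not None

print(len(utf8), char_match, byte_match, two_byte_match)
```

Under the same logic, a pattern such as '^.$' would fail on a single Cyrillic character stored as UTF-8 when the regex engine works on bytes.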