Four Stars

Turkish Character Problem in Schema

Hi,

I am new to Talend, so I apologize in advance if my question seems too obvious. I have just installed Talend Open Studio and tried to create metadata for a delimited file. After a few attempts I realized that if the file contains Turkish characters in the column headings, Open Studio fails to create the schema and gives the error 'At least one item must exist in Schema'. This happens when I select the correct encoding, which is ISO-8859-9; with this setting Open Studio shows the Turkish characters in the headings and in the data correctly. If I select UTF-8 as the encoding, the characters are not shown correctly, but the schema can be created, with each Turkish character replaced by a "_". If I then try to edit the column names and use Turkish characters, I get an error saying the column name is not valid. When I check the file's original format, it appears to be ANSI - DOS/Windows.

I would appreciate any input or suggestions here. This is quite important for our local implementations.

Best Regards
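
The encoding behaviour described in the question can be reproduced outside Talend with a few lines of plain Java. This is only a sketch to illustrate the symptom: the class name `EncodingDemo`, the helper `decode`, and the sample heading "Şehir" are made up for the example and are not part of Talend. It assumes the file really is ISO-8859-9, where Ş is stored as the single byte 0xDE.

```java
import java.nio.charset.Charset;

public class EncodingDemo {

    // Decodes raw file bytes with the named charset (hypothetical helper).
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "Şehir" as it would be stored in an ISO-8859-9 file: Ş is byte 0xDE.
        byte[] raw = {(byte) 0xDE, 'e', 'h', 'i', 'r'};

        // Correct charset: the first character decodes to Ş (U+015E).
        System.out.println(decode(raw, "ISO-8859-9"));

        // Wrong charset: 0xDE is not a valid UTF-8 sequence on its own here,
        // so Java substitutes the replacement character U+FFFD for it.
        System.out.println(decode(raw, "UTF-8"));
    }
}
```

Reading the file as UTF-8 therefore loses the Turkish letters before any column name is built, which would be consistent with the "_" placeholders seen in the generated schema.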

3 REPLIES
Twelve Stars TRF

Re: Turkish Character Problem in Schema

Hi,
Field names in a schema must comply with Java naming rules. So if you use the file's header row to build the schema automatically, you need to change the field names that cause the error you get.

TRF
Four Stars

Re: Turkish Character Problem in Schema

Hi,

Thanks a lot for the answer. This will work for a single file, but it is not very practical for a real project, since we cannot control column names in the source systems or files. Is there any way to change this behavior? I tried changing the Java default locale, but without success so far. It would be perfect if I could change Talend runtime parameters so that it accepts Turkish characters in field names.

Twelve Stars TRF

Re: Turkish Character Problem in Schema

If you have a lot of files for which you need to create a schema, you can create a job that replaces each special character in the header with the corresponding standard (ASCII) character, for example ş with s.

TRF
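
As a rough illustration of the header clean-up suggested in the last reply, here is a minimal Java sketch that could be adapted into a routine or a preprocessing step. The class `HeaderSanitizer`, the method `sanitize`, the semicolon delimiter, and the sample headings are all assumptions made for the example; none of them come from Talend. It also assumes the header has already been read with the correct encoding (ISO-8859-9 in this thread).

```java
import java.text.Normalizer;

public class HeaderSanitizer {

    // Turns a raw column heading into an ASCII-only name
    // made of letters, digits, and underscores (hypothetical helper).
    public static String sanitize(String header) {
        // Dotless ı (U+0131) has no Unicode decomposition, so map it by hand.
        String s = header.trim().replace('\u0131', 'i');
        // NFD splits ş, ğ, ü, ö, ç, İ into a base letter plus a combining mark.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks, keeping only the base letters.
        s = s.replaceAll("\\p{M}+", "");
        // Replace anything that is still not an ASCII letter, digit, or underscore.
        s = s.replaceAll("[^A-Za-z0-9_]", "_");
        // A name must not be empty or start with a digit.
        if (s.isEmpty() || Character.isDigit(s.charAt(0))) {
            s = "_" + s;
        }
        return s;
    }

    public static void main(String[] args) {
        // Sample header line; the ';' delimiter is an assumption.
        String headerLine = "Müşteri Adı;Doğum Tarihi;Şehir;İlçe";
        StringBuilder out = new StringBuilder();
        for (String col : headerLine.split(";")) {
            if (out.length() > 0) {
                out.append(';');
            }
            out.append(sanitize(col));
        }
        System.out.println(out); // prints Musteri_Adi;Dogum_Tarihi;Sehir;Ilce
    }
}
```

The sketch does not check for Java reserved words or for two headings that collapse to the same name, so a real job would probably need to handle those cases as well.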