One Star

Encoding problems when loading data into Oracle Database

Hi,
I am downloading a number of zip files from an FTP server, unzipping them, and trying to load the resulting txt files into an Oracle database. The incoming files use different encodings/character sets.
The files are of three types (character sets):
1. text/plain; charset=us-ascii
2. text/plain; charset=utf-16
3. text/plain; charset=utf-8
I am trying to load the tab-delimited data from the unzipped txt files into an Oracle 11g database. Is there a component that can convert any character set to UTF-8, so that I do not run into errors when loading the data with SQL*Loader?
I would really appreciate any suggestions or ideas on how to load data that arrives in different character sets.
Thank you,
Santhosh.
3 REPLIES
Employee

Re: Encoding problems when loading data into Oracle Database

Hi,
The nice thing about Java is that whatever encoding you choose when reading a data flow, the Java objects can be written back out in whatever encoding you pick for the output.
For example, in Talend, if you set the encoding of your input file to UTF-16 (on the assumption that this also covers UTF-8 and US-ASCII input), you can choose UTF-8 on the output component, tOracleOutputBulkExec.
In tOracleOutputBulkExec, selecting UTF-8 as the encoding handles the conversion of your source to that encoding for the bulk file before it is loaded with SQL*Loader.
Your job will look like this if the files share the same data structure, for example five columns:
tFtpGet --onComponentOk--> tFileList --iterate--> tFileInputDelimited(UTF-16) --row--> tMap(for any conversion) --output--> tOracleOutputBulkExec(UTF-8).
I don't have time to open my Studio to take a proper screenshot. Let me know if my ASCII chart is not clear :) and I'll post a screenshot.
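Outside of Talend, the conversion that the output component performs amounts to reading with one charset and writing with another. A minimal plain-Java sketch (the file names are placeholders, not part of the job above):

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Recode {
    // Re-encode a text file from one charset to another by decoding
    // with the source charset and encoding with the target charset.
    public static void recode(File in, File out, Charset from, Charset to)
            throws IOException {
        try (BufferedReader r = new BufferedReader(
                 new InputStreamReader(new FileInputStream(in), from));
             BufferedWriter w = new BufferedWriter(
                 new OutputStreamWriter(new FileOutputStream(out), to))) {
            int c;
            while ((c = r.read()) != -1) {
                w.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: input.txt is UTF-16, the bulk file handed to
        // SQL*Loader should be UTF-8.
        recode(new File("input.txt"), new File("bulk.txt"),
               StandardCharsets.UTF_16, StandardCharsets.UTF_8);
    }
}
```

The key point is that the source charset must match the file's actual encoding; the `StandardCharsets.UTF_16` decoder uses the byte-order mark to pick the byte order.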
One Star

Re: Encoding problems when loading data into Oracle Database

Thank you, cantoine.
I will try your method and see if the whole process works for us. I really appreciate you taking the time to answer my questions.
Thank you,
Santhosh.
One Star

Re: Encoding problems when loading data into Oracle Database

(Quoting cantoine's reply above.)

Apparently this doesn't work; please see my screenshots below. I set up tFileList --iterate--> tFileInputDelimited(UTF-16) and tested it with two files (see attachments). Only one file is imported correctly; the other comes in as garbage: "???????????????????".
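That result is expected: UTF-16 is not a superset of US-ASCII or UTF-8, so decoding an ASCII/UTF-8 file as UTF-16 pairs its bytes into the wrong code units and produces replacement characters. One way around it is to choose the encoding per file before reading, for example by sniffing the byte-order mark in a small routine (this is a hypothetical helper, not a built-in Talend component):

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSniffer {
    // Guess a file's charset from its byte-order mark (BOM), falling
    // back to UTF-8 (a superset of US-ASCII) when no UTF-16 BOM exists.
    public static Charset sniff(File f) throws IOException {
        byte[] bom = new byte[2];
        int n;
        try (InputStream in = new FileInputStream(f)) {
            n = in.read(bom);
        }
        if (n >= 2 && bom[0] == (byte) 0xFE && bom[1] == (byte) 0xFF) {
            return StandardCharsets.UTF_16BE; // big-endian UTF-16 BOM
        }
        if (n >= 2 && bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE) {
            return StandardCharsets.UTF_16LE; // little-endian UTF-16 BOM
        }
        // UTF-8 BOM (EF BB BF) or no BOM: treat as UTF-8,
        // which also covers plain US-ASCII files.
        return StandardCharsets.UTF_8;
    }
}
```

The detected charset could then drive the encoding setting of the read, e.g. by recoding each file to UTF-8 first and pointing tFileInputDelimited(UTF-8) at the recoded copy.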