tAccessInput cannot read UTF characters

One Star

I have read in several posts that people are having problems querying UTF-8 (non-English) data from Access DBs, but none of the posted solutions help, and some of the requests have simply gone unanswered.
My problem is specifically querying Korean characters from an Access 2007 DB.
I'm currently using Talend Enterprise Data Integration 5.0.2 but have also tested in TOS 5.3.1 and 5.4.0 with the same results. I have tried Java 1.6.0_21 and 1.6.0_38.
What DOES work:
(In MS Access): Export the table from Access as an Excel (xlsx) file.
Import data into Talend using tFileInputExcel(Encoding UTF8)---->tFileOutputDelimited(Encoding UTF8).
Output file (txt) shows Korean characters as expected. File Properties show UTF8 Encoding
Data Viewer (in 5.0.2) shows ?????? for Korean characters.
What does NOT work:
Import data into Talend using tAccessInput---->tFileOutputDelimited(Encoding UTF8)
Output file (txt) shows ?????? for Korean characters. File Properties show ANSI Encoding.
Data Viewer (in 5.0.2) shows ?????? for Korean characters.
What also does NOT work:
Import data into Talend using tAccessInput(JDBC Parameter characterEncoding=UTF-8)---->tFileOutputDelimited(Encoding UTF8)
Output file (txt) shows ?????? for Korean characters. File Properties show ANSI Encoding.
Data Viewer (in 5.0.2) shows ?????? for Korean characters.
I have also attempted pulling the data from the Access table using the tDBInput component with the exact same results as the tAccessInput component.
How do I query an Access 2007 DB directly from Talend and retain the Korean characters?
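For readers trying to interpret the ?????? symptom above: this is what Java produces when text is pushed through a charset that has no mapping for the characters. The sketch below is only an illustration of that mechanism, not a confirmed diagnosis of tAccessInput. The class name, the method name, and the choice of windows-1252 as the "ANSI" code page are assumptions made for the example.

```java
import java.nio.charset.Charset;

public class QuestionMarkDemo {

    // Encode the text with the given charset and decode it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for windows-1252.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Three Hangul syllables, written as escapes so the source file
        // encoding does not matter.
        String korean = "\uD55C\uAD6D\uC5B4";

        // Assumed "ANSI" code page: every Hangul syllable becomes '?'.
        System.out.println(roundTrip(korean, "windows-1252").equals("???")); // prints true

        // UTF-8 can represent the text, so it survives unchanged.
        System.out.println(roundTrip(korean, "UTF-8").equals(korean)); // prints true
    }
}
```

Once the characters have been replaced this way the original text is gone, so setting UTF8 on tFileOutputDelimited cannot bring it back if the loss happens earlier in the chain.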
Moderator

Re: tAccessInput cannot read UTF characters

Hi,
We will test your case and get back to you as soon as possible.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: tAccessInput cannot read UTF characters

Thanks.
One Star

Re: tAccessInput cannot read UTF characters

Sabrina,
Is there any progress on this issue? It's been a couple of weeks.
Moderator

Re: tAccessInput cannot read UTF characters

Hi,
Sorry for the delay. We have confirmed it with our component team, and it is indeed a bug. Could you open a Jira issue on the Talend Bug Tracker so that our developers can work on it?
Please paste the issue link in this thread so that other community users can see it. Thanks for your contribution to Talend.
Best regards
Sabrina
One Star

Re: tAccessInput cannot read UTF characters

I experience the same problem going from tAccessInput to any UTF8 target (I tried tFileOutputMSDelimited and tMysqlOutput). The source is a *.accdb file. I can't figure out exactly what encoding the Access file uses; it is just the default encoding you get when creating an Access file. All characters like é etc. get converted to ?
One Star

Re: tAccessInput cannot read UTF characters

I found http://stackoverflow.com/questions/19192750/reading-unicode-characters-from-an-access-database-via-j.... Could some kind of function be made to convert the characters in another way?
One Star

Re: tAccessInput cannot read UTF characters

For now I am using the following workaround until a fix is available: I export from Access to XLSX and then import from the XLSX file. The only drawback is that this is not practical for automated syncing.
One Star

Re: tAccessInput cannot read UTF characters

This is an important issue. I am facing a similar problem with a file in English that is coming from the US government. Does anyone know when this will be fixed?
One Star

Re: tAccessInput cannot read UTF characters

I am also still waiting for a solution to this problem; a quick fix would be very much appreciated.
One Star

Re: tAccessInput cannot read UTF characters

It seems that no one has created a ticket, so I just created one:
https://jira.talendforge.org/browse/TDI-28814
Employee

Re: tAccessInput cannot read UTF characters

For French characters, please add "charSet=iso8859-9" or "charSet=iso-8859-9" to "Additional JDBC Parameters" in the advanced settings of tAccessInput.
For Korean characters, please refer to: https://jira.talendforge.org/browse/TDI-28224
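As a side note on why the charSet suggestion can help with French but not with Korean: ISO-8859-9 is a single-byte charset, so it has room for accented Latin letters but none for Hangul. A quick way to check what a given charset can represent is the standard CharsetEncoder.canEncode call. This is a minimal sketch; the class and method names are made up for the example and are not part of Talend.

```java
import java.nio.charset.Charset;

public class CharsetCoverage {

    // True if every character of the text has a mapping in the named charset.
    static boolean canRepresent(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String french = "caf\u00E9";            // "café"
        String korean = "\uD55C\uAD6D\uC5B4";   // three Hangul syllables

        System.out.println(canRepresent(french, "ISO-8859-9")); // prints true
        System.out.println(canRepresent(korean, "ISO-8859-9")); // prints false
        System.out.println(canRepresent(korean, "UTF-8"));      // prints true
    }
}
```

So for Korean data, no single-byte charSet value of this kind can be expected to work; the text has to travel as Unicode the whole way.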
One Star

Re: tAccessInput cannot read UTF characters

Hi Talend Team - I have managed to get my tAccessInput to generate a .csv file using tFileOutputDelimited, and by adding the ISO character set 8859-9 the accented characters are now displayed correctly. However, when I import the file into Salesforce via tSalesforceBulkExec_1, the resulting records in Salesforce have the accented characters replaced by ?. Do you have any advice on what I need to do to resolve this?
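One possible explanation, offered as a guess rather than a confirmed cause: if the .csv is written in ISO-8859-9 while the bulk load reads it as UTF-8, the accented bytes are invalid for the reader and get replaced. The sketch below shows that mismatch in plain Java and how re-encoding the bytes to UTF-8 avoids it. The class and method names are hypothetical, and whether this matches what tSalesforceBulkExec does internally is an assumption. In a Talend job the equivalent would presumably be setting the Encoding of tFileOutputDelimited to UTF-8.

```java
import java.nio.charset.Charset;

public class TranscodeDemo {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Decode bytes using the charset they were actually written in,
    // then re-encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] data, String sourceCharsetName) {
        String text = new String(data, Charset.forName(sourceCharsetName));
        return text.getBytes(UTF8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9"; // "café"
        byte[] isoBytes = original.getBytes(Charset.forName("ISO-8859-9"));

        // Reading ISO-8859-9 bytes as UTF-8: the 0xE9 byte is malformed
        // input, so the decoder substitutes U+FFFD for it.
        String misread = new String(isoBytes, UTF8);
        System.out.println(misread.equals(original));     // prints false
        System.out.println(misread.indexOf('\uFFFD') >= 0); // prints true

        // Transcoding first keeps the text intact for a UTF-8 reader.
        String fixed = new String(toUtf8(isoBytes, "ISO-8859-9"), UTF8);
        System.out.println(fixed.equals(original));       // prints true
    }
}
```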
One Star

Re: tAccessInput cannot read UTF characters

I created a bug ticket MONTHS ago and there has been no resolution. TDI-28224