tChangeFileEncoding and UTF8 encoding

Five Stars

tChangeFileEncoding and UTF8 encoding

Hello, 

 

I have an input_file encoded in ANSI that I want to encode to UTF-8.

 

So basically, I use the tChangeFileEncoding component, and I do get an output_file encoded in UTF-8. When I open it with Notepad++, everything looks fine.

But when I open it with Excel, the "€" and "é" characters show up garbled, as things like "â‚¬" and "Ã©".

 

Is there any way to fix this?


Accepted Solutions
Five Stars

Re: tChangeFileEncoding and UTF8 encoding

If anyone ever has the same problem, here is how I solved mine. As a reminder, I needed to change the encoding of a .csv file from ANSI to UTF-8, and I also had a problem with my UTF-8 file when I opened it with Excel.

 

First things first, it is apparently well known that Excel has trouble dealing with .csv files in UTF-8 (example here). Since the file didn't have to be used in Excel in the end, I just ignored that part.

 

Secondly, I found that my file was not natively encoded in ISO-8859-15 (aka Latin-9), as I had thought, but in Latin-1. I tried the "Custom" encoding option of tChangeFileEncoding to do the job, but it was not as intuitive as I expected. So I used a tJava component plus a custom routine instead. For the routine, I used the java.nio library, and I found here all the encodings supported by this library. My encoding turned out to be "windows-1252".

 

After that, I simply had to call my routine like this:

myPackage.MyCustomRoutine.myMethod(input_encoding, output_encoding, input_directory+input_filename, output_directory+output_filename);
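
The routine itself is not shown in this post, so here is only a minimal sketch of what such a method could look like with java.nio. The class and method names are taken from the call above; the body, the parameter names and the buffer size are assumptions, not the original code. In a real Talend routine the class would be declared public inside your routines package.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical reconstruction: only the names come from the post, the body is a guess.
class MyCustomRoutine {

    // Reads inputPath using inputEncoding and rewrites it to outputPath using outputEncoding.
    static void myMethod(String inputEncoding, String outputEncoding,
                         String inputPath, String outputPath) throws IOException {
        // Charset.forName throws if the name is not supported by the JVM.
        Charset source = Charset.forName(inputEncoding);
        Charset target = Charset.forName(outputEncoding);

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(inputPath), source);
             BufferedWriter writer = Files.newBufferedWriter(Paths.get(outputPath), target)) {
            // Copy in chunks so line endings are preserved exactly as they are in the input.
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
```

With this sketch, a call such as myMethod("windows-1252", "UTF-8", inputPath, outputPath) would turn the single windows-1252 byte for "€" into the three-byte UTF-8 sequence. Note that it does not write a BOM.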

 

 

 


All Replies
Twelve Stars

Re: tChangeFileEncoding and UTF8 encoding

@JoshyBrown, what type of file are you trying to change the encoding of?

Manohar B
Five Stars

Re: tChangeFileEncoding and UTF8 encoding

It's a .csv file. 

Twelve Stars

Re: tChangeFileEncoding and UTF8 encoding

@JoshyBrown, depending on the encoding, those characters will be converted into special characters.

Manohar B
Five Stars

Re: tChangeFileEncoding and UTF8 encoding

@manodwhb, is there a way to change or bypass that and get a .csv file that displays properly when opened with Excel?
Five Stars

Re: tChangeFileEncoding and UTF8 encoding

I'm starting to get a grasp on your answer, and the solution to my problem is to use the BOM. Unfortunately, when I use tChangeFileEncoding and specify "UTF-8-BOM", Talend does not recognize it and therefore does not deliver a proper output file.

Does anyone know how to use the BOM in Talend, or how to use the custom encoding option?

 

*edit* 

OK, that's not how it works. I have found this topic, which is related to my problem. Apparently, I need a custom component in order to use the BOM, because BOM support is not native in Talend. But maybe that topic is too old: I can't find the tWriteHeaderLineToFileWithBOM component. Is there a way to download it, or did the OP ever retrieve it?

 

The key to my problem is the BOM. I'm sure of it. Once I can download, install and use that custom component, my problem will be solved. 
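
For reference, the BOM idea can also be tried without a custom component. Below is a minimal sketch in plain Java, assuming it would be called from a tJava component or a routine: it writes the three UTF-8 BOM bytes (EF BB BF) first and then the re-encoded content. The class name Utf8BomWriter and the method convertWithBom are made up for illustration and are not part of Talend. Java's own "UTF-8" charset does not emit a BOM, which is why the bytes are written by hand.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a Talend API.
class Utf8BomWriter {

    // The UTF-8 byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Re-encodes inputPath (read as inputEncoding) to UTF-8 with a leading BOM.
    static void convertWithBom(String inputEncoding, String inputPath,
                               String outputPath) throws IOException {
        try (Reader reader = Files.newBufferedReader(Paths.get(inputPath),
                                                     Charset.forName(inputEncoding));
             OutputStream rawOut = Files.newOutputStream(Paths.get(outputPath))) {
            // Write the BOM as raw bytes before any text goes through the encoder.
            rawOut.write(UTF8_BOM);
            Writer writer = new OutputStreamWriter(rawOut, StandardCharsets.UTF_8);
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
            // Flush the encoder; the underlying stream is closed by try-with-resources.
            writer.flush();
        }
    }
}
```

Whether Excel then displays the file correctly depends on the Excel version and on how the file is opened, so treat this as something to test rather than a guaranteed fix.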

Moderator

Re: tChangeFileEncoding and UTF8 encoding

Hello,

Could you please have a look at this link? https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...

Feel free to let us know whether you can download this custom component from the Talend Exchange portal.

Best regards

Sabrina

 
