tChangeFileEncoding and UTF8 encoding


Hello,

I have an input file encoded in ANSI that I want to convert to UTF-8.

I use the tChangeFileEncoding component and I do get an output file encoded in UTF-8. When I open it with Notepad++, everything looks fine.

But when I open it with Excel, the "€" and "é" characters show up as things like "â‚¬" and "Ã©".

Is there any way to fix this?


Accepted Solutions

Re: tChangeFileEncoding and UTF8 encoding

If anyone ever has the same problem, here is how I solved mine. As a reminder, I needed to convert a .csv file from ANSI to UTF-8, and I also had a problem opening the UTF-8 file in Excel.

 

First, it is apparently well known that Excel has trouble with UTF-8 .csv files (example here): without a byte order mark (BOM), Excel reads a .csv file using the system's ANSI code page. Since the file did not have to be opened in Excel in the end, I simply ignored that part.
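The Excel symptom from the question can be reproduced in a few lines of Java. This is a sketch assuming Excel falls back to windows-1252 (the usual ANSI code page on Western European Windows installations) when a UTF-8 .csv file has no BOM; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8 (what the converted file contains), then decode the
    // same bytes as windows-1252 (what a program assuming that code page sees).
    static String misreadAsWindows1252(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding:
        // U+00E9 is the e-acute, U+20AC is the euro sign.
        System.out.println(misreadAsWindows1252("\u00E9")); // value: "Ã©"
        System.out.println(misreadAsWindows1252("\u20AC")); // value: "â‚¬"
    }
}
```

Each of these characters takes two or three bytes in UTF-8, and windows-1252 turns every byte into a character of its own, which is why one accented letter becomes two or three symbols.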

 

Second, I found that my file was not natively encoded in ISO-8859-15 (aka Latin-9), as I had thought, but in Latin-1. I tried the "Custom" encoding option of tChangeFileEncoding, but it was not as intuitive as I expected, so I used a tJava component plus a custom routine instead. For the routine, I used the java.nio library (I found here the list of encodings it supports). The encoding name that worked for my file was "windows-1252", the Windows variant of Latin-1.
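Why the exact Latin charset matters: the three candidates agree on most accented letters but not on "€". The sketch below (class and method names are hypothetical) prints the bytes each charset uses for "é" and "€".

```java
import java.nio.charset.Charset;

public class EuroByteDemo {
    // Bytes a charset uses for the given text. String.getBytes(Charset) replaces
    // characters the charset cannot represent with its replacement byte ('?', 0x3F).
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Uppercase hex string, e.g. a single byte 0x80 becomes "80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] charsets = {"ISO-8859-1", "ISO-8859-15", "windows-1252"};
        for (String cs : charsets) {
            // U+00E9 is the e-acute, U+20AC is the euro sign.
            System.out.println(cs + ": e-acute=" + hex(encode("\u00E9", cs))
                    + " euro=" + hex(encode("\u20AC", cs)));
        }
    }
}
```

"é" is E9 in all three, while "€" is 3F ("?", not representable) in ISO-8859-1, A4 in ISO-8859-15 and 80 in windows-1252. Reading a file with the wrong one of these three therefore only corrupts a handful of characters such as "€", which makes the mistake easy to miss.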

 

After that, I simply had to call my routine from the tJava component like this:

myPackage.MyCustomRoutine.myMethod(input_encoding, output_encoding, input_directory + input_filename, output_directory + output_filename);
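The routine itself was only shared as an attachment, so it is not reproduced here. Below is a minimal sketch of what a routine with the same call shape could look like, assuming it simply decodes the input with one charset and re-encodes it with another. The class and method names only mirror the call above, the package line is omitted, and the body is an assumption, not the original code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MyCustomRoutine {
    /**
     * Hypothetical reconstruction: decodes inputPath with inputEncoding and writes
     * the same characters to outputPath with outputEncoding.
     */
    public static void myMethod(String inputEncoding, String outputEncoding,
                                String inputPath, String outputPath) throws IOException {
        // Charset.forName throws UnsupportedCharsetException for unknown names.
        Charset inCharset = Charset.forName(inputEncoding);
        Charset outCharset = Charset.forName(outputEncoding);
        // Files.newBufferedReader/newBufferedWriter report malformed or unmappable
        // input as an IOException instead of silently replacing it.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(inputPath), inCharset);
             BufferedWriter writer = Files.newBufferedWriter(Paths.get(outputPath), outCharset)) {
            char[] buffer = new char[8192];
            int n;
            // Copy characters in chunks so line endings are preserved exactly.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

With input_encoding set to "windows-1252" and output_encoding to "UTF-8", this performs the ANSI to UTF-8 conversion described above. In Talend, such a class would live under Code > Routines.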

 

 

 


All Replies

Re: tChangeFileEncoding and UTF8 encoding

@JoshyBrown, what type of file are you trying to change the encoding of?

Manohar B
Don't forget to give kudos/accept the solution when a reply is helpful.

Re: tChangeFileEncoding and UTF8 encoding

It's a .csv file. 


Re: tChangeFileEncoding and UTF8 encoding

@JoshyBrown, depending on the encoding used to read the file, those characters will be displayed as other special characters.

Manohar B

Re: tChangeFileEncoding and UTF8 encoding

@manodwhb, is there a way to change or bypass that and get a .csv file that displays properly when opened in Excel?

Re: tChangeFileEncoding and UTF8 encoding

I'm starting to understand your answer: the fix for my problem is to use a BOM (byte order mark). Unfortunately, when I set the encoding to "UTF-8-BOM" in tChangeFileEncoding, Talend does not recognize it and therefore does not produce a proper output file.

Does anyone know how to use a BOM in Talend, or how to use the Custom encoding option?

 

*edit* 

OK, that's not how it works. I found this topic, which is related to my problem. Apparently, I need a custom component in order to write a BOM, because BOM support is not native in Talend. But that topic may be too old: I can't find the tWriteHeaderLineToFileWithBOM component. Is there a way to download it, or did the OP manage to get it?

 

The key to my problem is the BOM. I'm sure of it. Once I can download, install and use that custom component, my problem will be solved. 
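A BOM is not a separate encoding: in UTF-8 it is just the three bytes EF BB BF (the character U+FEFF) at the start of the file. The sketch below, with hypothetical names, checks two assumptions that would explain the behaviour above if tChangeFileEncoding passes the encoding name to Java's charset lookup: "UTF-8-BOM" (a label text editors such as Notepad++ display) is not a Java charset name, and Java's UTF-8 encoder never writes a BOM by itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomFacts {
    // The UTF-8 byte order mark: U+FEFF encoded as UTF-8 (EF BB BF).
    static final byte[] UTF8_BOM = "\uFEFF".getBytes(StandardCharsets.UTF_8);

    // True if the data starts with EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // No JDK charset is registered under this name.
        System.out.println(Charset.isSupported("UTF-8-BOM")); // false
        // Encoding text as UTF-8 produces no BOM on its own.
        System.out.println(startsWithUtf8Bom("\u00E9".getBytes(StandardCharsets.UTF_8))); // false
        System.out.println(startsWithUtf8Bom(UTF8_BOM)); // true
    }
}
```

So with plain Java, writing a BOM means writing those three bytes explicitly before the UTF-8 content.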


Re: tChangeFileEncoding and UTF8 encoding

Hello,

Could you please refer to this link: https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...?

And feel free to let us know whether you can download this custom component from the Talend Exchange portal.

Best regards

Sabrina

 

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

Re: tChangeFileEncoding and UTF8 encoding

Hello Joshy,

 

Could you please share your routine?

Thank you!!


Re: tChangeFileEncoding and UTF8 encoding

Hello @sasafca,

You'll find it attached to this message.

I hope it helps.

 

 


Re: tChangeFileEncoding and UTF8 encoding

Hello @JoshyBrown,

You're a genius!!

Thank you so much!


Re: tChangeFileEncoding and UTF8 encoding

I'll take the compliment, even though this problem gave me quite a hard time!


Re: tChangeFileEncoding and UTF8 encoding

@JoshyBrown, do you know how to convert a UTF-8 CSV file to UTF-8 with BOM?

My client needs the CSV in UTF-8 with a BOM because of special characters (not "windows-1252", unfortunately).

Thank you!
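One possible approach, sketched here under the assumption that the input is already valid UTF-8 and small enough to load into memory, is a routine that copies the file and prepends the three BOM bytes EF BB BF unless they are already there. The class and method names are hypothetical; it could be called from a tJava component in the same way as the routine in the accepted solution.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8BomRoutine {
    // UTF-8 encoding of the byte order mark U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /**
     * Hypothetical helper: copies a UTF-8 file to outputPath, prepending a BOM
     * unless the input already starts with one. Content bytes are copied unchanged.
     */
    public static void addUtf8Bom(String inputPath, String outputPath) throws IOException {
        // Reads the whole file into memory; fine for typical CSV exports.
        byte[] data = Files.readAllBytes(Paths.get(inputPath));
        boolean hasBom = data.length >= 3
                && data[0] == UTF8_BOM[0] && data[1] == UTF8_BOM[1] && data[2] == UTF8_BOM[2];
        // newOutputStream creates or truncates the output file.
        try (OutputStream out = Files.newOutputStream(Paths.get(outputPath))) {
            if (!hasBom) {
                out.write(UTF8_BOM);
            }
            out.write(data);
        }
    }
}
```

Because the file is read completely before the output is opened, inputPath and outputPath may even be the same file, and running the routine twice does not add a second BOM. For very large files, a streaming variant would be preferable.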


Re: tChangeFileEncoding and UTF8 encoding

Sorry, I don't know; I never had to deal with that problem.
