tFileInputDelimited encoding issue with "en dash"

Five Stars

Hello


I have a .txt file that contains what appears to be an en dash.

However, Talend doesn't seem to recognize it; it just shows up as a junk character.

 

I found this older post:

https://www.talendforge.org/forum/viewtopic.php?id=38535

but even after changing the encoding of my tFileInputDelimited component from "US-ASCII" to "UTF-8", the tLogRow output still shows the character as unreadable. (I am focusing on getting Talend to read the file correctly before moving on to the MSSQL database setup.)

 

–  (en dash)
-  (hyphen)

 

 

Thanks

 

 


Accepted Solutions
Five Stars

Re: tFileInputDelimited encoding issue with "en dash"

For anyone else in future needing to address: Solution with help from Talend Support

 

I was not using the correct encoding (duh) so:

Change the encoding to "Custom" with the value "Windows-1252".
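
A likely explanation for why this works, and for the symptoms described in this thread: Windows-1252 stores the en dash as the single byte 0x96, which is not a valid character in US-ASCII or UTF-8 and maps to an invisible control character in ISO-8859-15. The following is a minimal plain-Java sketch (not Talend-generated code); the class name, the helper methods, and the assumption that the file contains the byte 0x96 are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EnDashDecoding {
    // Builds the bytes of "Polar Ice – Gum" as a Windows-1252 editor would
    // save it (assumption: the en dash is stored as the single byte 0x96).
    static byte[] sampleBytes() {
        byte[] head = "Polar Ice ".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = " Gum".getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, raw, 0, head.length);
        raw[head.length] = (byte) 0x96;
        System.arraycopy(tail, 0, raw, head.length + 1, tail.length);
        return raw;
    }

    // Decodes the sample bytes with the named charset. Bytes the charset
    // cannot decode are replaced with U+FFFD by the String constructor.
    static String decode(String charsetName) {
        return new String(sampleBytes(), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String[] charsets = {"US-ASCII", "UTF-8", "ISO-8859-15", "windows-1252"};
        for (String cs : charsets) {
            // Index 10 is the position of the dash in the decoded string.
            System.out.printf("%-12s U+%04X%n", cs, (int) decode(cs).charAt(10));
        }
    }
}
```

Under that assumption, US-ASCII and UTF-8 both yield U+FFFD (the replacement character, often rendered as a question mark), ISO-8859-15 yields the control character U+0096 (which typically prints as nothing visible), and only windows-1252 yields U+2013, the en dash.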


Thanks.


All Replies
Moderator

Re: tFileInputDelimited encoding issue with "en dash"

Hello,

Could you please try creating a File Delimited metadata entry and selecting a different encoding for your input file to see if it works? Would you mind posting some sample content from your txt file that contains the en dash?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: tFileInputDelimited encoding issue with "en dash"

Hi Sabrina

 

Yes, I have already tried changing the encoding in the advanced settings of the metadata (tFileInputDelimited).

Studio also has this option set under Talend > Specific Settings: [checkbox] allow specific characters for .....

I've tried changing the encoding from US-ASCII to UTF-8 and ISO-8859-15, to no avail:

US-ASCII / UTF-8: shows a question mark

ISO-8859-15: shows a blank space

 

Below is an example of text that Talend does not read correctly:

"Polar Ice – Gum"   en dash  (Talend reads as "Polar Ice � Gum")
"Polar Ice - Gum"   hyphen   (Talend reads fine)

 

- hyphen
– N (en dash)
— M (em dash)
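
One way to narrow down the file's real encoding is to look at the raw bytes behind the dash rather than at how a viewer renders it. The sketch below is plain Java, not a Talend component; the class and method names are hypothetical, and the file path is taken from the first command-line argument. It lists every byte outside the ASCII range together with its offset.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class NonAsciiScan {
    // Returns one entry per byte >= 0x80, formatted as "offset N: 0xHH".
    static List<String> findNonAscii(byte[] data) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b >= 0x80) {
                hits.add(String.format("offset %d: 0x%02X", i, b));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the text file to inspect.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        findNonAscii(data).forEach(System.out::println);
    }
}
```

A lone 0x96 where the dash sits would point to Windows-1252, whereas the three-byte sequence 0xE2 0x80 0x93 is how UTF-8 encodes an en dash.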

 

Thanks

Moderator

Re: tFileInputDelimited encoding issue with "en dash"

Hello,

Thanks for sharing your solution with us.

Best regards

Sabrina
