One Star

Detect and Reject Non UTF-8 files

My task is to detect and reject every incoming XML file that is not UTF-8 encoded.
If my XML input file is of the form:
<?xml version="1.0" encoding="EBCDIC"?>
and the advanced settings of tFileInputXML and tFileOutputXML have UTF-8 selected, the job runs successfully, whereas I want the file to be rejected.
Output file:
<?xml version="1.0" encoding="UTF-8"?>
The file also needs to be rejected in the scenario below, where the XML declaration specifies UTF-8 but the data contains characters that are not valid UTF-8 (the pound symbol in the example below).
Input file:
<?xml version="1.0" encoding="UTF-8"?>
Community Manager

Re: Detect and Reject Non UTF-8 files

There is no component or built-in function that can be used to detect the file encoding. You can refer to the discussions on this page and write a routine in Talend to check the file encoding.
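As a rough starting point for such a routine, here is a minimal sketch in Java. The class name Utf8Check and its method names are hypothetical, not part of Talend. It assumes the whole file fits in memory as a byte array, and it covers the two cases from the question: a declaration that names something other than UTF-8, and content bytes that do not decode as strict UTF-8. It treats a missing encoding declaration as UTF-8, and it will not reliably read a declaration in a file that is physically stored in EBCDIC, although such a file would usually fail the byte check anyway.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; in a Talend project this would typically be a public routine class.
final class Utf8Check {

    // Captures the value of encoding="..." inside the XML declaration.
    private static final Pattern DECLARED_ENCODING =
            Pattern.compile("<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    // True if every byte sequence in data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the declaration names UTF-8, or if no encoding is declared at all.
    static boolean declaresUtf8(byte[] data) {
        // Only the first bytes are inspected; ISO-8859-1 maps each byte to one char.
        int headLength = Math.min(data.length, 200);
        String head = new String(data, 0, headLength, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED_ENCODING.matcher(head);
        return !m.find() || "UTF-8".equalsIgnoreCase(m.group(1));
    }

    // A file is accepted only if both the declaration and the bytes agree on UTF-8.
    static boolean shouldAccept(byte[] data) {
        return declaresUtf8(data) && isValidUtf8(data);
    }
}
```

One possible way to use it is to read the file with java.nio.file.Files.readAllBytes before the XML input step and route the file to a reject flow when shouldAccept returns false. Note that a pound symbol saved as the single byte 0xA3 (ISO-8859-1) fails the check, while the same symbol saved as the two bytes 0xC2 0xA3 (UTF-8) passes.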