Detect and Reject Non UTF-8 files

My task is to detect and reject all incoming XML files that are not UTF-8 encoded.
If my XML input file is of the form:
<?xml version="1.0" encoding="EBCDIC"?>
<book>
<price>50£</price>
</book>
and the advanced settings of tFileInputXML and tFileOutputXML have UTF-8 selected, the job runs successfully, whereas I want the file to be rejected.
Output file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<price>50</price>
</row>
</root>
The file also needs to be rejected in the scenario below, where the XML declaration specifies UTF-8 but the data contains bytes that are not valid UTF-8 (the pound symbol in the example below, saved in a non-UTF-8 encoding).
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<price>50£</price>
</book>

Re: Detect and Reject Non UTF-8 files

Hi
There is no component or built-in function that can detect a file's encoding. You can refer to the discussions on this page and write a routine in Talend to check the file encoding.
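As a rough illustration of what such a routine might look like, here is a minimal Java sketch. It is not a Talend API: the class name `Utf8Check` and its methods are hypothetical, and it assumes the whole file is small enough to read into memory. It rejects a file if the XML declaration names an encoding other than UTF-8, or if the bytes fail a strict UTF-8 decode with the standard `java.nio.charset.CharsetDecoder`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; in Talend this could live in a user routine.
final class Utf8Check {

    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    /** Returns true only if the bytes are a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently replacing bad bytes.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    static String declaredEncoding(byte[] data) {
        // Only the start of the file is inspected; ISO-8859-1 maps every byte
        // to a character, so this decode can never fail.
        int n = Math.min(data.length, 200);
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Accepts a file only if it declares UTF-8 (or nothing) and decodes strictly. */
    static boolean isAcceptable(byte[] data) {
        String declared = declaredEncoding(data);
        // Assumption: a missing declaration is treated as UTF-8.
        boolean declarationOk = declared == null || declared.equalsIgnoreCase("UTF-8");
        return declarationOk && isValidUtf8(data);
    }
}
```

In a job, a check like `Utf8Check.isAcceptable(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(path)))` could run before tFileInputXML, with the file routed to a reject flow when it returns false. Note that the pound symbol itself is a legal UTF-8 character when encoded as the bytes C2 A3; the check fails only when it was written in another encoding, for example as the single byte A3.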
Shong