[resolved] decode PDF

Four Stars

I have a list of .txt files that contain Base64-encoded PDFs. I am trying to decode them and save them back as .pdf files. I am starting with a single .txt file as a test. My job looks like this:
tFileList ---> tFileInputDelimited -----> tJavaRow ------> tFileOutputDelimited
In tFileInputDelimited, I set the row separator to a string that should never occur in the file, something like "\nnnnnnnnnnnnnnn\nnnnnnnn", so the whole file is treated as one row.
In tJavaRow, I have:
  byte[] buf = new sun.misc.BASE64Decoder().decodeBuffer(input_row.pdf_in);
  output_row.pdf_out = new String(buf);
However, the output file test.pdf is not readable (Adobe Reader reports that it is damaged and could not be repaired).
What am I doing wrong?

Accepted Solutions
Seventeen Stars

Re: [resolved] decode PDF

I suggest you test the extraction and decoding outside Talend in a simple Java project. Once you know how to do it correctly, you can apply what you learned in a Talend job. By the way, I would create a routine instead of coding it entirely in a tJavaRow. A static method in a routine can easily be developed and tested outside Talend.
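
As a starting point for such a standalone test, here is a minimal sketch, assuming Java 8 or later (for java.util.Base64). The class name PdfBase64Decoder and its method names are made up for illustration and are not part of Talend. The main point it shows is that the decoded bytes are written to the file directly. Converting them with new String(buf) and writing them through a text output runs binary data through a character encoding, which is a likely reason the PDF ends up damaged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper class; the name and method signatures are illustrative only.
class PdfBase64Decoder {

    // Decode Base64 text to raw bytes. The MIME decoder skips line breaks
    // and other characters outside the Base64 alphabet.
    static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Read a Base64 .txt file and write the decoded bytes to a .pdf file.
    // The bytes are written as-is and are never converted to a String.
    static void decodeFile(Path txtIn, Path pdfOut) throws IOException {
        String base64Text =
                new String(Files.readAllBytes(txtIn), StandardCharsets.US_ASCII);
        Files.write(pdfOut, decode(base64Text));
    }

    // Usage: java PdfBase64Decoder input.txt output.pdf
    public static void main(String[] args) throws IOException {
        decodeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Once this works on a sample file, the static methods could be moved into a routine and called from the job, for example as PdfBase64Decoder.decodeFile(...) inside a tJava component, so that nothing binary has to pass through a String column or a delimited text output.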
