[resolved] decode PDF

I have a list of .txt files that contain Base64-encoded PDFs. I am trying to decode them and save them back as .pdf files, starting with a single .txt file as a test.
tFileList ---> tFileInputDelimited -----> tJavaRow ------> tFileOutputDelimited
In tFileInputDelimited, I set the row separator to a string that never occurs in the file (something like "\nnnnnnnnnnnnnnn\nnnnnnnn") so that the whole file is read as a single row.
In tJavaRow:
  byte[] buf = new sun.misc.BASE64Decoder().decodeBuffer(input_row.pdf_in);
  output_row.pdf_out = new String(buf);
but the output file test.pdf cannot be opened: Adobe Reader reports that it is damaged and could not be repaired.
What am I doing wrong?
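One likely culprit is the `new String(buf)` line. It decodes the PDF's bytes as text using the platform's default charset, and PDFs usually contain binary sections that are not valid text in that charset, so those bytes get replaced and the written file no longer matches the original. The small program below is a sketch that demonstrates this: it uses UTF-8 explicitly so the result is deterministic (the exact damage differs with other charsets), and the class name `StringRoundTripDemo` and the sample bytes are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTripDemo {

    // Mimics the tJavaRow step: decoded bytes -> String -> bytes written by a text output.
    // UTF-8 is used explicitly here; the original code relies on the platform default charset.
    public static boolean roundTripsThroughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        // Plain ASCII, such as the "%PDF" header, survives the round trip.
        byte[] header = {0x25, 0x50, 0x44, 0x46};
        // Bytes above 0x7F, as found in a PDF's binary sections, are not valid UTF-8 here
        // and are replaced with U+FFFD, so the data changes.
        byte[] binary = {(byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println(roundTripsThroughString(header)); // true
        System.out.println(roundTripsThroughString(binary)); // false
    }
}
```

A text-oriented output component such as tFileOutputDelimited may also apply its own encoding and append a row separator, which would alter the bytes further, so binary data is safest written directly as a byte[].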

Re: [resolved] decode PDF

I suggest you test the extraction and decoding outside Talend in a simple Java project. Once you know how to do it correctly, you can apply that knowledge in a Talend job. By the way, I would create a routine instead of coding everything in a tJavaRow. A static method in a routine can easily be developed and tested outside Talend.
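Following that advice, a routine could look roughly like the minimal sketch below. It assumes the .txt file contains only Base64 text (possibly wrapped across lines) and that Java 8 or later is available, so the standard `java.util.Base64` can replace the non-standard `sun.misc.BASE64Decoder`. The class and method names (`PdfDecoder`, `decode`, `decodeFile`) are hypothetical. The decoded bytes are written straight to disk and never turned into a String.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical routine class; in Talend it would live in a user routine.
public class PdfDecoder {

    /** Decodes Base64 text into raw bytes; line breaks in the input are allowed. */
    public static byte[] decode(String base64Text) {
        // The MIME decoder ignores line separators and other characters outside
        // the Base64 alphabet, so text read from a wrapped file still decodes.
        return Base64.getMimeDecoder().decode(base64Text);
    }

    /** Reads a Base64 .txt file and writes the decoded bytes to a .pdf file. */
    public static void decodeFile(String txtPath, String pdfPath) throws IOException {
        // Base64 text is plain ASCII, so reading it as US-ASCII is safe.
        String base64Text = new String(Files.readAllBytes(Paths.get(txtPath)), StandardCharsets.US_ASCII);
        // Write the binary result directly, without any String conversion.
        Files.write(Paths.get(pdfPath), decode(base64Text));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PdfDecoder <input.txt> <output.pdf>");
            return;
        }
        decodeFile(args[0], args[1]);
    }
}
```

Outside Talend it can be run as `java PdfDecoder test.txt test.pdf` to check that the resulting PDF opens. Inside Talend, the class would go into a user routine (package `routines`), and the method could then be called once per file, for example from a tJava in the tFileList iteration with `((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"))` as the input path, assuming the component is named tFileList_1. That would make tFileInputDelimited and tFileOutputDelimited unnecessary for this task.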