I'm trying to build a data flow in Talend that reads a PDF file from the local server and loads it into a database.
The data type of the column I'm loading it into is BLOB.
If you're literally just wanting to store the PDF file binary data in a database BLOB field, then this can be done very simply, as follows:
Use a tFileInputRaw component with its Mode set to "Read the file as a bytes array".
Then connect it to an output component such as tMysqlOutput and, in that component's schema, set the DB Type of the content column to BLOB.
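If it helps to see what that two-component job boils down to, or you need to do the same thing from a tJava component or standalone Java, here is a minimal plain-JDBC sketch. It is not the code Talend generates. The table `pdf_documents`, its `file_name`/`content` columns, and the command-line arguments for the JDBC URL and credentials are all hypothetical, and the `%PDF-` header check is an optional extra sanity step, not something tFileInputRaw does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;

/** Sketch: read a PDF as a byte array and insert it into a BLOB column over JDBC. */
class PdfBlobLoader {

    // Hypothetical table: CREATE TABLE pdf_documents (file_name VARCHAR(255), content BLOB)
    static final String INSERT_SQL =
            "INSERT INTO pdf_documents (file_name, content) VALUES (?, ?)";

    // PDF files begin with the ASCII header "%PDF-"
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Reads the whole file into memory, like tFileInputRaw in "bytes array" mode. */
    static byte[] readFileBytes(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    /** Optional sanity check: does the content start with the PDF header? */
    static boolean looksLikePdf(byte[] content) {
        return content.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Binds the raw bytes to the BLOB parameter and inserts one row. */
    static void insertPdf(Connection conn, String fileName, byte[] content) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, fileName);
            ps.setBytes(2, content);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws IOException, SQLException {
        // Hypothetical usage: java PdfBlobLoader <pdf-path> <jdbc-url> <user> <password>
        Path pdf = Paths.get(args[0]);
        byte[] content = readFileBytes(pdf);
        if (!looksLikePdf(content)) {
            throw new IllegalArgumentException(pdf + " does not start with a PDF header");
        }
        try (Connection conn = DriverManager.getConnection(args[1], args[2], args[3])) {
            insertPdf(conn, pdf.getFileName().toString(), content);
        }
    }
}
```

One thing to watch with MySQL specifically: a plain `BLOB` column holds at most 65,535 bytes, so most real PDFs need `MEDIUMBLOB` (up to 16 MB) or `LONGBLOB`. For very large files, `PreparedStatement.setBinaryStream` avoids holding the whole file in memory.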
If, however, you want to read in the PDF and do some processing, e.g. extracting the text, then you will need to do this in Java code using a suitable library such as iText. Be aware that iText, whilst a superb and very feature-rich library, is released under the AGPL in its current versions, so using it in commercial closed-source software means buying a commercial licence.
I did take a quick look on Talend Exchange and found a free component, tTikaExtractor, which appears to offer text extraction from PDF files, so this may be an option, although I haven't used it myself.