I'm trying to build a data flow in Talend that reads a PDF from the local server and loads the file into the database.
The data type of the column I am loading into is BLOB.
If you're literally just wanting to store the PDF file binary data in a database BLOB field, then this can be done very simply, as follows:
Use a tFileInputRaw component with the Mode set to "Read the file as a bytes array".
Then, in the schema of the output component (e.g. tMysqlOutput), set the DB Type of the column to BLOB.
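For anyone who wants to see roughly what those two components do, here is a minimal plain-Java/JDBC sketch of the same read-bytes-then-insert flow. It is not the code Talend generates. The table `documents`, its columns `file_name` and `content`, and the JDBC URL are hypothetical, and it assumes a suitable JDBC driver (e.g. MySQL Connector/J) is on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PdfBlobLoader {

    /** Reads the whole file into memory, similar to tFileInputRaw's "Read the file as a bytes array" mode. */
    static byte[] readFileBytes(Path pdf) throws IOException {
        return Files.readAllBytes(pdf);
    }

    /**
     * Inserts the file name and the raw bytes into a BLOB column.
     * Table and column names (documents, file_name, content) are hypothetical.
     */
    static int insertPdf(Connection conn, String fileName, byte[] content) throws SQLException {
        String sql = "INSERT INTO documents (file_name, content) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, fileName);
            ps.setBytes(2, content); // bind the byte[] as binary data for the BLOB column
            return ps.executeUpdate(); // number of rows inserted
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.out.println("usage: PdfBlobLoader <pdf-path> <jdbc-url> <user> <password>");
            return;
        }
        Path pdf = Paths.get(args[0]);
        byte[] bytes = readFileBytes(pdf);
        // Hypothetical URL, e.g. jdbc:mysql://localhost:3306/mydb
        try (Connection conn = DriverManager.getConnection(args[1], args[2], args[3])) {
            int rows = insertPdf(conn, pdf.getFileName().toString(), bytes);
            System.out.println("Inserted " + rows + " row(s), " + bytes.length + " bytes");
        }
    }
}
```

If the target is MySQL, note that a plain `BLOB` column holds at most 65,535 bytes, so most PDFs will need `MEDIUMBLOB` (up to 16 MB) or `LONGBLOB` instead.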
If, however, you want to read in the PDF and do some processing, e.g. extracting the text, then you will need to do this in Java code using a suitable library such as iText. Be aware that iText, whilst a superb and very feature-rich library, is dual-licensed under the AGPL and a commercial licence, so for closed-source commercial use you'd need to buy a licence.
I did take a quick look on Talend Exchange and found a free component, tTikaExtractor, which appears to offer text extraction from PDF files, so this may be an option, although I've not used it myself.