I have a requirement to read table data from one or more PDF files. Does Talend provide any component that can read the content of PDF files?
I have gone through several posts found via Google, but most of them say it has to be done with a piece of Java code. The issue is that such code is customized for one particular file and does not work universally for any kind of PDF file.
I attach an example of my PDF, but I have a lot of PDFs that I download from the following site: https://www.cert.ssi.gouv.fr/. Can someone help me extract the table from each PDF so that I can then load it into a file?
Unfortunately, Talend has no component that can extract data from a PDF file.
You could create a custom routine (hand-written Java code) to read it yourself.
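To give an idea of what such a routine could look like, here is a minimal sketch, not a general solution. It assumes the text has already been extracted from the PDF (for example with a third-party library such as Apache PDFBox and its `PDFTextStripper` class, which is not shown here) and that table cells are separated by a tab or by two or more spaces. The class name `PdfTableRoutine`, the method `parseRows` and the `minColumns` parameter are hypothetical names chosen for illustration; in Talend the class would be declared `public` inside the `routines` package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical routine, for illustration only.
final class PdfTableRoutine {

    // Splits plain text (already extracted from a PDF) into table rows.
    // Assumption: cells are separated by a tab or by two or more whitespace
    // characters, and a line with fewer than minColumns cells is not a table row.
    static List<String[]> parseRows(String text, int minColumns) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            String[] cells = trimmed.split("\\s{2,}|\\t");
            if (cells.length >= minColumns) {
                rows.add(cells);
            }
        }
        return rows;
    }
}
```

Because the splitting rule depends on how each PDF lays out its text, the separator pattern and `minColumns` will probably need adjusting per document family, which is why code like this tends not to work for every PDF.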
Hi @xdshi ,
I am trying to use a routine in Talend Open Studio to read a PDF and store the data in Excel.
Actually, my PDF has a format like below:
-- page 1
-- page 2
I am struggling to read this PDF and save the table in Excel for further use.
Thanks in advance.
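For a table that continues over several pages, one possible approach is to merge the rows of each page and write them out as CSV, which Excel can open directly. The sketch below is only an illustration under assumptions: it assumes the rows of each page have already been parsed into `String[]` arrays, and that the first row of the first page is a header that may be repeated on later pages. The class `PdfTableCsv` and its methods `mergePages` and `toCsv` are hypothetical names, not part of Talend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class PdfTableCsv {

    // Merges the rows of all pages into one table.
    // Assumption: the first row seen is the header; identical rows on later
    // pages are treated as repeated headers and dropped.
    static List<String[]> mergePages(List<List<String[]>> pages) {
        List<String[]> merged = new ArrayList<>();
        String[] header = null;
        for (List<String[]> page : pages) {
            for (String[] row : page) {
                if (header == null) {
                    header = row;
                    merged.add(row);
                } else if (!Arrays.equals(header, row)) {
                    merged.add(row);
                }
            }
        }
        return merged;
    }

    // Renders rows as CSV text: every cell is quoted and embedded
    // double quotes are doubled, with CRLF line endings.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append('"').append(row[i].replace("\"", "\"\"")).append('"');
            }
            sb.append("\r\n");
        }
        return sb.toString();
    }
}
```

The resulting string could then be written to a `.csv` file, or passed on in the job to a file output component. Writing a native `.xlsx` file would need an additional library, which is outside this sketch.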