One Star

[Resolved] How to use tTikaExtractor ?

Hello!
I'm trying to use tTikaExtractor to parse some Word files.
But I have no idea which component I should use for the output. When I try a tFixedFlowInput, I cannot connect it.
Any help?
Thanks a lot !
Moderator

Re: [Resolved] How to use tTikaExtractor ?

Hi,
Do you want to parse HTML?
Have you tried to use tTikaExtractor -> tFixedFlowInput -> tFileOutputDelimited?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Thanks for your reply.
No, I'm trying to parse .docx files.
When I try to use tFixedFlowInput, I cannot even create the link between the two components. Should I change something in the tFixedFlowInput?
What should the schema be, for example?
Thanks !
Moderator

Hi,
Have you already checked the component introduction for tTikaExtractor on Talend Exchange?
Best regards
Sabrina
One Star

Yes, I have already checked the component description. For example, I would like to use the CONTENT_XHTML property. How can I define this in the tFixedFlowInput?
Edit:
For example, I created this job:

What is the configuration of the tFixedFlowInput?

I can't figure out how to configure this.
Any help? Thanks!
One Star

OK, I found out how to do it. Maybe it will be useful for someone else.
How to get data from tTikaExtractor in a tRowGenerator component:
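
A minimal sketch of the idea, in case it helps. It assumes that tTikaExtractor has no row output (which would explain why no link can be drawn to tFixedFlowInput) and instead publishes its results in Talend's globalMap under the usual <componentName>_<VARIABLE> key. The key tTikaExtractor_1_CONTENT_XHTML, the component name tTikaExtractor_1 and the helper readContent are assumptions made for illustration, not taken from the component documentation. The plain HashMap only stands in for the globalMap that Talend provides inside a job.

```java
import java.util.HashMap;
import java.util.Map;

public class TikaGlobalMapSketch {

    // Builds the key under which Talend conventionally stores a component's
    // return variable: <componentName>_<VARIABLE>.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Hypothetical helper: reads a String result from the map and falls back
    // to an empty string when the variable has not been set.
    static String readContent(Map<String, Object> globalMap,
                              String componentName, String variable) {
        Object value = globalMap.get(key(componentName, variable));
        return value == null ? "" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a running job (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tTikaExtractor_1_CONTENT_XHTML",
                "<html><body><p>Hello</p></body></html>");

        // Equivalent of the expression one would type into a column of
        // tRowGenerator or tFixedFlowInput.
        System.out.println(readContent(globalMap, "tTikaExtractor_1", "CONTENT_XHTML"));
    }
}
```

Inside a job, the column expression would then be something like ((String)globalMap.get("tTikaExtractor_1_CONTENT_XHTML")), with the schema holding a single String column, and the two components linked by a trigger such as OnSubjobOk rather than a row link. Check the exact variable names in the Outline view of your own job.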
One Star

Hi,
tTikaExtractor is a very powerful component for PDF extraction, and for .doc files as well. I recently downloaded version 1.11 from Apache, put it in the ttika folder, and just changed the reference to it in tTikaExtractor_java.xml, in this section:
<CODEGENERATION>
    <IMPORTS>
      <IMPORT
        NAME="tika"
        MODULE="tika-app-1.11.jar"
This requires Java 1.7.