Help with tExtractXMLField for XHTML

I am writing a job to extract content from Word documents and .html files and load it into Elasticsearch. I am using tTikaExtractor to extract the contents of the files. My job has the following components:




The process seems to work up to tRowGenerator. However, tExtractXMLField is not returning any data. I have the following settings in the tExtractXMLField component:

Loop XPath query = "/html/head/"

Mapping values (column = XPath query) are:

"title" = "/title"

"body" = "/html/body" 

I am also not sure how to extract the creator value from <meta name="dc:creator" content="Tshak"/> in the data.
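A minimal sketch of how a loop query and relative mapping queries fit together, using Python's standard xml.etree.ElementTree as a stand-in for tExtractXMLField (an assumption: Talend's own XPath engine is not used here, and ElementTree supports only a subset of XPath). It assumes the meta and title elements sit inside a head element, as the loop query /html/head suggests, and that the document has no default namespace, as pasted. Two points it illustrates: a trailing slash, as in /html/head/, is not valid XPath 1.0, and an absolute mapping such as /title looks for a root element named title, which does not exist. With the loop on /html, the mappings become relative: head/title, body, and for the creator a meta element selected by its name attribute. In full XPath, a creator mapping along the lines of head/meta[@name='dc:creator']/@content should select the attribute directly, though that exact query is not confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the tRowGenerator output. Assumption: the
# <meta> and <title> elements sit inside <head>, and there is no default namespace.
SAMPLE = """<html>
  <head>
    <meta name="meta:word-count" content="11"/>
    <meta name="dc:creator" content="Tshak"/>
    <title>Test Extraction</title>
  </head>
  <body>
    <p><b><u>Help Desk</u></b></p>
    <p>First paragraph content</p>
  </body>
</html>"""


def text_of(elem):
    """Return all text inside elem (roughly an XPath string-value), or None."""
    if elem is None:
        return None
    return "".join(elem.itertext()).strip()


def extract_row(xml_text):
    """Mimic a loop on /html, with each mapping evaluated relative to the loop node."""
    loop_node = ET.fromstring(xml_text)  # the root element plays the /html loop node
    # Select the <meta> whose name attribute is dc:creator, then read its content.
    creator_meta = loop_node.find("head/meta[@name='dc:creator']")
    return {
        "title": loop_node.findtext("head/title"),  # relative path, not "/title"
        "body": text_of(loop_node.find("body")),
        "creator": creator_meta.get("content") if creator_meta is not None else None,
    }


row = extract_row(SAMPLE)
print(row["title"])    # Test Extraction
print(row["creator"])  # Tshak
```

ElementTree cannot return an attribute value straight from a path, which is why the sketch selects the meta element first and then reads its content attribute; a full XPath engine can do both in one expression.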


The following is the output of tRowGenerator:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="">
<meta name="date" content="2018-04-20T14:18:00Z"/>
<meta name="cp:revision" content="4"/>
<meta name="Total-Time" content="1"/>
<meta name="extended-properties:AppVersion" content="16.0000"/>
<meta name="meta:paragraph-count" content="1"/>
<meta name="meta:word-count" content="11"/>
<meta name="dc:creator" content="Tshak"/>
<meta name="extended-properties:Company" content="Tshak"/>
<meta name="Word-Count" content="11"/>
<meta name="publisher" content="Tshak"/>
<meta name="meta:page-count" content="1"/>
<meta name="dc:publisher" content="Tshak"/>
<title>Test Extraction</title>
<body><p><b><u>Help Desk</u></b></p>
<p><a name="_GoBack"/>First paragraph content</p>
<p><b><u>Helpdesk Portal</u></b></p>
<p>Second paragraph content</p>
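One thing worth checking in the real tRowGenerator output is the default namespace. Apache Tika normally emits XHTML in the http://www.w3.org/1999/xhtml namespace; if the actual output declares it (the empty xmlns value above may have lost its content when pasting, which is an assumption), unprefixed queries such as /html/head match nothing, because every element name is namespace-qualified. The sketch below, again using Python's xml.etree.ElementTree as a stand-in for Talend's XPath engine, uses a hypothetical namespaced sample to show the same lookup failing without a namespace mapping and succeeding with one. In full XPath 1.0, a namespace-agnostic alternative is a predicate such as /*[local-name()='html'], which ElementTree does not support; how tExtractXMLField itself should be configured for namespaces is not covered here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample: same shape as the pasted output, but with the
# XHTML default namespace declared, as Apache Tika normally emits it.
SAMPLE_NS = f"""<html xmlns="{XHTML_NS}">
  <head>
    <meta name="dc:creator" content="Tshak"/>
    <title>Test Extraction</title>
  </head>
  <body><p>First paragraph content</p></body>
</html>"""

root = ET.fromstring(SAMPLE_NS)

# Element tags are namespace-qualified: {http://www.w3.org/1999/xhtml}html
print(root.tag)

# Unprefixed paths look for elements in no namespace, so nothing matches.
print(root.findtext("head/title"))  # None

# Binding a prefix to the XHTML namespace makes the same lookups succeed.
ns = {"x": XHTML_NS}
print(root.findtext("x:head/x:title", namespaces=ns))  # Test Extraction
meta = root.find("x:head/x:meta[@name='dc:creator']", namespaces=ns)
print(meta.get("content"))  # Tshak
```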


Appreciate your help!


Accepted Solutions
Re: Help with tExtractXMLField for XHTML

Re: Help with tExtractXMLField for XHTML

Thanks for your response, Manohar. Your suggestion works! I am now able to extract the title and body content from the XML (XHTML).