Help with tExtractXMLField for XHTML

I am writing a job to extract content from Word documents and .html files and load it into Elasticsearch. I am using tTikaExtractor to extract the contents of the files. My job has the following components:

tFileList --> tTikaExtractor --> tRowGenerator --> tExtractXMLField --> tFileOutputDelimited

The process seems to work up to tRowGenerator. However, tExtractXMLField is not returning any data. I have the following settings in the tExtractXMLField component:

Loop XPath query = "/html/head/"

The mapping (column = XPath query) is:

"title" = "/title"

"body" = "/html/body" 
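For comparison, here is how a loop query plus per-column queries behave in plain XPath 1.0, sketched with the JDK's javax.xml.xpath API rather than Talend's own engine. It assumes (not confirmed in this thread) that tExtractXMLField evaluates each mapping query relative to the current loop node. In plain XPath, "/html/head/" is not a valid expression (a step must follow each "/"), and a query starting with "/", such as "/title", is absolute, so it is not resolved relative to the loop node. The class name LoopMappingSketch is hypothetical, and the trimmed sample leaves out the xmlns declaration so the unprefixed paths can match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoopMappingSketch {
    // Trimmed, well-formed excerpt of the Tika output (hypothetical sample, no xmlns).
    static final String SAMPLE =
        "<html><head><title>Test Extraction</title></head>"
        + "<body><p>First paragraph content</p></body></html>";

    // Evaluates the loop query, then each mapping query relative to the FIRST
    // loop node only (a simplification). Returns an empty array if the loop matches nothing.
    static String[] extract(String xml, String loop, String... mappings) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList loopNodes = (NodeList) xp.evaluate(loop, doc, XPathConstants.NODESET);
            if (loopNodes.getLength() == 0) {
                return new String[0]; // no loop node, so no output row
            }
            Node context = loopNodes.item(0);
            String[] row = new String[mappings.length];
            for (int i = 0; i < mappings.length; i++) {
                // String value of the first match, or "" when nothing matches.
                row[i] = xp.evaluate(mappings[i], context);
            }
            return row;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "/title" is absolute: it looks for a root element named title, so it is empty.
        System.out.println("[" + String.join("|", extract(SAMPLE, "/html/head", "/title")) + "]");
        // "title" is relative to the loop node /html/head.
        System.out.println("[" + String.join("|", extract(SAMPLE, "/html/head", "title")) + "]");
        // Looping on /html lets a single row reach both head/title and body.
        System.out.println("[" + String.join("|", extract(SAMPLE, "/html", "head/title", "body")) + "]");
    }
}
```

Looping on "/html" rather than "/html/head" is one way to get title and body into the same row, since body is not inside head.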

I am also not sure how to extract the creator value from <meta name="dc:creator" content="Tshak"/> in the data.
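In plain XPath, an attribute value can be read by filtering the meta elements on their name attribute and then selecting @content. Relative to a head loop node that would be meta[@name='dc:creator']/@content; here "dc:creator" is just a string literal, not a namespace prefix. Whether this query can be pasted into a tExtractXMLField mapping column unchanged is an assumption. The sketch below uses the JDK's javax.xml.xpath; the class name CreatorSketch and the trimmed, namespace-free sample are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CreatorSketch {
    // Small excerpt of the head section; xmlns omitted to keep the paths short.
    static final String SAMPLE =
        "<html><head>"
        + "<meta name=\"date\" content=\"2018-04-20T14:18:00Z\"/>"
        + "<meta name=\"dc:creator\" content=\"Tshak\"/>"
        + "<title>Test Extraction</title>"
        + "</head></html>";

    // Returns the content attribute of the <meta> whose name equals metaName,
    // evaluated relative to the /html/head node ("" if there is no such meta).
    // Assumes metaName contains no single quote, since it is spliced into the query.
    static String metaContent(String xml, String metaName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node head = (Node) xp.evaluate("/html/head", doc, XPathConstants.NODE);
            if (head == null) {
                return "";
            }
            String expr = "meta[@name='" + metaName + "']/@content";
            return xp.evaluate(expr, head);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(metaContent(SAMPLE, "dc:creator"));
        System.out.println(metaContent(SAMPLE, "date"));
    }
}
```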

The following is the output of tRowGenerator:

<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2018-04-20T14:18:00Z"/>
<meta name="cp:revision" content="4"/>
<meta name="Total-Time" content="1"/>
<meta name="extended-properties:AppVersion" content="16.0000"/>
<meta name="meta:paragraph-count" content="1"/>
<meta name="meta:word-count" content="11"/>
<meta name="dc:creator" content="Tshak"/>
<meta name="extended-properties:Company" content="Tshak"/>
<meta name="Word-Count" content="11"/>
<meta name="publisher" content="Tshak"/>
<meta name="meta:page-count" content="1"/>
<meta name="dc:publisher" content="Tshak"/>
<title>Test Extraction</title>
</head>
<body><p><b><u>Help Desk</b></u></p>
<p><a name="_GoBack"/>First paragraph content</p>
<p/>
<p><b><u>Helpdesk Portal</b></u></p>
<p>Second paragraph content</p>
<p/>
<p/>
</body></html>
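One detail in this output is worth checking: the root element declares a default namespace, xmlns="http://www.w3.org/1999/xhtml". In XPath 1.0 an unprefixed step such as html matches only elements with no namespace, so a namespace-aware evaluator finds nothing for /html/head/title. Whether tExtractXMLField parses namespace-aware is an assumption, not something this thread confirms. The JDK sketch below (class name XhtmlNamespaceSketch is hypothetical) shows the difference and one common workaround, matching on local-name(). The trimmed sample also leaves out the <b><u>...</b></u> lines: as pasted, their closing tags are swapped, which an XML parser would reject, so it may be worth checking whether that is a copy artifact or really in the tRowGenerator output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlNamespaceSketch {
    // Trimmed excerpt of the output above, keeping the default XHTML namespace.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Test Extraction</title></head>"
        + "<body><p>First paragraph content</p></body></html>";

    // Parses namespace-aware and returns the string value of expr ("" if no match).
    static String eval(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed steps match only no-namespace elements, so this is empty.
        System.out.println("[" + eval(SAMPLE, "/html/head/title") + "]");
        // local-name() ignores the namespace URI, so this matches the title.
        System.out.println("[" + eval(SAMPLE,
            "/*[local-name()='html']/*[local-name()='head']/*[local-name()='title']") + "]");
    }
}
```

An alternative to local-name() is binding a prefix to the XHTML namespace through a NamespaceContext and writing /x:html/x:head/x:title; local-name() is used here only because it needs less setup.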

Appreciate your help!


Accepted Solutions
Re: Help with tExtractXMLField for XHTML


All Replies
Re: Help with tExtractXMLField for XHTML

Thanks for your response, Manohar. Your suggestion works! I am able to extract the title and body content from the XML (XHTML).