One Star

html to xml

Is there any component or way to convert HTML files to XML documents?
I am more interested in the body and title tags. Everything in the <body> of the HTML can stay in the <body> tag of the XML.
Cheers.
10 REPLIES
Community Manager

Re: html to xml

Hi
There is no component that converts an HTML file to an XML file directly; you have to extract records from the HTML file and then insert them into an XML file.
Consider the following job design to extract the desired records from the HTML file:
tFileInputFullRow--main-->tFilterRow-->tExtractRegexFields.
tFileInputFullRow: reads the HTML file one row at a time
tFilterRow: keeps only the desired rows, for example rows that start with <body>
tExtractRegexFields: uses a regular expression to extract fields
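
The read, filter, and extract steps above can be sketched in plain Java to show what each stage does. This is only an illustration, not what the Talend components generate: the class name RowFilterSketch, the method extractTitles, and the <title> pattern are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowFilterSketch {
    // Hypothetical pattern: captures the text of a <title> element that sits on a single row.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE);

    // Walks the rows one by one (the read step), keeps only rows that
    // contain a <title> element (the filter step), and returns the
    // captured group of each kept row (the regex extraction step).
    public static List<String> extractTitles(List<String> rows) {
        List<String> titles = new ArrayList<>();
        for (String row : rows) {
            Matcher m = TITLE.matcher(row.trim());
            if (m.find()) {
                titles.add(m.group(1));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "<head>",
                "<title>Partnerships</title>",
                "</head>");
        System.out.println(extractTitles(rows));
    }
}
```

Because each row is matched on its own, this only finds elements that open and close on the same row.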
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: html to xml

Thanks shong for your reply. I tried it. However, I get this "advanced condition failed" error from tFileInputFullRow.
Log output:
connecting to socket on port 3654
connected
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">|advanced condition failed
<head>|advanced condition failed
<meta http-equiv="content-type" content="text/html;" />|advanced condition failed
<title>Partnerships</title>|advanced condition failed
</head>|advanced condition failed
<body><h1 class="entry-title" style="margin-bottom:25px;">Partnerships</h1>|advanced condition failed
.....
....
Community Manager

Re: html to xml

Hi
I tested reading an HTML file using tFileInputFullRow and did not have any problem. Can you please send me an example file for testing?
Best regards
Shong
One Star

Re: html to xml

Thanks. I restarted it and it works fine.
However, when I use input_row.htmlstring.startsWith("<body>") in the tFilterRow component, I see only the first line. It seems to break when there is a newline within the body tag. How can I solve this?
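
Since the file is read one row at a time, a startsWith("<body>") test can only ever match the single row that opens the body. One possible way around this, sketched here in plain Java rather than with Talend components, is to treat the whole file content as one string and use a regular expression in DOTALL mode so that the match can span line breaks. The class name HtmlBodySketch, the <document> wrapper element, and the choice to wrap the body in CDATA are all assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlBodySketch {
    // DOTALL lets "." match line breaks, so a body spanning many lines is captured whole.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*?)</body>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the first captured group, or an empty string when nothing matches.
    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a small XML document holding the title and the raw body markup.
    // The body goes into a CDATA section because HTML is not guaranteed to be
    // well-formed XML; any "]]>" inside it is split so the section stays valid.
    public static String toXml(String html) {
        String title = firstGroup(TITLE, html).trim();
        String body = firstGroup(BODY, html);
        return "<document><title>" + escape(title) + "</title><body><![CDATA["
                + body.replace("]]>", "]]]]><![CDATA[>")
                + "]]></body></document>";
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Partnerships</title></head>\n"
                + "<body>\n<h1>Partnerships</h1>\n<p>Text</p>\n</body></html>";
        System.out.println(toXml(html));
    }
}
```

A regular expression is enough for picking out two known tags like this, but it is not a general HTML parser; nested or malformed markup would need a real parser.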
One Star

Re: html to xml

Is there any component or way to convert XML, CSV, or database data to HTML files?
Regards
Kishore
One Star

Re: html to xml

If you want to convert an XML file to an HTML one, you just need to use an XSL transformation.
You can use this model as a transformation job:
tFileOutputXML -----> tFileList (in case you want to do this for a group of files) ----> tXSLT
This works; I have already tried it.
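
For readers who want to see what an XSL transformation does outside a Talend job, here is a minimal sketch using the standard javax.xml.transform API from the JDK. The class name XsltSketch, the sample stylesheet, and the <document>/<title> input layout are assumptions made for the example and are not taken from the job above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical stylesheet: turns <document><title>..</title></document>
    // into a tiny HTML page with the title as a heading.
    public static final String SAMPLE_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"/document\">"
            + "<html><body><h1><xsl:value-of select=\"title\"/></h1></body></html>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    public static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document><title>Partnerships</title></document>";
        System.out.println(transform(xml, SAMPLE_XSL));
    }
}
```

To process files instead of strings, StreamSource and StreamResult can also be constructed from java.io.File objects.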
One Star

Re: html to xml

Another way to go from XML -> HTML, CSV, and PDF is to use a Jasper Report. This video shows how to use the Jasper Report IDE "iReport" to build a report off of an XML document. The iReport product can be called from a Talend component.
http://youtu.be/Y_JMUv7GiK8
One Star

Re: html to xml

Thank you, friends, it's working.
Regards
Kishore
One Star

Re: html to xml

Hi friends,
For a new job, please tell me how to extract data from HTML files in version 4.2.
regards,
Kishore
One Star

Re: html to xml

Hi Friends,
Please let me know the process for extracting data from HTML to XML, CSV, or any other file format.
Regards,
Kishore