One Star

Read local HTML file

Hi!
I need to parse and save data from .xls files, but I can't use the standard tFileInputExcel component because these files are actually in HTML format. I opened one of them in Notepad and it contained HTML tags:
".c155 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.086%;background-color: #ffffff;}
.c156 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.045%;background-color: #ffffff;}
.c157 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.139%;background-color: #ffffff;}
.c158 {margin-top: 0.0pt;margin-bottom: 0.0pt;margin-left: 0.6pt;margin-right: auto;width: 1589.15pt;border-collapse: collapse;}
</style>
</head>
<body>
<table class="c10">
<tr class="c0">
<td valign="top" class="c1"><p class="c2"><br/></p>
<p class="c2"><br/></p>
</td>
<td valign="top" class="c3"><p class="c4"><span class="c5">DATA</span></p>
<p class="c6"><br/></p>..."

I've found out how to read HTML tables from a website ("http://www.rilhia.com/tutorials/using-third-party-java-library-scrape-content-table-web-page"), but I need to work with a local file.
Please, help me!
Thanks!
2 REPLIES
Moderator

Re: Read local HTML file

Hi,
Have you tried creating a File XML entry in the metadata to read your files? What does your expected output look like?
Best regards
Sabrina
One Star

Re: Read local HTML file

Hi!


I used the tHTTPTableInput component to solve my problem. The job looks like this:


tFileInputFullRow ------------> tFileOutputDelimited
        |
        | onSubjobOk
        \/
tHTTPTableInput ------------> tLogRow


First, I read my .xls file with tFileInputFullRow and write it to "D:/tmp/DailyStat.html" with tFileOutputDelimited.

Then tHTTPTableInput reads my HTML table from the URL "file://localhost/D:/tmp/DailyStat.html" with "Syntax for Table: T=1".
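
For anyone who wants to check what is inside such a file outside of Talend, here is a minimal sketch in Python that reads the cells of a table from a local HTML file using only the standard library. It is not what tHTTPTableInput does internally. The names `TableExtractor` and `read_html_table` are made up for this illustration, and the sketch assumes the tables are not nested and that the file is UTF-8 (pass another `encoding` if it is not).

```python
from html.parser import HTMLParser
from pathlib import Path


class TableExtractor(HTMLParser):
    """Hypothetical helper: collects the cell text of every <table> it sees."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows, each row a list of cell strings
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example the CSS in <style>) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def read_html_table(path, table_index=0, encoding="utf-8"):
    """Return the rows of one table from a local HTML file (assumes no nested tables)."""
    parser = TableExtractor()
    parser.feed(Path(path).read_text(encoding=encoding, errors="replace"))
    parser.close()
    return parser.tables[table_index]
```

Calling `read_html_table("D:/tmp/DailyStat.html")` would return the first table in the file as a list of rows, where each row is a list of cell strings. Empty cells (those that only contain `<br/>`) come back as empty strings.
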
Thanks!