
Read local HTML file

One Star


Hi!
I need to parse and save data from .xls files, but I can't use the standard tFileInputExcel component because the files are actually in HTML format. When I opened one in Notepad, it contained HTML tags:
".c155 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.086%;background-color: #ffffff;}
.c156 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.045%;background-color: #ffffff;}
.c157 {border-width: 0.5pt;border-color: #000000;border-style: solid;width:2.139%;background-color: #ffffff;}
.c158 {margin-top: 0.0pt;margin-bottom: 0.0pt;margin-left: 0.6pt;margin-right: auto;width: 1589.15pt;border-collapse: collapse;}
</style>
</head>
<body>
<table class="c10">
<tr class="c0">
<td valign="top" class="c1"><p class="c2"><br/></p>
<p class="c2"><br/></p>
</td>
<td valign="top" class="c3"><p class="c4"><span class="c5">DATA</span></p>
<p class="c6"><br/></p>..."

I found a tutorial on reading HTML tables from a website ("http://www.rilhia.com/tutorials/using-third-party-java-library-scrape-content-table-web-page"), but I need to work with a local file.
Please, help me!
Thanks!
Moderator

Re: Read local HTML file

Hi,
Have you tried creating a File XML entry in the metadata to read your XML files? What does your expected output look like?
Best regards
Sabrina
One Star

Re: Read local HTML file

Hi!


I used the tHTTPTableInput component to solve my problem.

tFileInputFullRow ------------> tFileOutputDelimited
        |
        | onSubjobOk
        \/
tHTTPTableInput ------------> tLogRow


The first subjob reads my .xls file with tFileInputFullRow and writes it to "D:/tmp/DailyStat.html" with tFileOutputDelimited.

Then tHTTPTableInput reads my HTML table from the URL "file://localhost/D:/tmp/DailyStat.html" with "Syntax for Table: T=1".
Thanks!
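
For anyone who wants to check the contents of such a file outside Talend, here is a minimal sketch in Python using only the standard library. It is not part of the job above and does not reproduce what tHTTPTableInput does internally; it just collects the text of every table cell from an HTML file saved with an .xls extension. The names TableExtractor, parse_tables and read_tables are made up for this example, the utf-8 default encoding is an assumption, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> found in an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # each table is a list of rows, each row a list of cell strings
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside <p>, <span> and similar tags is kept only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html_text):
    """Return all tables in html_text as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables


def read_tables(path, encoding="utf-8"):
    """Read a local HTML file (whatever its extension) and return its tables."""
    with open(path, encoding=encoding) as handle:
        return parse_tables(handle.read())
```

Calling read_tables("D:/tmp/DailyStat.html") would return a list of tables, and the first element of that list is the first table in the file. Empty cells, such as the ones that contain only <br/>, come back as empty strings.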
