Scrape HTML source and convert to XML file

Hi,

I want to point at a URL (an HTTPS URL), view the source code of that web page, and parse certain values from it into a new XML file. Is there any component suitable for grabbing values from an HTML source document?
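If a bit of Java is acceptable, the JDK's built-in DOM and Transformer APIs can produce the new XML file. Talend jobs compile to Java, so something like this could live in a routine or a tJava step. This is a minimal sketch that assumes the scraped values arrive as dataset-name/score pairs. The element and attribute names (`datasets`, `dataset`, `name`, `score`), the class name, and the output file name are all hypothetical:

```java
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ScoresToXml {
    // Builds <datasets><dataset name="..." score="..."/>...</datasets> (hypothetical schema).
    static Document buildDocument(Map<String, Integer> scores) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("datasets");
            doc.appendChild(root);
            for (Map.Entry<String, Integer> entry : scores.entrySet()) {
                Element dataset = doc.createElement("dataset");
                dataset.setAttribute("name", entry.getKey());
                dataset.setAttribute("score", String.valueOf(entry.getValue()));
                root.appendChild(dataset);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser configuration", e);
        }
    }

    // Serializes the DOM tree to any Result target (string or file), indented.
    private static void write(Document doc, Result target) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), target);
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    static String toXmlString(Document doc) {
        StringWriter out = new StringWriter();
        write(doc, new StreamResult(out));
        return out.toString();
    }

    static void writeXmlFile(Document doc, Path path) {
        write(doc, new StreamResult(path.toFile()));
    }

    public static void main(String[] args) {
        // Example pair taken from the page markup; real values would come from the scrape.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("SALESFORCE John Smith Contacts", 90);
        writeXmlFile(buildDocument(scores), Path.of("dataset_scores.xml"));
    }
}
```

Using a `LinkedHashMap` keeps the datasets in the order they appear on the page.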

I've looked at Automation Anywhere, and it seems to have an issue with the HTTPS part of the URL.
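On the HTTPS point, the JDK's own `java.net.http.HttpClient` (Java 11 and later) negotiates TLS itself. Fetching the page source over `https://` therefore needs no extra setup, as long as the server's certificate is trusted by the JVM's default trust store. This is a minimal sketch that assumes the page is reachable without a login. If the site needs a session cookie or other authentication, that would have to be added. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PageFetcher {
    // One shared client; TLS for https:// URLs is handled by the JDK.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Plain GET request with an overall timeout.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Returns the raw HTML the server sends (what view-source shows, before any JavaScript runs).
    static String fetchPageSource(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default URL; pass the real page URL as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(fetchPageSource(url));
    }
}
```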

No standard HTML grabber comes with Talend by default; I guess there might be a custom component out there somewhere. In the meantime I'll try using "tHTTPTableInput", though it seems to expect a table, and the data I'm after does not necessarily sit in a table.

For example, I'm after the Dataset Score in the HTML below.

Thanks for taking a look at the post.

Cheers, Ed

<tr>
<td><a href="/dq_processes/SALESFORCE%20Clinical%20Research/datasets/315" title="View">SALESFORCE John Smith Contacts</a></td>
<td align="center">
<div class="graph" title="Dataset Score: 90%">
<strong class="bar" style="width: 90%;" >
<span>90%</span></strong>
</div>
</td>
<td>
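Because the score sits in a `title` attribute rather than in a table cell, a text-level match on the raw source can pick it up regardless of how the table is laid out. The sketch below uses a regular expression as a stand-in for a proper HTML parser. It assumes every row looks like the snippet above: a link holding the dataset name, followed by a `title="Dataset Score: NN%"` attribute with no other link in between. Entities such as `&amp;` in names are not decoded, and a markup change on the site would break the pattern. A real HTML parser library (for example jsoup, a third-party Java library) would be more robust. The names used are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatasetScoreExtractor {
    // Group 1: text of a link; group 2: digits in the next title="Dataset Score: NN%".
    // (?:(?!<a\s).)*? refuses to cross another <a ...> tag, so a stray link
    // (e.g. an edit link) cannot be paired with the next row's score.
    // DOTALL lets "." span the line breaks between cells.
    private static final Pattern ROW = Pattern.compile(
            "<a\\s[^>]*>([^<]*)</a>(?:(?!<a\\s).)*?title=\"Dataset Score:\\s*(\\d+)%\"",
            Pattern.DOTALL);

    // Maps dataset name to score, in page order; a repeated name keeps the last score.
    static Map<String, Integer> extractScores(String html) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            scores.put(m.group(1).trim(), Integer.parseInt(m.group(2)));
        }
        return scores;
    }

    public static void main(String[] args) {
        // The row from the post, as a Java string.
        String html = "<tr>\n"
                + "<td><a href=\"/dq_processes/SALESFORCE%20Clinical%20Research/datasets/315\""
                + " title=\"View\">SALESFORCE John Smith Contacts</a></td>\n"
                + "<td align=\"center\">\n"
                + "<div class=\"graph\" title=\"Dataset Score: 90%\">\n"
                + "<strong class=\"bar\" style=\"width: 90%;\" >\n"
                + "<span>90%</span></strong>\n"
                + "</div>\n"
                + "</td>\n";
        System.out.println(extractScores(html)); // prints {SALESFORCE John Smith Contacts=90}
    }
}
```

Applied to the full page source, the same call returns one entry per row that matches the assumed shape.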