How to load the HTML data into tables


Hi Team, 

Can you help me load HTML data into tables using Talend?

I have attached a sample HTML file.

Regards

Jay

 


Accepted Solutions

Re: How to load the HTML data into tables

Hello Jayrapolu,

 

Strictly speaking, HTML is not a subset of XML, but well-formed HTML (XHTML-style, with every tag closed) can be parsed as XML, so you can use the XML components.

First, the file you attached is not a valid HTML file. Some tags are missing (e.g. <HTML></HTML>) and some tags are not closed (e.g. <BODY>). You also have not specified the output format or other important conditions, so it is hard to give you an exact answer.
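As a quick check outside Talend, the following minimal Python sketch shows what "not closed" means for an XML parser. The two fragments are made-up examples, not the attached file, and `is_well_formed` is a hypothetical helper name. XML components generally need input that passes this kind of check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: the first never closes <body>, the second closes every tag.
broken = "<html><body><table><tr><td>Jay</td></tr></table></html>"
fixed = "<html><body><table><tr><td>Jay</td></tr></table></body></html>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
```

If the file fails such a check, the missing or unclosed tags need to be repaired before an XML-based job can read it.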

 

I think the most important component is tXMLMap. See the attached screenshot (sorry for the naming convention). If we take only the relevant part of the HTML you provided:

<body>
<table cellpadding="0" cellspacing="0" border="0" width="100%">
				<tr>
					<td width="186" class="headlabel">CONSUMER:</td>
					<td width="320" class="headvalue">Jay</td>
					<td width="73"><img src="images/spacer.gif" /></td>
					<td width="118" class="headlabel">DATE:</td>
					<td width="128" class="headvalue">17-10-2017</td>
				</tr>
				<tr>
					<td class="headlabel">MEMBER ID:</td>
					<td class="headvalue">AA40238899_C2C1               </td>
					<td><img src="images/spacer.gif" /></td>
					<td class="headlabel">TIME:</td>
					<td class="headvalue">12:32:54</td>
				</tr>
</table>
</body>

You can use the following job to extract headlabels and headvalues.
snip.PNG

I also attached an export of the job. 
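For readers without access to the attachments, here is a rough sketch of the same extraction logic in plain Python. It is not the Talend job itself; it only illustrates the idea of reading the fragment as XML and pairing each headlabel cell with the headvalue cell in the same row. The function name `extract_pairs` and the trimming of whitespace and trailing colons are my own assumptions.

```python
import xml.etree.ElementTree as ET

# The fragment from the post, reproduced as a string (indentation simplified).
SNIPPET = """<body>
<table cellpadding="0" cellspacing="0" border="0" width="100%">
  <tr>
    <td width="186" class="headlabel">CONSUMER:</td>
    <td width="320" class="headvalue">Jay</td>
    <td width="73"><img src="images/spacer.gif" /></td>
    <td width="118" class="headlabel">DATE:</td>
    <td width="128" class="headvalue">17-10-2017</td>
  </tr>
  <tr>
    <td class="headlabel">MEMBER ID:</td>
    <td class="headvalue">AA40238899_C2C1               </td>
    <td><img src="images/spacer.gif" /></td>
    <td class="headlabel">TIME:</td>
    <td class="headvalue">12:32:54</td>
  </tr>
</table>
</body>"""


def extract_pairs(markup):
    """Return (label, value) tuples from each table row, in document order."""
    root = ET.fromstring(markup)
    pairs = []
    for row in root.iter("tr"):
        # Select cells by class attribute, similar to an XPath loop in an XML component.
        labels = [td.text or "" for td in row.findall("td[@class='headlabel']")]
        values = [td.text or "" for td in row.findall("td[@class='headvalue']")]
        for label, value in zip(labels, values):
            # Assumption: drop the trailing colon and padding so the data is table-ready.
            pairs.append((label.strip().rstrip(":"), value.strip()))
    return pairs


for label, value in extract_pairs(SNIPPET):
    print(label, "=", value)
```

Each tuple could then be written as one row (label, value) in the target table, or pivoted into one column per label, depending on the output format you need.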

 

I hope this helps you solve the task.

 

Best regards

lojdr

 


All Replies

Re: How to load the HTML data into tables

Thanks for the solution. Very much appreciated. 

 

Regards
Jay
