Parsing html for <table>-Data

One Star

Hi there,
I want to parse an HTML file and extract the data from a table inside it.
First I looked for a component that takes HTML tables as input and found tHTTPTableInput: http://www.talendforge.org/exchange/tos/extension_view.php?eid=72 . However, I could not get it running under Talend Open Studio 4.2. I extracted the directory into "Talend\plugins\org.talend.designer.components.localprovider_4.2.0.RC2_r58358\components", but Talend gives no indication, in its log file or anywhere else, that it loaded the component successfully. The log file shows no error, but the component does not show up either.
My second attempt was to use tHTTPRequest as the source for tFileInputXML, but when I try to create a schema for the HTML file, Talend says that the input source is not valid XML.
Does anybody have experience parsing HTML tables?
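
A note on the "not valid XML" message: HTML pages often leave tags such as <br> unclosed, which is legal HTML but not well-formed XML, so a strict XML parser rejects them. The following is a minimal Python sketch of that behaviour outside Talend. The HTML_SNIPPET string and the is_well_formed_xml helper are made up for illustration, and this is not how tFileInputXML is implemented; it only shows the same class of failure.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <br> is left unclosed, which HTML allows but XML does not.
HTML_SNIPPET = "<html><body><table><tr><td>Temp<br>12 C</td></tr></table></body></html>"


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The raw HTML is rejected; closing the <br> tag makes the same markup parse as XML.
print(is_well_formed_xml(HTML_SNIPPET))
print(is_well_formed_xml(HTML_SNIPPET.replace("<br>", "<br/>")))
```

So feeding HTML to an XML component probably only works if the page happens to be well-formed XHTML, or if it is cleaned up first.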
Employee

Re: Parsing html for <table>-Data

Knutsen,
What you can try is to put tHTTPTableInput into a dedicated folder, e.g. myComponents.
Then:
In Window/Preferences, under Talend/Components, select the myComponents folder as the User Component Folder.
Click Apply, then OK.
You should now be able to see the component in your palette.
One Star

Re: Parsing html for <table>-Data

Thank you very much, that totally worked. :-)
One Star

Re: Parsing html for <table>-Data

I have downloaded and installed the tHTTPTableInput component and connected it to an Excel output file. However, when I run the component with the default URL http://weather.noaa.gov/weather/current/EDDN.html, it does not return any data. It writes a single row to the output Excel file, which is just the schema I defined in the tHTTPTableInput component. I am on Talend 4.2. Any thoughts?
Thanks
Pat
One Star

Re: Parsing html for <table>-Data

I am having the same issue. Please let me know how to solve it, or any other way to get the content of an HTML table.
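
As one possible way to get at the table content outside of Talend, here is a minimal sketch that uses only the Python standard library (html.parser), which tolerates markup that is not valid XML. The names TableExtractor, extract_rows and SAMPLE are hypothetical, the sample markup is invented rather than taken from the NOAA page, and this is not how tHTTPTableInput works internally. It also assumes tables are not nested.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def _close_cell(self):
        # Store the open cell, collapsing runs of whitespace into single spaces.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join(" ".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # HTML allows </td> and </tr> to be omitted, so a new row or cell
        # implicitly closes the previous one.
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_rows(html_text):
    """Return all table rows found in html_text as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Invented sample markup, including an unclosed <br> inside a cell.
SAMPLE = """
<table>
  <tr><th>Field</th><th>Value</th></tr>
  <tr><td>Wind</td><td>from the W<br>at 9 MPH</td></tr>
</table>
"""

for row in extract_rows(SAMPLE):
    print(row)
```

The page itself would still have to be downloaded first (for example with urllib.request), and the extracted rows could then be written to a delimited file that a regular Talend file input component can read.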