Six Stars skh

How to scrape website content and check whether a URL exists

Hi Guys,

I have a scenario where I need to check, using Talend, whether certain URLs (PPC ads) are present on a website. I used the tHttpRequest component to fetch the contents of the website and was able to write the HTML to a flat file. Now I need to check whether the corresponding URLs appear in that flat file.

I am using Talend Open Studio 6.3. How can I achieve this?

 

Thanks,

skh.

2 REPLIES
Seven Stars

Re: How to scrape website content and check whether a URL exists

Hello,

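HTML is structurally similar to XML, so you can try the XML components in Talend to filter out the required information, e.g. tXMLMap or tExtractXMLField. Note that this only works reliably when the page is well-formed XML (XHTML).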
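
If the page turns out not to be well-formed XML, a plain Java check is another option. Below is a minimal sketch, not a tested Talend job: it pulls the href values out of the HTML text with a regular expression and reports which of the expected URLs are present. The class name UrlPresenceChecker and its methods are hypothetical names chosen for illustration, and the regex only handles quoted href attributes, so it is an approximation rather than a full HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are assumptions for illustration.
public class UrlPresenceChecker {

    // Matches href="..." or href='...' (case-insensitive) and captures the value.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Collects every quoted href value found in the HTML text. */
    public static Set<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1).trim());
        }
        return links;
    }

    /** Maps each expected URL to true if it appears among the page's links. */
    public static Map<String, Boolean> checkUrls(String html, List<String> expected) {
        Set<String> links = extractLinks(html);
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String url : expected) {
            result.put(url, links.contains(url));
        }
        return result;
    }
}
```

The flat file contents can be read into a String first (for example with java.nio.file.Files.readAllBytes) and passed in as the html argument. The comparison is an exact string match, so relative links or URLs with different tracking parameters would need to be normalised before checking.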

Regards
Lojdr
Six Stars skh

Re: How to scrape website content and check whether a URL exists

Hi,

 

But in my scenario I am scraping the entire source of the website, which is in HTML format, and loading it into the flat file.

I think HTML is different from XML, but I will try the XML components and let you know.

 

Thanks,

skh