How to scrape website content and check whether a URL exists

skh

Hi Guys,

I came across a scenario where I need to use Talend to check whether certain URLs (PPC ads) are present on a website. I used the tHttpRequest component to fetch the website's contents and was able to write the HTML into a flat file. Now I need to check whether the corresponding URLs are present in that flat file.

I am using Talend Open Studio version 6.3. How can I achieve this?

Thanks,

skh.

Accepted Solutions
skh

Re: How to scrape website content and check whether a URL exists

I used the tHttpRequest component to crawl the website's code, then used Java code to check whether the required URLs exist in it.

Thanks,

Hameed.

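The thread does not show the actual Java code used. A minimal sketch of one way such a check could look, assuming the HTML written by tHttpRequest has been saved to a local file, is a plain substring test over the file contents. The class `UrlPresenceCheck`, the method `checkUrls`, the path `page.html` and the example.com URLs are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UrlPresenceCheck {

    // Returns, for each required URL, whether it occurs anywhere in the page text.
    // This is a plain substring test: the HTML is not parsed.
    static Map<String, Boolean> checkUrls(String html, List<String> requiredUrls) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String url : requiredUrls) {
            result.put(url, html.contains(url));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path: the flat file the tHttpRequest output was written to.
        Path htmlFile = Paths.get(args.length > 0 ? args[0] : "page.html");
        String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);

        // Hypothetical list of PPC ad URLs to look for.
        List<String> required = Arrays.asList(
            "https://example.com/ad1",
            "https://example.com/ad2");

        checkUrls(html, required).forEach((url, found) ->
            System.out.println(url + " -> " + (found ? "found" : "missing")));
    }
}
```

A substring test is simple but literal: inside an `href` attribute, `&` in a query string is often written as `&amp;`, so a URL containing `&` may be reported as missing unless it is searched for in that encoded form. The same logic could likely be adapted into a tJava or tJavaRow component.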

All Replies

Re: How to scrape website content and check whether a URL exists

Hello,

HTML is a specific version of XML, so use the XML components in Talend, e.g. tXMLMap or tExtractXMLField, to filter the required information.

Regards
Lojdr
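The XML-component approach depends on the page being parsable as XML. As a rough illustration outside Talend, the following plain-Java sketch uses the JDK's `javax.xml` parser and XPath to collect every `<a href>` value. It only succeeds when the markup is well-formed XML (XHTML-style, every tag closed) and, as written, assumes no namespace declarations. The class `XmlHrefExtractor` and the sample URL are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlHrefExtractor {

    // Parses the markup as XML and returns the href value of every <a> element.
    // Any failure, including a well-formedness error such as an unclosed <br>,
    // is rethrown as IllegalArgumentException.
    static List<String> extractHrefs(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                hrefs.add(attrs.item(i).getNodeValue());
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        // Well-formed (XHTML-style) input: every tag is closed, including <br/>.
        String xhtml = "<html><body><a href=\"https://example.com/ad1\">Ad</a><br/></body></html>";
        System.out.println(extractHrefs(xhtml)); // prints [https://example.com/ad1]
    }
}
```

Passing ordinary HTML such as `<body><br><a href="x">x</a></body>` to this sketch throws, because the parser expects a closing `</br>` before `</body>`.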
skh

Re: How to scrape website content and check whether a URL exists

Hi,

But in my scenario I am scraping the entire website's code, which is in HTML format, and loading it into the flat file.

I think HTML differs from XML, but I will check the XML components and let you know.

Thanks,

skh

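Ordinary HTML is often not well-formed XML (for example, `<br>` and `<img>` are commonly left unclosed), so an XML parser may reject the scraped file outright. One workaround, swapped in here and not taken from the thread, is a regular-expression scan for quoted `href` values followed by an exact comparison against the required URLs. This is a heuristic rather than an HTML parser (it ignores unquoted attributes and entities such as `&amp;`); a dedicated HTML parser library such as jsoup would be more robust. The class `HtmlHrefCheck`, its methods and the sample URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlHrefCheck {

    // Rough heuristic: matches href="..." or href='...' in any tag, case-insensitively.
    // Unquoted values and HTML entities such as &amp; are not handled.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Collects every quoted href value found in the text, in order of appearance.
    static Set<String> extractHrefs(String html) {
        Set<String> hrefs = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2).trim());
        }
        return hrefs;
    }

    // Returns the required URLs that do not appear as an exact href value.
    static List<String> missingUrls(String html, List<String> requiredUrls) {
        Set<String> hrefs = extractHrefs(html);
        List<String> missing = new ArrayList<>();
        for (String url : requiredUrls) {
            if (!hrefs.contains(url)) {
                missing.add(url);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Not well-formed XML: <br> and <img> are never closed.
        String html = "<html><body><br><img src='logo.png'>"
            + "<A HREF='https://example.com/ad1'>Ad</A></body></html>";
        List<String> required = Arrays.asList("https://example.com/ad1", "https://example.com/ad2");
        System.out.println("Missing: " + missingUrls(html, required)); // prints Missing: [https://example.com/ad2]
    }
}
```

Comparing exact `href` values avoids false positives where a required URL merely appears inside a longer URL or in page text, at the cost of missing links written in a different but equivalent form.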