Web Scraping

Hi everyone,
I have the URL of a web page that contains some links. For each link, I need to scrape all of the linked page's content.
I want to do this with TOS (Talend Open Studio). It's the first time I've done something like this.
Do I need to combine a script (in Python, for example) with a Talend job, or can I do everything with Talend components alone, without scripts? Which components should I use?
Thanks all
Community Manager

Re: Web Scraping

Hello 
Take a look at the tHttpRequest component. It sends an HTTP request to the server and returns the page content for the URL. Then use a regular expression or the tExtractXMLField component to extract all links from the response, and finally iterate over the links one by one, sending a new request for each. For example:
tHttpRequest --main--> tExtractXMLField --main--> tFlowToIterate --iterate--> tHttpRequest --main--> tLogRow
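For anyone taking the script route mentioned in the question, here is a rough Python sketch of the same flow: fetch the start page, extract its links, then fetch each link in turn. It is not how Talend implements these components. It uses only the standard library (urllib.request, urllib.parse, html.parser), and the names fetch, extract_links, scrape_all and LinkExtractor are hypothetical. It parses <a> tags with HTMLParser rather than a regular expression or an XML extractor, on the assumption that many pages are not well-formed XML.

```python
import sys
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag (the link-extraction step)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it (roughly the role of tHttpRequest doing a GET)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links in page order, without fragments or duplicates."""
    parser = LinkExtractor()
    parser.feed(html)
    seen: set[str] = set()
    result: list[str] = []
    for href in parser.links:
        # Resolve relative links against the page URL and drop any "#fragment".
        absolute, _ = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def scrape_all(start_url: str, fetcher: Callable[[str], str] = fetch) -> dict[str, str]:
    """Fetch the start page, then fetch every link on it one by one (the iterate step)."""
    start_html = fetcher(start_url)
    pages: dict[str, str] = {}
    for link in extract_links(start_html, start_url):
        pages[link] = fetcher(link)
    return pages


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scrape.py <start-url>
    for url, content in scrape_all(sys.argv[1]).items():
        print(url, len(content))  # stand-in for tLogRow
```

The fetcher is passed in as a parameter so the link-following logic can be tried against an in-memory dictionary of pages instead of the network.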
Best regards
Shong