One Star

[resolved] tHttpRequest - how to retrieve html content from URL

Hello,
I'm trying to get the HTML content from URLs that I have in a CSV file, using the tHttpRequest component.
My CSV file looks like this:
column1
http://www.example.com/item-url-1.html

http://www.example.com/item-url-2.html
http://www.example.com/item-url-3.html
Attached are two screenshots showing the job and the component settings in Talend.
The problem is that I get the same URLs back as the result, not the HTML from those URLs.
Can anyone tell me what I am missing?

Job done:
Basic settings in Talend:

Thanks,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi 
You need to iterate over each URL read from the source file, and set the URI field of tHttpRequest with a dynamic variable. For example:
tFileInputDelimited-main(row1)--tFlowToIterate-iterate-tHttpRequest--main-tLogRow
On tHttpRequest, set the URI field as:
(String)globalMap.get("row1.url")
//url is the column name on tFileInputDelimited.
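The following is a minimal, standalone Java sketch of the idea above, not Talend's generated code. It assumes that tFlowToIterate, with its default key/value setting, publishes each column of the current row into globalMap under the key "<rowName>.<columnName>", and that the URI expression is evaluated once per iteration. The class and method names (FlowToIterateSketch, publishRow, currentUri) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlowToIterateSketch {
    // Stand-in for the job-wide globalMap that Talend expressions read from.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Assumed behaviour of tFlowToIterate for one row:
    // each column is stored under "<rowName>.<columnName>".
    static void publishRow(String rowName, Map<String, String> row) {
        for (Map.Entry<String, String> column : row.entrySet()) {
            globalMap.put(rowName + "." + column.getKey(), column.getValue());
        }
    }

    // Same expression as the one typed into the URI field of tHttpRequest.
    static String currentUri() {
        return (String) globalMap.get("row1.url");
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://www.example.com/item-url-1.html",
                "http://www.example.com/item-url-2.html",
                "http://www.example.com/item-url-3.html");
        for (String url : urls) {
            // One iteration per input row: publish the row, then read the URI.
            publishRow("row1", Map.of("url", url));
            System.out.println("Would request: " + currentUri());
        }
    }
}
```

The point of the sketch is that the URI is looked up again on every iteration, so each request targets the URL of the current row.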
BR
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi Shong,
thank you for your help.
Here is what I did:
Pic1 tHttpRequest:
Pic2 Error:

Pic3-tFlowToIterate

I get a 404 Not Found Error.
I have 2 columns in my csv file:
"sku"
and
"description_long", which contains the URLs.
In the tFlowToIterate component I declared the variable "Dnl_Descr_Url" for the "description_long" column.
Then in tHttpRequest I set the URI like this:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.description_long"))
and also like:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
But I still get "404 Not Found"
I'm a newbie in this territory. I'd appreciate your help very much.
Thank you,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi
Set the URI with global variable as:
((String)globalMap.get("row35.Dnl_Descr_Url"))
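To illustrate why the prefix should be dropped, here is a small standalone Java sketch. It assumes the "description_long" column already holds a complete URL such as the sample ones from the first post. In that case, prepending the path of the CSV file produces a single URL that still points at localhost, with a path that probably does not exist there, which would be consistent with a 404. The class and method names are hypothetical.

```java
import java.net.URI;

public class UriConcatSketch {
    // What the expression "prefix" + (String) globalMap.get(...) evaluates to.
    static String concatenated(String prefix, String columnValue) {
        return prefix + columnValue;
    }

    public static void main(String[] args) {
        String prefix = "http://localhost/url/test-html-url.csv";
        // Assumed column value: a full URL, as in the sample CSV.
        String columnValue = "http://www.example.com/item-url-1.html";

        URI wrong = URI.create(concatenated(prefix, columnValue));
        URI right = URI.create(columnValue);

        // The concatenated form is still addressed to the prefix's host.
        System.out.println("Concatenated host: " + wrong.getHost());
        System.out.println("Concatenated path: " + wrong.getPath());
        // The column value alone is addressed to the intended site.
        System.out.println("Plain host: " + right.getHost());
        System.out.println("Plain path: " + right.getPath());
    }
}
```

This only shows the effect of the concatenation. The key passed to globalMap.get must also match the key that tFlowToIterate actually defines, which depends on its key/value settings.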
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hello,
I tried:
"http://localhost/url/test-html-url.csv"+((String)globalMap.get("row35.Dnl_Descr_Url"))
and
((String)globalMap.get("row35.Dnl_Descr_Url"))
and I still get the same error.
If I check the schema in tHttpRequest I see only the ResponseContent column. Shouldn't the "description_long" column be there as well?
Thank you,
Lucian
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi Shong,
can you please check whether I'm doing everything as you described? I still get that "404 Not Found" error.
Thank you, 
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi 
"If I check the schema in tHttpRequest I see only the ResponseContent column. Shouldn't the 'description_long' column be there as well?"

This component has only one column which is read-only. 
Can you open the URL in a browser normally? If you still have the problem, can you please show us real example data from your CSV file and upload a full screenshot of the tFlowToIterate component? From your screenshot, I can't see whether the 'default key/value...' box is checked or not.
Best regards
Shong
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hello Shong,
thank you very much for your answer. Attached you will find my CSV file with the URLs. The URLs work fine when opened in a browser.
The default key/value checkbox wasn't checked in my first test. I have now run two more tests, with and without the default key/value box checked, and I still get the same error.
Here are the Screenshots:
test-html-url.rar.rar
1.0 tFlowToIterate - default key/value not checked
1.1 Error with tFlowToIterate - default key/value not checked
2.0 tFlowToIterate - default key/value checked

2.1 Error with tFlowToIterate - default key/value checked

Best Regards,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi  
I tested your example URL and it works fine. It uses the GET method on tHttpRequest to send the request. See my screenshots.

BR
Shong
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi Shong,
many thanks for your help. 
The job is running perfectly now, but only the last row is saved in the output. In my example we had 2 rows and only the last row was written to the output. To be sure, I tested a new CSV file, reading the first 90 rows from it with the tSampleRow component, and I get only the last one in the output, in this case row 91 because the first row is the header.
here is my screenshot:
90 rows executed but only 1 written.
Many Thanks,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi 
Check the 'append' option on tFileOutputDelimited to append the data; otherwise, it will create a new file on each iteration, so only the row from the last iteration remains.
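The effect can be reproduced outside Talend with plain Java file I/O. The sketch below is only an analogy under the assumption that the output file is reopened on every iteration; it is not the code tFileOutputDelimited generates. The class and method names (AppendSketch, demo) are hypothetical.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class AppendSketch {
    // Writes each row in its own "iteration", reopening the file every time,
    // then returns the lines that ended up in the file.
    static List<String> demo(List<String> rows, boolean append) {
        try {
            Path file = Files.createTempFile("append-sketch", ".csv");
            try {
                for (String row : rows) {
                    // append == false truncates the file on every open.
                    try (FileWriter writer = new FileWriter(file.toFile(), append)) {
                        writer.write(row + System.lineSeparator());
                    }
                }
                return Files.readAllLines(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("html-1", "html-2", "html-3");
        System.out.println("Without append: " + demo(rows, false));
        System.out.println("With append:    " + demo(rows, true));
    }
}
```

Without append, each iteration replaces what the previous one wrote, which matches the "90 rows executed but only 1 written" symptom.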
BR
Shong
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi Shong,
the append option solved the problem. I can't stress enough how grateful I am to you!
I now get the desired HTML content for each row, but the rows are not associated with the "sku" column from my input file because, as you said, the tHttpRequest schema is read-only.
Is there a way to add the "sku" column from the input CSV as a key column in the output for each extracted URL?
Thank you,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

The sku column can be accessed with this expression in your example:
(String)globalMap.get("row35.sku")

So, add a tMap after tHttpRequest, add a new column called "sku" to the output table, and set its value to:
(String)globalMap.get("row35.sku")
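As a rough illustration of what this mapping achieves, the standalone Java sketch below pairs the ResponseContent of the current request with the sku read from globalMap. It assumes the key "row35.sku" is set for each iteration, as in the expression above. The classes and the sample values are hypothetical stand-ins, not Talend's generated code.

```java
import java.util.HashMap;
import java.util.Map;

public class SkuMappingSketch {
    // Stand-in for the job-wide globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical output row of the tMap: the added sku column
    // plus the single ResponseContent column from tHttpRequest.
    static final class OutputRow {
        final String sku;
        final String responseContent;

        OutputRow(String sku, String responseContent) {
            this.sku = sku;
            this.responseContent = responseContent;
        }
    }

    // Mirrors the tMap: sku comes from globalMap, the content from the input flow.
    static OutputRow map(String responseContent) {
        return new OutputRow((String) globalMap.get("row35.sku"), responseContent);
    }

    public static void main(String[] args) {
        // Made-up sample values for one iteration.
        globalMap.put("row35.sku", "SKU-001");
        OutputRow out = map("<html>sample</html>");
        System.out.println(out.sku + ";" + out.responseContent);
    }
}
```

Because the lookup happens inside the same iteration as the request, each response is paired with the sku of the row that triggered it.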
One Star

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Hi,
thank you Shong. That solved the problem.
Regards,
Lucian
Community Manager

Re: [resolved] tHttpRequest - how to retrieve html content from URL

Great, thanks for your feedback!
BR
Shong