Recover data stream in talend

One Star

Hello,
I would like to receive a data stream in Talend. The data is sent to me at a URL, but I will probably have dozens of files to retrieve at a time in my Talend application via this URL.
In this case, is the tSocketInput component the best solution?
If so, would I have to listen on multiple ports simultaneously?
The idea is to retrieve the data directly in Talend rather than going through a PHP application.
Employee

Re: Recover data stream in talend

I would try tFileFetch. With the enterprise edition you can run multiple fetches in parallel by selecting parallel execution on the job's detail tab and leaving the subjobs unconnected. I am not sure whether you can do this out of the box with the open source edition.
With enterprise edition you can also use Parallel and Sync components. They are a good match for this use case.
Parallel |--- tFileFetch (file A)
|        |
|        |--- tFileFetch (file B)
|        |
|        |--- tFileFetch (file C)
|
Sync
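
For readers without the enterprise components, the Parallel/Sync idea can be sketched in plain Java. This is only an illustration under stated assumptions, not what Talend generates: the class name ParallelFetch, the method names, and the pluggable fetcher function are all hypothetical. It starts one asynchronous fetch per URL and then waits for all of them to finish.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, not a Talend class.
final class ParallelFetch {

    // "Parallel" step: start one asynchronous fetch per URL.
    // "Sync" step: wait for every fetch before returning.
    static Map<String, String> fetchAll(List<String> urls, Function<String, String> fetcher) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        for (String url : urls) {
            pending.put(url, CompletableFuture.supplyAsync(() -> fetcher.apply(url)));
        }
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> entry : pending.entrySet()) {
            results.put(entry.getKey(), entry.getValue().join()); // blocks until this fetch is done
        }
        return results;
    }

    // One possible fetcher: a plain HTTP GET that returns the body as UTF-8 text.
    static String httpGet(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as ParallelFetch.fetchAll(urls, ParallelFetch::httpGet) would fetch every URL in the list. The default thread pool is small, so for dozens of slow downloads a dedicated executor is probably a better fit.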
One Star

Re: Recover data stream in talend

Thank you for the answer.
But I have several questions.

I have to retrieve XML files which are sent to my URL via HTTP POST.
1) Does this URL have to be the Tomcat server URL?
2) Do I have to build a PHP application, or can I retrieve the XML directly in Talend?
3) Finally, can I use this component (tFileFetch) with a URL, without the names of the files? (They will always be different.)

Thank you in advance.
Employee

Re: Recover data stream in talend

Yes, presumably you have some way of getting the URL into your job, probably as part of the Context. In that case you can set the tFileFetch properties using the Context variable. If you have multiple files on the data pipeline, say 1 URL per row, you would use the tFlowToIterate component and then set the property of the tFileFetch to the appropriate field of your current row. See the tFileFetch and tFlowToIterate examples in the Reference Component User's Guide (available in the documentation download).
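
To make the tFlowToIterate idea concrete, here is a rough plain-Java imitation of what happens on each iteration. The names are assumptions for illustration only: baseUrl stands in for a context variable, row1 for the incoming connection, and fileName for the column holding the per-row value. In a job, the expression on the marked line is the kind of thing you would type into the tFileFetch URI property.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a job; not code generated by Talend.
final class IterateUrls {

    // Stand-in for the job Context; baseUrl is an assumed context variable.
    static final class Context {
        String baseUrl = "http://hostname/feeds";
    }

    // Builds one fetch URI per input row, the way an iterate loop would.
    static List<String> resolveUris(Context context, List<String> rows) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> uris = new ArrayList<>();
        for (String fileName : rows) {
            // tFlowToIterate-style step: expose the current row's field
            // under a "<connection>.<column>" key.
            globalMap.put("row1.fileName", fileName);

            // Expression for the fetch component's URI property:
            String uri = context.baseUrl + "/" + ((String) globalMap.get("row1.fileName"));

            uris.add(uri);
        }
        return uris;
    }
}
```

If each row already holds a complete URL, the expression reduces to the globalMap lookup on its own.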
tFileFetch and tFlowToIterate are both DI Job components and hence are found in the Integration perspective. Talend's Unified Platform concept means you can run DI jobs in the ESB runtime container. But you can also implement your use case more directly using pure ESB components in the Mediation perspective.
In Mediation view you would use the cContentEnricher and a Camel HTTP endpoint. You can see the full documentation on these components at http://camel.apache.org/http.html and http://camel.apache.org/content-enricher.html. Additional information is available in the Talend Integration Factory Guide in the documentation. Information about the GUI interface to the content-enricher is available in the Talend Mediation Components Reference Guide.
In this case you would implement the Camel http endpoint in the GUI with the cMessageEndpoint and just specify a component uri using something like:
http:hostname
I have separated it out for convenience and so that the enrichment route can encapsulate the logic that dynamically sets the URL to be retrieved. In general, most things you can do statically in Camel with an endpoint can also be done dynamically by manipulating the exchange headers prior to routing to the endpoint.
See this link for the Camel http4 documentation: http://camel.apache.org/http4.html
Here is a sample route using the contentEnricher as described above. It reads a file containing a list of URLs, retrieves each one, and writes the results to another directory.
http://eost.net/eost/contentEnricher.zip
One Star

Re: Recover data stream in talend

Thank you so much for this great answer! I will work on it.