Four Stars

Datasets & hashfiles in Talend

Hi,

We are in the process of migrating our DataStage 8.7 jobs to Talend.

Is there any component in Talend similar to the 'datasets' and 'hashfiles' in DataStage?

 

Regards,

Amirtharaj.R

  • Big Data
3 REPLIES
Eleven Stars

Re: Datasets & hashfiles in Talend

You really need to give us a little more info on what "datasets" and "hashfiles" are and what they do in DataStage. Chances are Talend will be able to handle the functionality they provide out of the box. If not, the major advantage of Talend is that you can write your own functionality (or include that of others) using Java.

Rilhia Solutions
Four Stars

Re: Datasets & hashfiles in Talend

Datasets are an internal file format in DataStage. They can be used as intermediate files for lookups and other operations, and to manage data within a job. Moreover, since dataset files are in binary format, reads and writes to them are comparatively very fast.

 

For example, when you need to do a lookup against a large table, you can write the data to a dataset file once and reuse it in other jobs rather than selecting the data from the DB again.
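
To make the pattern concrete, here is a rough Java sketch of "write the lookup data to a binary file once, then reuse it from other jobs". It is only an analogy: it does not use or reproduce the DataStage dataset format, and the class name `BinaryLookupFileSketch`, the id/name row layout, and the file layout (a row count followed by int/UTF pairs) are all assumptions made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a generic binary intermediate file, NOT the DataStage dataset format.
final class BinaryLookupFileSketch {

    // Producer job: dump the lookup rows (id -> name) to a binary file once.
    static void write(Path file, Map<Integer, String> rows) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(rows.size());            // row count first, so the reader knows when to stop
            for (Map.Entry<Integer, String> row : rows.entrySet()) {
                out.writeInt(row.getKey());       // assumed key column
                out.writeUTF(row.getValue());     // assumed value column
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Consumer job: reload the rows from the file instead of selecting from the DB again.
    static Map<Integer, String> read(Path file) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int id = in.readInt();
                String name = in.readUTF();
                rows.put(id, name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lookup-sketch", ".bin");
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "Acme");
        rows.put(2, "Globex");
        write(file, rows);
        // Checks that the data survives the round trip through the file.
        System.out.println("round trip ok: " + read(file).equals(rows));
        Files.delete(file);
    }
}
```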

Eleven Stars

Re: Datasets & hashfiles in Talend

You can use the tHashOutput and tHashInput components for this sort of thing in Talend. They hold the data in memory, so how much you can cache depends on the memory available: if you want to store gigabytes of data, your machine needs that much memory. However, it is very quick. There are other ways to increase performance by removing the latency of DB lookups, but the tHash components are the first that come to mind. You also have to consider that with Talend you have every Java API available to you, so finding alternatives is easy if necessary.
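
As a rough idea of what an in-memory lookup cache amounts to in plain Java, here is a minimal sketch. It is not how the tHash components are implemented; the class `InMemoryLookupSketch`, the `Customer` row type, and both method names are hypothetical, chosen only to show the load-once, look-up-many-times pattern and why memory is the limit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an in-memory lookup cache, NOT Talend's tHash implementation.
final class InMemoryLookupSketch {

    // Hypothetical row type standing in for one record of the lookup table.
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Load phase: index every lookup row by its key once. All rows stay on the heap,
    // which is why the size of the lookup table is bounded by available memory.
    static Map<Integer, Customer> buildCache(List<Customer> rows) {
        Map<Integer, Customer> cache = new HashMap<>();
        for (Customer row : rows) {
            cache.put(row.id, row); // a later row with the same key replaces the earlier one
        }
        return cache;
    }

    // Lookup phase: resolve a key against the cache instead of querying the database.
    // Returns null when the key is not present.
    static String lookupName(Map<Integer, Customer> cache, int id) {
        Customer match = cache.get(id);
        return match == null ? null : match.name;
    }

    public static void main(String[] args) {
        Map<Integer, Customer> cache = buildCache(Arrays.asList(
                new Customer(1, "Acme"),
                new Customer(2, "Globex")));
        System.out.println(lookupName(cache, 2));  // name for an existing key
        System.out.println(lookupName(cache, 99)); // null for a missing key
    }
}
```

Each lookup is a single hash probe with no network round trip, which is where the speed comes from. If the table does not fit in memory, a file-based or database-side approach would be needed instead.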

Rilhia Solutions