Tutorial - Talend Component Kit #3: Building an Input Component (Dataset, Datastore, Partition Mapper & Source)

This part of the tutorial explains the structure of an input component in the "Talend Component Kit". The focus is on the structure and interplay of the component's individual parts rather than on their implementation, which is covered in Parts 4 and 5 of this tutorial.

After the third part you will know:

  • the roles of the dataset and the datastore, and how they relate to each other
  • how a partition mapper works
  • the properties of a source

1. Dataset & Datastore

[Figure: Talend Component Kit, datastore and dataset]

 

As shown in the graphic, the datastore is part of the dataset, and the dataset is part of the input or output component. Together, the dataset and the datastore tell the component where the data is located, how it is accessed, and which data is to be processed.

 

The responsibilities of the dataset and the datastore are clearly separated:

  • Datastore: contains the data required to connect to the backend
  • Dataset: contains the datastore plus the information required to process the data
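
The containment described above can be sketched in plain Java. This is only a minimal sketch under stated assumptions, not the tutorial's code: the class and field names (`JiraDatastoreSketch`, `JiraDatasetSketch`, `url`, `user`, `query`) are hypothetical, and the annotations that a real component would put on these classes are left out so that the example compiles without the Talend dependency.

```java
import java.io.Serializable;

// Hypothetical datastore: only what is needed to connect to the backend.
class JiraDatastoreSketch implements Serializable {
    final String url;
    final String user;

    JiraDatastoreSketch(String url, String user) {
        this.url = url;
        this.user = user;
    }
}

// Hypothetical dataset: wraps the datastore and adds what is needed
// to select and process the data (here, a query string).
class JiraDatasetSketch implements Serializable {
    final JiraDatastoreSketch datastore;
    final String query;

    JiraDatasetSketch(JiraDatastoreSketch datastore, String query) {
        this.datastore = datastore;
        this.query = query;
    }

    // Combines the query with the connection data into one description.
    String describe() {
        return query + " @ " + datastore.url;
    }
}
```

Because the dataset holds a reference to the datastore, the same connection data can be shared by several datasets that differ only in what they query.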

2. Partition Mapper

The "Partition Mapper" class must implement three methods, each marked with the corresponding annotation:

  • Assessor
  • Split
  • Emitter

The idea behind the "Partition Mapper" is to estimate the effort of the data processing first and to break the work down into parts before execution, so that it can run more efficiently. For simple queries, such as a request to a RESTful API like Jira, this division is not necessary and the "Partition Mapper" is created only once.

 

2.1 Assessor

The assessor's task is to estimate the size of the data the mapper has to process; this value is the basis for deciding into how many parts the work is ideally divided. It only needs to be a rough estimate, not an exact figure.

 

In the case of the Jira component, the assessor returns 1, since it does not make sense to split an HTTPS query.

 

2.2 Split

The split method divides the mapper's work. It returns a list of partition mappers, each of which handles only part of the work.

 

The split method of the Jira component returns the mapper itself in a singleton list, since it remains the only partition mapper.
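
The estimate-then-split idea can be illustrated with a small, dependency-free sketch. It is not the Jira component's code: the class `RangeMapperSketch`, its numeric range of work items and the `bundleSize` parameter are assumptions made for illustration, and the kit's annotations are omitted. When there is nothing to divide, the mapper returns itself in a singleton list, as described for the Jira case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mapper over a numeric range of work items [from, to).
class RangeMapperSketch {
    final long from;
    final long to;

    RangeMapperSketch(long from, long to) {
        this.from = from;
        this.to = to;
    }

    // Assessor role: a rough estimate of the amount of work.
    long estimateSize() {
        return to - from;
    }

    // Split role: return mappers that each cover at most bundleSize items.
    List<RangeMapperSketch> split(long bundleSize) {
        if (bundleSize <= 0 || estimateSize() <= bundleSize) {
            // Nothing to divide: this mapper stays the only one.
            return Collections.singletonList(this);
        }
        List<RangeMapperSketch> parts = new ArrayList<>();
        for (long start = from; start < to; start += bundleSize) {
            parts.add(new RangeMapperSketch(start, Math.min(start + bundleSize, to)));
        }
        return parts;
    }
}
```

With these assumptions, a mapper over 25 items and a bundle size of 10 is split into three mappers, while a mapper with an estimate of 1 is returned unchanged.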

 

2.3 Emitter

The emitter is responsible for instantiating the source class, whose producer method is then executed. It does not receive any data as input, but uses the configuration, which is defined as an attribute of the class.

 

The emitter returns the instantiated source as its result.
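
A minimal sketch of this relationship, again in plain Java without the kit's annotations: the names `QueryMapperSketch`, `QuerySourceSketch` and `createSource` are hypothetical, and a simple string stands in for the real configuration object.

```java
// Hypothetical source: receives the configuration through its constructor.
class QuerySourceSketch {
    final String query;

    QuerySourceSketch(String query) {
        this.query = query;
    }
}

// Hypothetical mapper: the configuration is a field of the class.
class QueryMapperSketch {
    private final String query;

    QueryMapperSketch(String query) {
        this.query = query;
    }

    // Emitter role: takes no arguments, builds the source from the
    // mapper's own configuration and returns it.
    QuerySourceSketch createSource() {
        return new QuerySourceSketch(query);
    }
}
```

The design keeps the emitter trivial: everything the source needs is already stored in the mapper, so the method only passes it on.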

 

3. Producer Method in the Source Class

The producer method in the source class does the actual work. It returns its result as a "Record", which can be created using the "RecordBuilderFactory".

 

The method is called repeatedly on the source created by each emitter, so every call must return a different result, namely the next record. Once the processing of the task is complete, it returns "null".
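
The calling pattern of a producer method can be sketched without the Talend dependency. In this hypothetical example a `Map` stands in for the kit's "Record" type, and the class name `IssueSourceSketch`, the method name `next` and the issue keys are assumptions made for illustration only.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical source. A Map stands in for the kit's Record type so the
// example runs without the Talend dependency.
class IssueSourceSketch {
    private final Iterator<String> keys;

    IssueSourceSketch(List<String> issueKeys) {
        this.keys = issueKeys.iterator();
    }

    // Producer role: each call returns the next record,
    // and null once everything has been delivered.
    Map<String, String> next() {
        if (!keys.hasNext()) {
            return null;
        }
        return Map.of("key", keys.next());
    }
}
```

A caller would typically loop with something like `while ((record = source.next()) != null)`, which is why the end of the data has to be signalled by a null return value.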

 

4. Further tutorials

Feel free to send us a mail at talend@odisys.de with any questions or comments.

 

Source Code:

The following tutorials can be found here:

  •  
