One Star

tKafkaInput inbuilt schema has offset & payload col but no KEY column

Hi Team,
In TOS, the tKafkaInput component has a default schema with the columns PAYLOAD and OFFSET.

Issue: As I understand it, a message on a Kafka topic has the attributes topic, partition, offset, key and value. The offset and value attributes are mapped to the OFFSET and PAYLOAD columns of tKafkaInput. I would like to know whether the key can be retrieved as a column in the tKafkaInput schema as well.
Thank you for your help and time.
RR
5 REPLIES
Employee

Re: tKafkaInput inbuilt schema has offset & payload col but no KEY column

Hello,
For the time being, tKafkaInput does not provide a way to retain the key from incoming messages (and tKafkaOutput does not provide a way to write your own key with a message).
You may want to create a feature request for that. There is room for improvement in the Kafka components' features, since an offset is of little use without the id of the partition it belongs to.
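The point about offsets and partition ids can be illustrated with a small, self-contained Python sketch. It does not use Talend or any Kafka client; `KafkaRecord`, `to_default_schema` and the lowercase column names are hypothetical names chosen for illustration only. It assumes, as described in this thread, that the built-in schema keeps only the offset and the payload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KafkaRecord:
    """Hypothetical model of the five message attributes named in the question."""
    topic: str
    partition: int
    offset: int
    key: Optional[bytes]
    value: bytes


def to_default_schema(record: KafkaRecord) -> dict:
    """Project a record onto the two built-in columns (illustrative names)."""
    return {"offset": record.offset, "payload": record.value}


# Two records from different partitions of the same topic.
# Offsets are counted per partition, so both records can carry offset 0.
r0 = KafkaRecord("T1", 0, 0, b"user-1", b"first")
r1 = KafkaRecord("T1", 1, 0, b"user-2", b"second")

rows = [to_default_schema(r) for r in (r0, r1)]
```

After the projection both rows have the same offset and no key, so neither the partition nor the key can be recovered from the schema. Only the pair (partition, offset) identifies a record within a topic.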
One Star

Re: tKafkaInput inbuilt schema has offset & payload col but no KEY column

Thank you for the reply.
Exactly, I feel the same about the offset. I have already raised a feature request for that, but it is currently ON-HOLD. How do I move it to another status? https://jira.talendforge.org/browse/TBD-3321
Regarding the key column, as suggested I have added a new feature request for this topic:
https://jira.talendforge.org/browse/TDQ-12147
Regards,
RR
Employee

Re: tKafkaInput inbuilt schema has offset & payload col but no KEY column

Hello,
Actually, your feature request in TBD-3321 has a bigger scope than what I mentioned about retrieving message keys within the tKafkaInput schema. In my opinion, there are two different things:

Providing options to retrieve the message key, the partition id and the offset within the tKafkaInput schema. This is something we could easily consider since it does not require much prior analysis.
Giving a means to provide custom starting offsets for the consumers. This is much trickier to offer since a lot of things have to be taken into account. For example, the user has to know beforehand how the topic is actually partitioned, which makes it very error-prone. What about other consumers that may belong to the same consumer group? And so on. Deeper analysis is required before starting, but I think this would be feasible (at least in 0.9).
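To make the "error-prone" point concrete, here is a minimal Python sketch of the kind of check a component would need before honoring user-supplied starting offsets. It is only an illustration under stated assumptions: `validate_start_offsets` and `log_ranges` are hypothetical names, no Kafka client is involved, and the per-partition offset ranges are assumed to have been fetched from the cluster by some other means.

```python
def validate_start_offsets(requested, log_ranges):
    """Return a list of problems with user-supplied starting offsets.

    requested:  {partition_id: starting_offset} as typed by the user
    log_ranges: {partition_id: (earliest, latest)} as reported by the cluster
                (assumed input, not fetched here)
    """
    problems = []
    for partition, offset in sorted(requested.items()):
        if partition not in log_ranges:
            # The user assumed a partition layout that does not match the topic.
            problems.append(f"partition {partition} does not exist")
            continue
        earliest, latest = log_ranges[partition]
        if not earliest <= offset <= latest:
            problems.append(
                f"offset {offset} is outside [{earliest}, {latest}] "
                f"for partition {partition}"
            )
    return problems


# Hypothetical topic with two partitions and their current offset ranges.
log_ranges = {0: (500, 2000), 1: (0, 150)}
```

With these ranges, `validate_start_offsets({0: 999}, log_ranges)` returns an empty list, while `{2: 10, 1: 999}` yields two problems: an offset out of range for partition 1 and a partition that does not exist. The sketch says nothing about other consumers in the same group, which is a separate concern.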

I don't know if your feature requests will be accepted, but you should not expect to see anything about them in 6.2. For this release, we focused on the new security features and on support for the new consumer API, both introduced in Kafka 0.9.
I hope you find this information useful.
One Star

Re: tKafkaInput inbuilt schema has offset & payload col but no KEY column

Hello,
Thank you for the update on my requests.
I agree and understand that TBD-3321 has a bigger scope and needs a lot of analysis before implementation to ensure it is foolproof. I raised the request anyway to see whether a custom offset/partition feature could be implemented in future releases.
***Giving means to provide custom starting offsets for the consumers. This is something much more tricky to offer since a lot of things have to be taken into account. For example, the user has to know how the topic is actually partitioned beforehand and consequently very error-prone. What about potential other consumers belonging to the same consumer group ? Etc... some more deep analysis is required before starting but I think this would be feasible (at least in 0.9).***
I have a question about the statement above: what risk do "the user has to know how the topic is actually partitioned beforehand and consequently very error-prone" and "What about potential other consumers belonging to the same consumer group ?" pose? I'm not sure, but as I understand it:
Say a Kafka topic T1 has 2 partitions (P0 and P1) and one consumer group with 2 consumers (C0 and C1) reading from P0 and P1 respectively.
T1 --> P0 --> C0
    --> P1 --> C1
Following the TBD-3321 feature request, say I want to provide a custom starting offset of 999 for partition 0 only and leave partition 1 at Latest. For that case, the following features could be added:
1. If KafkaInput can be added to the Metadata section, it should be able to pull information such as the number of partitions.
2. Add "Number of Partitions" to tKafkaInput: 2 (if this can be determined from the metadata, it will be accurate and avoid errors)
3. Add "Custom offset" to tKafkaInput: "Partition 0 : 999"   /* I feel that any user who wants to provide a custom offset will know the partition ids as well. Custom offsets for multiple partitions could be listed comma-separated in the same fashion. */
So, in this case, when the Talend job executes, whichever consumer reads from partition 0 should start at custom offset 999, and the consumers that read from partition 1 should simply start at the latest offset. My understanding may not be fully correct and may have gaps, so could you please help me understand what risks the above scenario holds?
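A rough Python sketch of the behavior proposed here, for illustration only: it parses a setting in the "Partition 0 : 999" form and falls back to "latest" for every partition without a custom offset. The function names are hypothetical, the comma-separated format is the one suggested in this post rather than an existing tKafkaInput option, and no Kafka client is used.

```python
import re


def parse_custom_offsets(spec):
    """Parse 'Partition 0 : 999, Partition 1 : 42' into {0: 999, 1: 42}."""
    offsets = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = re.fullmatch(r"Partition\s*(\d+)\s*:\s*(\d+)", entry)
        if match is None:
            raise ValueError(f"cannot parse custom offset entry: {entry!r}")
        offsets[int(match.group(1))] = int(match.group(2))
    return offsets


def resolve_start_positions(partitions, custom):
    """Use the custom offset where one is given, otherwise 'latest'."""
    return {p: custom.get(p, "latest") for p in partitions}


# The scenario from this post: topic T1 with partitions 0 and 1.
custom = parse_custom_offsets("Partition 0 : 999")
start = resolve_start_positions([0, 1], custom)
```

Here `start` maps partition 0 to 999 and partition 1 to "latest". The sketch only resolves the intended starting positions; it does not address what happens when partitions are reassigned among the consumers of a group.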
Thanks again for your time and help.
Regards,
RR
Six Stars

Re: tKafkaInput inbuilt schema has offset & payload col but no KEY column

Hello,
I am just interested to know whether there is any news on this Jira issue. It is getting close to 2 years and the issue is still on hold, with many people requesting this feature. At least the partition and key columns could be provided, even if custom partitions and offsets are not.