Five Stars

Data Loss while transferring from Cassandra to SQL using Talend

We are using Talend Open Studio to transfer data from Cassandra to SQL. While reading data with a Talend job, we sometimes lose data, and we cannot find any error that explains it. Even the Cassandra system and debug logs show very limited information. Is there any setting we can configure in Cassandra or in Talend Open Studio to avoid this data loss?

Note: We process about 5M records/hour and lose approximately 1% of them (roughly 50,000 records per hour). The issue is intermittent, not consistent.

4 REPLIES
Moderator

Re: Data Loss while transferring from Cassandra to SQL using Talend

Hi,

Have you tried dragging a 'Rejects' row from your t<DB>Output component to see whether your data is rejected during processing?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: Data Loss while transferring from Cassandra to SQL using Talend

We are doing batch inserts. With single inserts and a Reject row, the counts match, but single inserts are very slow, so that is not a feasible option at approximately 5M records/hour.
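
One way to keep batch throughput while still capturing rejected rows is to fall back to row-by-row inserts only for a batch that fails. The following is a minimal sketch of that idea, not Talend's own behaviour: it uses Python's built-in sqlite3 module as a stand-in for the SQL target, and the table `events`, its columns, and the function `insert_with_fallback` are hypothetical names chosen for illustration.

```python
import sqlite3


def insert_with_fallback(conn, rows, batch_size=1000):
    """Insert rows in batches; if a batch fails, retry that batch row by row.

    Returns (inserted_count, rejected_rows) so that no row disappears silently.
    """
    inserted = 0
    rejected = []
    sql = "INSERT INTO events (id, payload) VALUES (?, ?)"  # hypothetical table
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            # "with conn" commits on success and rolls back the whole batch on error
            with conn:
                conn.executemany(sql, batch)
            inserted += len(batch)
        except sqlite3.Error:
            # Slow path: only the failed batch is retried one row at a time,
            # so the offending rows can be isolated and recorded
            for row in batch:
                try:
                    with conn:
                        conn.execute(sql, row)
                    inserted += 1
                except sqlite3.Error as exc:
                    rejected.append((row, str(exc)))
    return inserted, rejected


# Stand-in target table with constraints that some sample rows violate
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

rows = [(1, "a"), (2, "b"), (2, "duplicate id"), (3, None), (4, "d")]
inserted, rejected = insert_with_fallback(conn, rows, batch_size=5)
print(inserted, len(rejected))
```

In this sample, the duplicate id and the NULL payload are the two rows that end up in `rejected`, while the other three are inserted. Whether a comparable fallback can be configured in a given Talend output component is an assumption to verify against its documentation.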

Moderator

Re: Data Loss while transferring from Cassandra to SQL using Talend

Hello,

Would you mind posting screenshots of your current job design on the forum? They would help us understand your workflow clearly.

Best regards

Sabrina

Five Stars

Re: Data Loss while transferring from Cassandra to SQL using Talend

Hi Sabrina,

We fetch the list of partition keys from the SQL databases and dump it to a CSV file. We store the last-run values in another SQL table and fetch those as well. Once the Talend job starts, we insert the current run info into SQL. Then we read the CSV file, which is passed to the Cassandra input for the WHERE clause. Once we get the results, we insert the rows into another SQL table. We have also added a log catcher that rolls back our SQL values and then sends us an email, but we are unable to find the problem through it.

The following is the flow of our current Talend job. Kindly help.

[Attached screenshot: image.png]
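
Because the loss is intermittent and the log catcher reports nothing, a per-run reconciliation of row counts by partition key may help narrow down which keys lose rows. The sketch below only illustrates the comparison step: the counts are hard-coded sample values, and in practice they would have to come from a count query against Cassandra and against the SQL target for the same list of keys. The function name `find_missing` and the keys `pk1` to `pk3` are hypothetical.

```python
def find_missing(source_counts, target_counts):
    """Return {partition_key: missing_row_count} for keys that are short in the target."""
    missing = {}
    for key, expected in source_counts.items():
        actual = target_counts.get(key, 0)  # a key absent from the target counts as 0 rows
        if actual < expected:
            missing[key] = expected - actual
    return missing


# Hypothetical per-partition-key row counts for one run
source_counts = {"pk1": 100, "pk2": 250, "pk3": 75}  # rows read from the source
target_counts = {"pk1": 100, "pk2": 247}             # rows found in the target

missing = find_missing(source_counts, target_counts)
print(missing)
```

Here `pk2` is short by 3 rows and `pk3` is missing entirely. Logging such a result at the end of each run would show whether the lost rows cluster around particular partition keys or particular runs.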