We are using Talend Open Studio to transfer data from Cassandra to SQL. While reading data with a Talend job, we sometimes face data loss, and we are unable to find any error for it. Even the Cassandra system/debug logs show very limited information. Is there any setting we can configure in Cassandra or in Talend Open Studio to avoid this data loss?
Note: We process roughly 5M records/hour and lose approximately 1% of the data. This is not a consistent issue but an intermittent one.
Have you tried dragging a 'Rejects' row from your t<DB>Output component to see whether your data is being rejected during processing?
We are doing batch inserts. With single inserts plus a Reject row we do not see any count mismatch, but single inserts are very slow, so that is not a feasible option when we are dealing with approximately 5M records/hour.
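One common middle ground between fast batches (which hide rejects) and slow single inserts (which expose them) is to insert in batches and fall back to row-by-row only for a batch that fails. The sketch below illustrates the idea in plain Python; `insert_batch` and `insert_one` are hypothetical stand-ins for the destination writes, not Talend component APIs.

```python
# Sketch: batched inserts with per-row fallback, so rejected rows are
# still captured without paying the single-insert cost on every row.
# insert_batch/insert_one are hypothetical stand-ins for the real writes.

def write_with_fallback(rows, insert_batch, insert_one, batch_size=1000):
    """Insert rows in batches; when a batch fails, retry it row-by-row
    and collect the individual rejects instead of silently losing them."""
    rejects = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            insert_batch(batch)
        except Exception:
            # Fall back to single inserts for this batch only, so we
            # learn exactly which rows were rejected and why.
            for row in batch:
                try:
                    insert_one(row)
                except Exception as exc:
                    rejects.append((row, str(exc)))
    return rejects
```

This keeps throughput close to pure batching in the normal case, while still producing a reject list you can log and reconcile when something in a batch goes wrong.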
Would you mind posting screenshots of your current job design on the forum? That would help us understand your workflow clearly.
We fetch the list of partition keys from the SQL database and dump it into a CSV file. We store the last-run values in another SQL table and fetch those as well. Once the Talend package starts, we insert the current run's info into SQL, then read the CSV file and pass the keys on to the Cassandra input for the WHERE clause. Once we get the results, we insert the rows into another SQL table. We have also added a log catcher that rolls back our SQL values and sends us a mail, but we have been unable to find the problem through it.
Following is the flow of our current Talend job. Kindly help.
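For reference, the driving loop described above (CSV of partition keys feeding one parameterized Cassandra read per key, filtered by the stored last-run value) can be sketched like this. `execute` is a hypothetical stand-in for the Cassandra session call, and the table/column names are assumptions, not the poster's real schema.

```python
# Minimal sketch of the per-key read loop: one parameterized query per
# partition key from the CSV, restricted to data newer than the last run.
import csv
import io

# Hypothetical query; table and column names are illustrative only.
QUERY = "SELECT * FROM events WHERE partition_key = ? AND run_ts > ?"

def rows_for_keys(csv_text, last_run_ts, execute):
    """Yield (key, rows) for each partition key in the CSV, querying
    only data newer than the stored last-run timestamp."""
    reader = csv.reader(io.StringIO(csv_text))
    for (key,) in reader:
        yield key, execute(QUERY, (key, last_run_ts))
```

One thing worth checking in a design like this: if the last-run value is advanced before all per-key reads finish, any key whose read fails silently is skipped forever on the next run, which would look exactly like intermittent loss.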
Please help, we are still stuck on the same issue. At the Cassandra level we have verified in the logs that the counts match, but the destination MSSQL table does not end up with the same count.
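Since the counts match on the Cassandra side but not in MSSQL, a per-partition-key reconciliation can narrow the loss down to specific keys instead of a global 1%. The sketch below assumes you can produce per-key row counts on each side (for example via GROUP-BY-style count queries); the function then reports every key whose counts differ.

```python
# Sketch of a reconciliation step: compare per-partition-key row counts
# between the Cassandra side and the MSSQL side to pinpoint which keys
# lose rows. The two dicts are assumed inputs, e.g. from count queries.

def find_mismatched_keys(source_counts, dest_counts):
    """Return {key: (source_count, dest_count)} for every key whose
    counts differ, including keys missing entirely on one side."""
    mismatches = {}
    for key in set(source_counts) | set(dest_counts):
        s = source_counts.get(key, 0)
        d = dest_counts.get(key, 0)
        if s != d:
            mismatches[key] = (s, d)
    return mismatches
```

If the mismatched keys cluster (e.g. always the last keys processed, or always wide partitions), that pattern usually points at where in the pipeline the rows are dropped.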