I am trying to delete about 130,000 tasks from a resolution campaign on a daily basis and reload the same number of tasks the next day. I understand this is not the ideal way to use stewardship, but it is the client's requirement. Whenever I try to delete the 130K tasks using the tDataStewardshipTaskDelete component, it throws a Java heap space error. I am loading the records to delete from a CSV file on the JobServer. I tried changing the JVM settings on the job to use up to 2 GB max, but it still fails.
Is there a better way to deal with this? Is there any way to delete records in batches so that we don't run into the heap space issue?
As a first step, you can extract the task list into a file. Then read the file iteratively so that the entire data set is not pushed to the tDataStewardshipTaskDelete component in one shot.
Configure the query section and the other relevant sections of the task-delete component using the data from the input file, so that each iteration targets only the intended records.
The response time may be longer, since the deletion happens at row level rather than in bulk.
Another option is to split the data into multiple batch files and process one batch at a time.
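As a rough illustration of the batch-file approach, here is a minimal Python sketch that splits an exported task-ID CSV into smaller files of a fixed size. The file names, column layout, and batch size are assumptions for illustration only; each resulting file would then be fed to a separate run of the delete step in the Talend job.

```python
# Hypothetical sketch: split an exported task list into smaller batch files
# so each batch can be processed (e.g. by tDataStewardshipTaskDelete) on its own.
import csv
import os


def split_into_batches(input_path, out_dir, batch_size=10000):
    """Split input_path (CSV with a header row) into batch files of
    at most batch_size data rows each; return the list of file paths."""
    os.makedirs(out_dir, exist_ok=True)
    batch_paths = []
    with open(input_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)          # keep the header for every batch file
        batch, index = [], 0
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                batch_paths.append(_write_batch(out_dir, index, header, batch))
                batch, index = [], index + 1
        if batch:                      # write any remaining partial batch
            batch_paths.append(_write_batch(out_dir, index, header, batch))
    return batch_paths


def _write_batch(out_dir, index, header, rows):
    path = os.path.join(out_dir, "tasks_batch_%04d.csv" % index)
    with open(path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

With 130K tasks and a batch size of around 10K, this would yield 13 files, keeping the memory footprint of each delete run small.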