I am trying to delete about 130,000 tasks from a resolution campaign on a daily basis and reload the same number of tasks the next day. I understand that this is not the ideal way to use stewardship, but that is the client's requirement. Whenever I try to delete the 130K tasks using the tDataStewardshipTaskDelete component, it throws a Java heap space error. I am loading the records to delete from a CSV file on the JobServer. I tried changing the JVM settings on the job to use up to 2 GB max; it still fails.
Is there a better way to deal with this? Is there any way to delete records in batches so that we don't run into the heap space issue?
As a first step, you can extract the task list into a file. Then read the file iteratively so that the entire data set is not pushed to the tDataStewardshipTaskDelete component in one shot.
In the task delete component, set the query section and other relevant sections using the data from the input file so the tasks are filtered properly.
The response time will be longer, since the deletion happens at row level rather than in bulk.
Another option is to split the data into multiple files as batches and read one batch at a time.
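Outside of Talend, the batching idea above can be sketched in plain Java: split the extracted task-list CSV into fixed-size chunk files, then feed each chunk to the delete job in turn so only one batch is in memory at a time. This is a hypothetical illustration, not Talend component code; the file names, batch size, and `split` helper are assumptions for the example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch (not Talend code): split a task-list CSV into
// fixed-size batch files so each batch can be processed by a separate
// run of the delete job, keeping the per-run memory footprint small.
public class TaskBatchSplitter {

    // Writes files "<inputPath>.batch0", "<inputPath>.batch1", ... of at
    // most batchSize lines each; returns the number of batch files created.
    public static int split(String inputPath, int batchSize) throws IOException {
        int batches = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(inputPath))) {
            String line = in.readLine();
            while (line != null) {
                try (PrintWriter out = new PrintWriter(
                        new FileWriter(inputPath + ".batch" + batches))) {
                    int written = 0;
                    while (line != null && written < batchSize) {
                        out.println(line);
                        written++;
                        line = in.readLine();
                    }
                }
                batches++;
            }
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TaskBatchSplitter <csvPath> <batchSize>
        if (args.length == 2) {
            int n = split(args[0], Integer.parseInt(args[1]));
            System.out.println(n + " batch files written");
        }
    }
}
```

With 130K tasks and a batch size of, say, 10,000, this yields 13 small files; an outer iterate loop (e.g. tFileList feeding the delete subjob) can then process them one at a time instead of loading all rows at once.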