Six Stars

tSalesforceOutputBulk etc - why not allow standard CSV files

Have been trying to process a large number of SF records (~2m) and have done so using the tSalesforceOutputBulkExec component. However, I wanted to break the process down into more manageable chunks (100k at a time), so it seems I have to prepare the files using tFileOutputDelimited to get the automatic splitting capability. What I cannot see is why I can't then just send those files to SF for processing. From the docs (and from trying it) it seems I need to iterate over my split CSV files and run them back through tSalesforceOutputBulkExec, which creates a new, virtually identical CSV file that it then sends to SF for processing. It is not clear why this is necessary.
Also, I am sending just a record Id and a new field value into the flow, but on the Reject row out of tSalesforceOutputBulkExec I see only the new field value and an error message. It doesn't seem to be passing through the important bit: the Id of the actual record that failed to update. Is this a bug or am I missing something?
Ta, M
10 REPLIES
Five Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

You have a few questions here.
You can use tSalesforceOutput and tSalesforceBulkExec to do what you want.
Remember that you can control your batch size: see "Rows to commit" in the Advanced settings of tSalesforceBulkExec.
The reject channel will only show your input fields and the error, but you should at least see an external key if that's how you're working.
Six Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

Thanks, but don't you mean tSalesforceOutputBulk?
In the scenario I described in the original post, I was trying to avoid a CSV file with 2m rows. AFAICT, using tSalesforceOutputBulk and tSalesforceBulkExec works, but processes everything in one go. If I still want to split the CSV file into chunks, would I be able to use a plain tFileOutputDelimited with the split option to create the CSV for Salesforce and then iterate over the files produced, plugging the name into the Bulk file path setting? I can't see what tSalesforceOutputBulk actually does that makes it any smarter than tFileOutputDelimited (or is it the former that actually sends the file to the server? In which case it might be better to have a SF component that just does the file transfer).
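As an aside, the splitting that tFileOutputDelimited's split option performs is easy to reproduce outside Talend. Here is a minimal plain-Python sketch (the 100k chunk size and the file-name prefix are illustrative, not what Talend generates); the main point it shows is that each chunk needs the header row repeated so it is a valid bulk file on its own:

```python
import csv
import itertools

def split_csv(src_path, rows_per_file=100_000, prefix="sf_chunk"):
    """Split a delimited file into chunks of at most rows_per_file data rows,
    repeating the header line in every chunk file."""
    out_files = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i in itertools.count():
            rows = list(itertools.islice(reader, rows_per_file))
            if not rows:
                break
            out_path = f"{prefix}_{i}.csv"
            with open(out_path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(rows)
            out_files.append(out_path)
    return out_files
```

The returned list of paths is what you would then iterate over, plugging each name into the Bulk file path setting.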
Ta,
M
Five Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

I did.
There is no difference between the Salesforce output and a plain CSV, except for the formatting, so you may as well let the component do it for you.
If you really want to batch the data yourself, then you can do this by using the component to create a series of files, and then load each one.
Under normal circumstances, there should be no need to do this.
Six Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

OK, that's what I was doing, with the apparently redundant step of going via tFileOutputDelimited. I'm trying it several ways because sending 2m rows does cause issues with timeouts and quota, and feels "wrong". Now I am trying tFileOutputDelimited and tSalesforceBulkExec, but the problem is that tSalesforceBulkExec seems to refuse to take an Iterate input. Why would that be? Am I going to have to put it in a job on its own and call it repeatedly?
M
Five Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

Hi mhayhurst,
Could you please post more info:
your Talend Job and the error messages, etc.?
Regards, John
Fifteen Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

When you use tSalesforceBulkExec, it sends the recordset to Salesforce as a file, and Salesforce breaks it into chunks to be processed (in chunks of 10,000 by default, I believe). So breaking your file into chunks is not going to help the process. As far as you are concerned, this is what happens:
1) You prepare the bulk file.
2) The bulk file is sent via a web service. Sending your 2 million records in one file will not take a great deal of time compared to 20 web service calls sending 100,000 records each; I would argue that one service call would be quicker.
3) Salesforce then starts chugging away (I use "chug" because its performance is god awful compared to an on-premise solution... just my thoughts).
The problem sounds like it is with Salesforce. It could be because of lots of triggers on your records, or because the Apex code that is fired is pretty poor (I have seen this cause real degradation in performance).
If you want to try sending your data to Salesforce in 100,000-record chunks, why not just filter your records to 100,000 before loading them, as Tal00000 said? What you are trying to do is a bit like slowing a car down using the brake while your foot is flat to the floor on the accelerator. All you need to try is lifting off the accelerator (reduce the batch size).
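To put rough numbers on that, here is a minimal sketch (plain Python, not Talend; the 10,000-record default batch size is the figure quoted above, so check it against your org) of how the server-side batch count relates to the batch size:

```python
import math

def bulk_batches(total_rows: int, batch_size: int = 10_000) -> int:
    """Number of server-side batches Salesforce would create for one bulk job,
    assuming it splits the uploaded file into fixed-size batches."""
    return math.ceil(total_rows / batch_size)

# One 2m-row job is still a single upload, but ~200 batches server-side;
# shrinking the batch size only multiplies the batch count.
print(bulk_batches(2_000_000))         # 200
print(bulk_batches(2_000_000, 5_000))  # 400
```

Either way the file goes up in one web-service call; the batch size is the only knob that changes the server-side work.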
Rilhia Solutions
Six Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

The 1st image shows how I'd like to run it: extract the Id and the value to be updated from the inbound feed, split into 100k chunks and then iterate over the BulkExec etc., but this doesn't work.
The 2nd shows what I get using the recommended approach, with all rows (412k in this case) being processed in one hit...
Five Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

Salesforce allows 5,000 batches of up to 10k records per batch in a 24-hour period for each org.
2m rows should not be a big deal to load in a single pass.
As suggested, Salesforce triggers, Apex etc. can cause issues.
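Those limits make the headroom easy to sanity-check. A quick back-of-the-envelope calculation (figures taken from this thread; verify against your org's current limits):

```python
# Rough capacity check under the limits quoted above (illustrative figures).
DAILY_BATCHES = 5_000        # batches allowed per rolling 24-hour period
RECORDS_PER_BATCH = 10_000   # max records per batch

daily_capacity = DAILY_BATCHES * RECORDS_PER_BATCH   # records loadable per day
batches_for_2m = 2_000_000 // RECORDS_PER_BATCH      # batches a 2m-row load needs
quota_share = batches_for_2m / DAILY_BATCHES         # fraction of the daily batch quota

print(daily_capacity)        # 50000000
print(batches_for_2m)        # 200
print(f"{quota_share:.0%}")  # 4%
```

So a single 2m-row load consumes only a few percent of the daily batch quota; splitting it into smaller batches eats the quota faster, not slower.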
Six Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

OK, thanks. This is an update on a complex object with loads of custom fields and triggers etc., some of which are definitely a bit flaky (not my circus, not my monkeys!). I seem to have indeed now hit an API batch limit anyway, so I have 24 hrs to think about it :(
M
Five Stars

Re: tSalesforceOutputBulk etc - why not allow standard CSV files

It's a rolling 24 hours, so it may be a little sooner :)