I have an almost finished build as shown below:
My next issue to tackle is that the tFileInputExcel is mapped to a specific Excel file that I created and emailed to that account for testing. In production, multiple Excel files (all with the same column titles) will be emailed to this address and automatically saved into a folder on my computer (via subjob 1), which we'll call "Folder 1". I need the information from each Excel file uploaded into the Salesforce environment through this process. Would it be better to have the first half of subjob 1 run, then have a subjob combine all Excel files in "Folder 1" into one master file, and use that file as the file name in the tFileInputExcel? Or is there a way to point it at a directory instead of a specific file, so that it grabs all Excel files from "Folder 1" and uploads them? I will have this job run once a day to upload any new data that comes in.
You can use a tFileList component with a file mask (*.xls or *.xlsx, I'd imagine) to get a list of Excel files in a folder. You can connect that with an Iterate link to your input component to read in each file one at a time.
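For anyone who wants to see what the "list files by mask, then iterate" idea amounts to, here is a rough plain-Java sketch. It is not what Talend generates for tFileList; the class and method names are made up for illustration, and the folder comes from a command-line argument rather than a real "Folder 1" path.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: ExcelFolderScan is a hypothetical name, not a Talend class.
class ExcelFolderScan {
    // Returns every regular file in `folder` whose name matches *.xls or *.xlsx,
    // sorted by path so the processing order is predictable.
    static List<Path> listExcelFiles(Path folder) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.{xls,xlsx}")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    found.add(p);
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory if no argument is given.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : listExcelFiles(folder)) {
            // In the job, this is where the Iterate link would trigger the Excel input.
            System.out.println("Would read: " + file);
        }
    }
}
```

Whether the mask matches upper-case extensions such as .XLSX depends on the operating system in this sketch, so treat that as something to test rather than assume.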
I have a few projects here that do what you describe. One job retrieves files from one or more sources and stores them in a folder. Another job comes along later and processes all files in the folder, archiving them if the process is successful or shunting them off into a separate folder for review if there's an issue.
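To make the "archive on success, set aside for review on failure" pattern concrete, here is a minimal plain-Java sketch. It is only an illustration of the idea, not how my jobs are built in the Studio; ProcessedFileRouter, FileProcessor, and the folder names are all hypothetical, and the processor stands in for whatever reads the file and loads it into Salesforce.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: none of these names come from Talend.
class ProcessedFileRouter {
    // Hypothetical callback standing in for "read the file and upload its rows".
    interface FileProcessor {
        void process(Path file) throws Exception;
    }

    // Runs the processor, then moves the file into archiveDir on success
    // or reviewDir on failure. Returns the file's new location.
    static Path route(Path file, Path archiveDir, Path reviewDir, FileProcessor processor)
            throws IOException {
        Path targetDir;
        try {
            processor.process(file);
            targetDir = archiveDir;
        } catch (Exception e) {
            System.err.println("Needs review: " + file + " (" + e.getMessage() + ")");
            targetDir = reviewDir;
        }
        Files.createDirectories(targetDir);
        return Files.move(file, targetDir.resolve(file.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary folders so the demo does not touch real data.
        Path inbox = Files.createTempDirectory("folder1");
        Path good = Files.createFile(inbox.resolve("good.xlsx"));
        Path bad = Files.createFile(inbox.resolve("bad.xlsx"));
        Path archive = inbox.resolve("archive");
        Path review = inbox.resolve("review");

        System.out.println(route(good, archive, review, f -> { /* pretend the upload succeeded */ }));
        System.out.println(route(bad, archive, review, f -> {
            throw new IllegalStateException("upload failed");
        }));
    }
}
```

Moving each file out of the inbox folder after it is handled also means a once-a-day run only sees files that have not been processed yet.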
Edit: I had a thought after posting. Since you appear to read the same Excel file multiple times, you may want to put the tFileList component in a parent job and connect it to a child job (your current job, minus the email extraction). You would set a context variable to the current file path in your tFileList iteration and use that variable in your tFileInputExcel components in the child job.
Thank you for the reply. I will research context variables and try to figure out exactly what needs to be done to make that work. I'm new to Talend, but I'm sure I'll be able to find help on that. I also have the issue that Salesforce's latest update blocks duplicates, so the job fails once it hits a contact that already exists. Any idea how to have it skip over repeats?
Excellent, UPSERT does work for skipping duplicates. Would you mind going into more detail on using tFileList and a context variable in a separate job to have it pull from all the Excel files in the folder?
At its simplest, it looks like the screenshot below. In the child job, switch to the Contexts tab and click the green plus button to add a context variable. Give it a name (I used "fileName") and the appropriate type (String). In the parent job, add a tRunJob component configured to run the child job. In the Context Param area, click the green plus button to add a parameter, select your new context variable from the Parameters dropdown, and in the corresponding Values column, add the global variable exposed by the tFileList component that contains the current file path. Pressing Ctrl+Space and scrolling down to find it can be easier than trying to type it exactly right.
Back in your child job, any place where you need to reference the file name, you type context.fileName and it will use the value set by the parent job.
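In case it helps to see the hand-off written out, here is a small stand-alone Java sketch of it. It assumes the tFileList component in the parent job has its default name, tFileList_1 (yours may differ, which is why picking it from the Ctrl+Space list is safer). The HashMap and the ChildContext class are stand-ins I made up so the snippet runs outside the Studio; Talend generates its own globalMap and context objects, and the path is invented.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: this simulates the parent-to-child hand-off outside Talend.
class ContextParamSketch {
    // Stand-in for the child job's context; the field mirrors the "fileName" variable.
    static class ChildContext {
        String fileName;
    }

    // The expression for the tRunJob Context Param value,
    // assuming the tFileList component is named tFileList_1.
    static String currentFilePath(Map<String, Object> globalMap) {
        return (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    }

    // Simulates one iteration: the parent copies the current path into the child's context.
    static ChildContext runChild(Map<String, Object> globalMap) {
        ChildContext context = new ChildContext();
        context.fileName = currentFilePath(globalMap);
        return context;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical path; in the real job tFileList fills this in on each iteration.
        globalMap.put("tFileList_1_CURRENT_FILEPATH", "C:/Folder 1/example.xlsx");

        ChildContext context = runChild(globalMap);
        // In the child job, the tFileInputExcel file name field would hold context.fileName.
        System.out.println("tFileInputExcel would open: " + context.fileName);
    }
}
```

In the Studio itself, the only pieces you type are the `((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))` expression in the parent's tRunJob and `context.fileName` in the child's tFileInputExcel components.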
Excellent, just two more questions if you don't mind:
1. Any idea why the option to add a context param is greyed out? I've been messing with settings trying to get it to be available but I've been unsuccessful thus far.
2. When you say to reference "context.fileName" in the child job, would that be in the tFileInputExcel objects, or would I need to use an entirely different object?
I appreciate all your help so far.
Noted. I have created the parent and child jobs successfully. The child job remains the same as pictured above, with the references set to context.fileName, and the parent job is as pictured below, with each component's details:
When I try to run the parent job, it doesn't seem to do anything and then disconnects, as shown below:
Any idea why the process is not reaching the tRunJob component?
Progress was made: I have got it to attempt to run, but I am getting the following error message:
That worked. Now I just have to figure out a flow in Salesforce, and this project is done. (I'm having issues with duplicate accounts being created without related contacts, so I need a flow that deletes accounts that have no associated contacts in their related lists.) I can very well take this issue to the Salesforce community and mark your answer as the fix. I appreciate everything you have done. Are you familiar at all with Salesforce and potentially able to assist me with this issue, or should I just go there?