I am working on a solution for processing files derived from various sources. Each source has its own variation on the input data schema, but all of them need to be mapped to a common format before a series of generic business checks is applied to yield the final result. In principle, this is a straightforward arrangement which breaks down into two general stages: a source-dependent mapping process to the common format, and a generic process which is common to all sources. To keep things simple, my intention is to create a single generic job which dynamically triggers the first stage of processing based on information passed in the payload, followed by the second stage as a fixed generic process. The job will be triggered on an ad-hoc basis, so I am proposing to deploy it as a Web service. As well as making the source-dependent part dynamic, I would like to keep this loosely coupled so that additional source mappings can be brought online in a non-intrusive and non-disruptive fashion if possible. I also want to parameterise the whole process so that paths, URLs etc. are defined in variables.
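To make the intended shape concrete, here is a minimal sketch of the two-stage arrangement in plain Java. It is only an illustration of the idea, not Talend-generated code: the class `SourceDispatcher`, the field names `amount` and `valid`, and the single stand-in business check are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: stage 1 is looked up per source, stage 2 is fixed.
final class SourceDispatcher {
    // Stage 1 mappers, keyed by the source identifier carried in the payload.
    private final Map<String, Function<Map<String, String>, Map<String, String>>> mappers =
            new ConcurrentHashMap<>();

    // A new source is brought online by registering a mapper; stage 2 is untouched.
    void register(String source, Function<Map<String, String>, Map<String, String>> mapper) {
        mappers.put(source, mapper);
    }

    Map<String, String> process(String source, Map<String, String> record) {
        Function<Map<String, String>, Map<String, String>> mapper =
                source == null ? null : mappers.get(source);
        if (mapper == null) {
            throw new IllegalArgumentException("No mapping registered for source: " + source);
        }
        Map<String, String> common = mapper.apply(record); // stage 1: source-dependent mapping
        return applyBusinessChecks(common);                // stage 2: generic checks
    }

    // Stand-in for the generic business checks: flags records with a non-empty amount.
    private static Map<String, String> applyBusinessChecks(Map<String, String> common) {
        Map<String, String> result = new LinkedHashMap<>(common);
        String amount = common.get("amount");
        result.put("valid", String.valueOf(amount != null && !amount.isEmpty()));
        return result;
    }
}
```

The point of the registry is that the generic stage never needs to know which sources exist; an unknown source fails fast with a clear error instead of falling through to the business checks.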
I was initially using Talend DI to do the basic mappings and business rules, and this has not been an issue. However, in trying to get the dynamic processing working I started to run into problems. I was using the dynamic job feature of tRunJob, which didn’t work when deployed as a Web service, and from googling around it seems that others have hit the same problem. I have also tried developing standalone Web service jobs and driving the routing manually, but I cannot get the variable-based processing to work, even with the service locator.
Having picked up on various references, I had a look at whether there might be something in Talend ESB that could help. I am still a relative newbie to Talend overall, but I have managed to stand something up that does seem to work, although it has been a challenging learning curve. The solution is a single route exposed as a SOAP service which invokes DI jobs and dynamically invokes a completely separate route for each source variant; these routes are deployed separately and accessed using direct-vm. On the up side, there are other benefits to working in the Container, and overall I am not averse to the approach; it seems quite neat. ESB also fits better for this kind of service-style processing.
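For illustration, the dynamic part comes down to deriving a direct-vm endpoint name from a value in the payload. The sketch below is plain Java rather than the actual route, and the `mapping-` prefix, the class name `MappingEndpoints` and the allowed-character rule are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a source identifier into a direct-vm endpoint URI.
final class MappingEndpoints {
    // Restrict identifiers so payload content cannot inject URI options such as "?...".
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9_-]+");

    private MappingEndpoints() {
    }

    static String endpointFor(String source) {
        if (source == null || !SAFE_ID.matcher(source).matches()) {
            throw new IllegalArgumentException("Invalid source identifier: " + source);
        }
        // "mapping-" is an assumed naming convention, not something Talend prescribes.
        return "direct-vm:mapping-" + source;
    }
}
```

With a convention like this, bringing a new source online would mean deploying another route that consumes from the matching direct-vm endpoint, without redeploying the main route.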
It looks like we will go with the ESB approach. However, I am concerned about committing to a solution that retains a relatively high level of uncharted territory (for us), and about whether this is all a bit over the top for what we are trying to achieve.
The move into ESB was a result of not being able to get the dynamic processing working in DI. The exposure to Mediation components and Routes is just because that’s how ESB works. The move to OSGi and containers for deployment is not a choice (we probably don’t really need it for its true benefits); it just comes with the territory (I am more familiar with Web services and App servers). I have googled a lot but am still not sure about the future of OSGi.
I would really appreciate any views from those more informed on whether this is an appropriate approach for this situation. Is it over the top for what I considered to be a fairly low-scale processing requirement? Maybe I have missed an alternative approach.
I am comfortable using Talend if it can do the job. Switching to something else would be a last resort at this stage. I am using Talend 6.4 for both DI and ESB.
Thanks in advance.
Could you please give us an example of the dynamic processing you are trying to get working? That would help us understand your job requirement.