Create Workflow Engine for processing integrations programs

Four Stars

Hi,

We are researching whether Talend Open Studio can be used for an upcoming project whose workflow looks like this:

1. Get the metadata from the database for each active program.

2. For each active program, check whether the program is scheduled.

3. If it is, get the program module and run it.

4. The loop then continues back to the check in step 2.

Can anyone tell us whether this can be achieved with Talend Open Studio for Data Integration, or offer any other input that would help us find a feasible solution?
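For readers following along, the four steps above can be sketched tool-independently in a few lines of Python. This is only an illustration of the control flow, not a Talend job: the `Program` fields (`active`, `next_run`, `module`), the `modules` lookup table and the function name are all hypothetical, and "scheduled" is assumed to mean "the next run time has been reached".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List


@dataclass
class Program:
    """Hypothetical metadata row for one integration program."""
    name: str
    active: bool
    next_run: datetime  # assumed meaning of "scheduled": due once this time is reached
    module: str         # key of the module that does the actual work


def run_due_programs(
    programs: Iterable[Program],
    modules: Dict[str, Callable[[Program], None]],
    now: datetime,
) -> List[str]:
    """Run the module of every active program that is due; return the names that ran."""
    executed = []
    for program in programs:           # step 1: metadata for each program
        if not program.active:         # only active programs are considered
            continue
        if program.next_run > now:     # step 2: is the program scheduled (due)?
            continue                   # step 4: not due, continue with the next check
        modules[program.module](program)  # step 3: get the program module and run it
        executed.append(program.name)
    return executed
```

In a real setup the `programs` list would come from the metadata query and the function would be called by whatever scheduler triggers the engine.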


Accepted Solutions
Ten Stars

Re: Create Workflow Engine for processing integrations programs

From an architectural point of view, I would never recommend software like Talend, Informatica, SAS, Pentaho, etc. to build something like this. Technically and functionally, however, it is possible. But the so-called "generic" part is then determined by the software's limitations and its own view and implementation of generic processes, and it still has to generate Java code ...

 

My 2 cents on this topic:
When "generic" enters the battlefield of data engineering, things move to another level of abstraction.
The whole abstraction you want to create/define should be platform independent, in some sort of template that contains business rules, encoding types, data information, validity checks... or even AI/ML stuff.
Maybe even how you want to process and store the data: do you want columns and/or rows as vectors, unique key validation, hashing, encryption, privacy... contracts, etc.? What about exceptions, logging, monitoring? Maybe different SLA agreements?

 

Here's a basic example of what I use in Talend and Python for my job configs; every part is accessible via NodeJS / API.

I think the 'beauty' of this (yes, there is still a lot of room for improvement) is that you extract your business rules into configuration that is also accessible to other domains in your architecture:

"jobs" : [ 
	{"job" : {
		"name" : "MyEmailWeb",
		"DB_Schema" : "something",
		"hdfs_dir" : "/etl/MyEmailWeb",
		"process" : true,
		"description" : "E-mail Service",
		"start_date": "2015-01-01 00:00:00",
		"eprivacy" : true,
		"create_library" : true,
		"data_items" : [
			{ "campaigns" : {"process" : true, "table" : "campaigns"} },
			{ "groups" : {"process" : true, "table" : "groups"} },
			{ "mailings" : {"process" : true, "table" : "mailings", "vectorize": ["clicked", "time"]} },
			{ "bounces" : {"process" : true, "eprivacy" : true, "retention_days" : 730, "table" : "bounces", "mask_columns" : "contactID"} },
			{ "contacts" : {"process" : false, "eprivacy" : true, "table" : "contacts"} }
		],
		"data_quality" : [],
		"analytics" : []
	}},
	{"job" : { "name" : "KissTheFrog", ............... you get it }}

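To illustrate how such a config can drive processing, here is a minimal Python sketch that reads a trimmed copy of the example and lists the tables that should be handled. It is not the author's actual code: the outer `{ ... }` wrapper, the function name `tables_to_process` and the reading of `"process"` as an on/off switch at both job and data-item level are assumptions made for the example.

```python
import json

# Trimmed copy of the config above, wrapped in {...} so it parses as JSON (assumption).
CONFIG_TEXT = """
{"jobs": [
  {"job": {
    "name": "MyEmailWeb",
    "process": true,
    "data_items": [
      {"campaigns": {"process": true, "table": "campaigns"}},
      {"bounces": {"process": true, "eprivacy": true, "retention_days": 730,
                   "table": "bounces", "mask_columns": "contactID"}},
      {"contacts": {"process": false, "eprivacy": true, "table": "contacts"}}
    ]
  }}
]}
"""


def tables_to_process(config):
    """Yield (job name, table, settings) for every enabled data item of every enabled job."""
    for entry in config["jobs"]:
        job = entry["job"]
        if not job.get("process", False):      # job switched off: skip it entirely
            continue
        for item in job.get("data_items", []):
            for settings in item.values():     # each item is a one-key dict: name -> settings
                if settings.get("process", False):
                    yield job["name"], settings["table"], settings


config = json.loads(CONFIG_TEXT)
enabled = [(job, table) for job, table, _ in tables_to_process(config)]
```

Because the rules live in plain JSON, the same file can be read by a Talend job, a Python script or an API layer without any of them hard-coding which tables are active.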
 


All Replies
Ten Stars

Re: Create Workflow Engine for processing integrations programs

- What is an "active (integration) program"? Please define.
- Metadata as in time and owner, or as in column name and datatype?
- By workflow, do you mean process flow?

My answer would be yes if it's data/ETL related ... but I don't understand your definitions. A workflow engine has a different purpose and different functions/usability: it helps automate your day-to-day business/operations and handles 'human' tasks based on defined business rules.

The big difference with workflow/business rule engines is that they are designed for human input: they wait for it before they continue, and they send a reminder so that the task gets completed. Example: approving an invoice, or cancelling a subscription because of an (online) complaint. So the more it looks like "if this, do that, and if not, wait for human input ... after this, do that with respect to ...", etc., the less I would use Talend. It is possible without a problem, but you need some skilled/experienced Talend developers.

If it's typical business rule engine functionality you are looking for, take a look at something like Drools. If it's data related, with data triggering other data events and little or no human interaction, Talend it is.

Four Stars

Re: Create Workflow Engine for processing integrations programs

This would be an integration platform for loading third-party data into our database. Programs are the top-level entities within the integration services. Yes, metadata as in column name and field type. Currently the system handles vendor-specific ad hoc integrations via Windows Workflow Foundation. We are looking for a more generic solution in Talend, in which we check certain conditions according to business requirements and create a process flow for the integration to happen.

The main challenge is that we have to generate and populate our staging table dynamically, i.e. the staging table must include certain standard columns plus columns fetched dynamically from the vendor files. Is this possible in the free version of Talend Open Studio?
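Independent of the tool, the staging-table requirement can be sketched as follows. This is a rough Python illustration under stated assumptions, not a statement about what Talend Open Studio supports: the standard columns (`load_id`, `vendor`, `loaded_at`), the function names, the CSV input format and the use of SQLite with `TEXT` columns are all hypothetical choices made to keep the example self-contained.

```python
import csv
import io
import re
import sqlite3

# Hypothetical standard columns that every staging table gets.
STANDARD_COLUMNS = [("load_id", "INTEGER"), ("vendor", "TEXT"), ("loaded_at", "TEXT")]


def safe_identifier(name):
    """Turn a vendor header such as 'Contact ID' into a safe column name ('contact_id')."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    if not cleaned:
        raise ValueError(f"unusable column name: {name!r}")
    if cleaned[0].isdigit():
        cleaned = "c_" + cleaned
    return cleaned


def build_staging_ddl(table, header):
    """Build CREATE TABLE with the standard columns followed by the vendor's columns."""
    columns = list(STANDARD_COLUMNS) + [(safe_identifier(h), "TEXT") for h in header]
    body = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    return f'CREATE TABLE "{safe_identifier(table)}" ({body})'


def load_vendor_file(conn, table, vendor, load_id, text, loaded_at):
    """Create the staging table from the file header and insert every data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line of the vendor file names the dynamic columns
    conn.execute(build_staging_ddl(table, header))
    placeholders = ", ".join("?" for _ in range(len(STANDARD_COLUMNS) + len(header)))
    rows = [(load_id, vendor, loaded_at, *row) for row in reader]
    conn.executemany(
        f'INSERT INTO "{safe_identifier(table)}" VALUES ({placeholders})', rows
    )
    return len(rows)
```

The sketch does not handle vendor columns that collide with a standard column name or rows with a different number of fields than the header; a real implementation would need rules for both.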
