One Star

Push-down the processing to the DB-engine

Is Talend able to detect that all objects in a step share the same JDBC connection and to generate a DML statement?

 

For example, a step select from 3-4 joined tables and fill a target table with the resuls of the query.

 

Can Talend generate a statement of the type:

INSERT INTO

  target

SELECT

 ....

FROM

  source1

JOIN

  source2

JOIN

  ....

 

and simply run the statement in the database?

3 REPLIES
Fifteen Stars

Re: Push-down the processing to the DB-engine

Will you be writing jobs which will be used in different environments where sometimes the source data tables can be linked inside a single RDBMS and sometimes they cannot? I would not have thought so. If you are, it would make sense to write more than one job to do it in the most efficient way.

What is it you are trying to do? There is absolutely no need to pull data out of your database from single tables, join/filter in Talend and then send it back. That is a particularly inefficient way of working and I never advocate that. You can quite easily write complicated queries that are carried out on the DB to only return data that is absolutely necessary in the Talend job, do whatever you want to do with it and then send it back (or somewhere else). You can also use the ELT components (although I tend not to). 

 

If you want you can write precisely the statement you have given as an example in a t{Database}Row component as well. 

Rilhia Solutions
One Star

Re: Push-down the processing to the DB-engine

@rhall_2_0 thank you for the quick reply. I also prefer to write statements myself and let them being execute in a ETL or workflow tool.

The question came from a user of traditional ETL tools like SAP DS, Informatica or Oracle Warehouse Builder.

The use case is the typical aggregations from a staging area to a star schema/reporting layer.

You can use the mentioned tools like "visual query designers" that create "mappings".

When the mappings are executed, the ETL engine convert them in DML statements that are executed by the DB.

For example, OWB even create PL/SQL packages.

I used Talend only for ingestion into DWH/Data Lakes, that's why I pose the question.

Are the ELT components in Talend capable to substitute Informatica/SAP DS/DataStage in classical data warehouse workflows where users graphically "design" such steps?

The ability not to move data from the db to the ETL tool and back to it, but to generate an INSERT ... SELECT statement is called "push-down" or "push-down to the database". At least this is the term that Informatica and SAP DS guys use.

Fifteen Stars

Re: Push-down the processing to the DB-engine

Ah I see, you are expecting Talend to work the same way as Informatica. I thought that would be the case when I moved as well. I hated Talend when I moved from Informatica for the first few months, but once I learnt how Talend works I actually found it a far superior tool. However I will accept that might be because of how Talend is only limited by your imagination if you can write Java.

 

Your use case sounds like it may be covered by the ELT components. You can read about the ELT functionality here: https://www.talend.com/resources/elt-vs-etl/. However I would argue that Informatica building a tool to write SQL for developers is a bit of a waste of time. A decent data integration developer should be able to at least write SQL. I would argue that they should be able to write better SQL than a tool as well. If the processing is required to take place on your DB server (which is arguably the best place for processing of already staged data), then Talend can comfortably handle that. But it won't do all of the work for you.

 

Rilhia Solutions