Migrate Pentaho solution to Talend




I don't know if this is the right place for my question, but here we go:


I want to migrate a solution implemented in Pentaho to Talend, and I would like to know the best approach to replicate the Pentaho scenario in Talend. The scenario is the following:

- N file feeds, with files arriving every 2, 5 or 15 minutes, that need to go through the following:

  • File download:
    Files are downloaded from the Global Share (ptsun32) to a DI Cluster Share (share).
    As-Is: Three jobs, one for each of the 3 “feed” priority levels, each create a per-job configured number of threads. Each thread then starts a “feed” download (“feed” is a specific type of file at this stage).
    • Fault tolerance must be ensured: this job must not depend on a single DI node.
  • File acquisition:
    Files are acquired by the solution on the DI Cluster Share.
    As-Is: One job identifies and catalogues the files (registers them in the files table of the solution database, PostgreSQL) and moves them into an Inbox folder.
    • Fault tolerance must be ensured: this job must not depend on a single DI node.
  • File ETL jobs:
    Files are extracted and transformed from the Inbox into Oracle external tables and then loaded into the RDBMS database.
    As-Is: Each feed has one ETL job that works on the feed-specific subfolder of the Share and uses a specific external table type.
    • Sequential loading is preferable, although not a blocking requirement.
    • Fault tolerance must be ensured: this job must not depend on a single DI node.
    • Failure resilience must be ensured: this job (on a specific file) must be restarted on another DI node if the first one crashes or becomes unavailable.
  • Global load balancing must be ensured:
    All these jobs need to make the most of the cluster resources (especially RAM). The jobs (especially the ETL ones) need to be assigned to a cluster DI node with available execution resources.
  • Global failure tolerance:
    The DI solution architecture needs to ensure that there is no single point of failure.
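
To make the failure-resilience requirement more concrete, here is a minimal sketch of one possible pattern: every DI node claims files from the shared files table with a time-limited lease, so a file claimed by a crashed node becomes available to another node once the lease expires. This is not how Pentaho or Talend implement it. The table layout, the `LEASE_SECONDS` value, the node names and the function names are all assumptions for illustration, and `sqlite3` only stands in for the PostgreSQL solution database (in PostgreSQL a similar claim could be written with `SELECT ... FOR UPDATE SKIP LOCKED`).

```python
import sqlite3

# Assumed lease length; it should exceed the longest expected ETL run on one file.
LEASE_SECONDS = 300


def create_catalogue(conn):
    """Create a hypothetical files table with status NEW, CLAIMED or DONE."""
    conn.execute(
        """CREATE TABLE files (
               name TEXT PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'NEW',
               owner TEXT,
               claimed_at REAL
           )"""
    )


def claim_next_file(conn, node, now):
    """Claim the first file (by name) that is NEW or whose lease has expired.

    Returns the file name, or None if nothing is available.
    """
    expired_before = now - LEASE_SECONDS
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            """SELECT name FROM files
               WHERE status = 'NEW'
                  OR (status = 'CLAIMED' AND claimed_at < ?)
               ORDER BY name LIMIT 1""",
            (expired_before,),
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause is repeated so the claim only succeeds if the row
        # is still available when the UPDATE runs.
        updated = conn.execute(
            """UPDATE files SET status = 'CLAIMED', owner = ?, claimed_at = ?
               WHERE name = ?
                 AND (status = 'NEW'
                      OR (status = 'CLAIMED' AND claimed_at < ?))""",
            (node, now, row[0], expired_before),
        ).rowcount
        return row[0] if updated == 1 else None


def mark_done(conn, name, node):
    """Mark a file as loaded; only the current owner may do this."""
    with conn:
        conn.execute(
            "UPDATE files SET status = 'DONE' WHERE name = ? AND owner = ?",
            (name, node),
        )


# Usage: two hypothetical nodes share one catalogue.
conn = sqlite3.connect(":memory:")
create_catalogue(conn)
conn.executemany(
    "INSERT INTO files (name) VALUES (?)",
    [("feed_a_001.csv",), ("feed_a_002.csv",)],
)
conn.commit()

first = claim_next_file(conn, "di-node-1", now=1000.0)
second = claim_next_file(conn, "di-node-2", now=1001.0)
mark_done(conn, second, "di-node-2")
# di-node-1 never finishes; once its lease has expired, di-node-2 takes over.
recovered = claim_next_file(conn, "di-node-2", now=1000.0 + LEASE_SECONDS + 1)
print(first, second, recovered)
```

Because the claim lives in the database and not in any node's memory, no single DI node is required for a file to be processed. The trade-off is that a slow but still running job can lose its lease, so the load step should be safe to repeat.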

I hope this description provides enough information for you to guide me towards the architecture needed in Talend for this scenario.



David Santos 


Re: Migrate Pentaho solution to Talend


Could you please elaborate on your scenario with an example showing input and expected output values?

Best regards



Re: Migrate Pentaho solution to Talend

Hi Sabrina,


My scenario is the following:

- N files placed in a remote folder by external processes.

- My application must acquire those files, register them in a control table, and process them. The processing enriches each file by transforming the values it contains and by looking up database tables, and then produces a file that is loaded into a database via an external table.
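
As an illustration of the acquire-and-register step only, here is a minimal sketch that scans a landing folder, records each file in a control table and moves it to an inbox folder. The folder names, the table columns and the function names are assumptions made for this example, and `sqlite3` stands in for the real solution database; the enrichment and the external-table load are not covered.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_control_table(conn):
    """Create a hypothetical control table keyed by file name."""
    conn.execute(
        """CREATE TABLE files (
               name TEXT PRIMARY KEY,
               size_bytes INTEGER NOT NULL,
               status TEXT NOT NULL
           )"""
    )


def acquire_files(conn, landing_dir, inbox_dir):
    """Register every file in landing_dir, then move it to inbox_dir.

    Returns the names of the files moved, in name order.
    """
    inbox_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(p for p in landing_dir.iterdir() if p.is_file()):
        # INSERT OR IGNORE keeps the step repeatable: if an earlier run
        # registered the file but crashed before the move, the rerun simply
        # moves it without failing on the duplicate key.
        with conn:
            conn.execute(
                "INSERT OR IGNORE INTO files (name, size_bytes, status) "
                "VALUES (?, ?, 'REGISTERED')",
                (path.name, path.stat().st_size),
            )
        shutil.move(str(path), str(inbox_dir / path.name))
        moved.append(path.name)
    return moved


# Usage with throwaway folders and an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    inbox = Path(tmp) / "inbox"
    landing.mkdir()
    (landing / "feed_a_001.csv").write_bytes(b"id;value\n1;42\n")

    conn = sqlite3.connect(":memory:")
    create_control_table(conn)
    print(acquire_files(conn, landing, inbox))
    print(conn.execute("SELECT name, status FROM files").fetchall())
    conn.close()
```

Registering before moving means the control table may briefly list a file that is still in the landing folder, but never the reverse, so a file in the inbox always has a control record.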


That is the scenario in its simplest form, but as I tried to explain in the first post, we have those additional requirements because this is a very intensive usage scenario.


Just for the record, in Pentaho we have 5 Data Integration (DI) nodes in cluster mode to process the files, but even with this configuration the client is not satisfied with the solution because it isn't fast enough and those DI nodes crash a lot.




