Talend Database Advice

Four Stars

Hi,

We are in the process of setting up Talend Data Services on our servers.

We need help in deciding which RDBMS software to use for our TAC admin database as well as a Staging area for our jobs.
Our past experience has been with SQL Server; however, we are considering moving to PostgreSQL in order to save costs.

Our business use cases are:

  • For the database to handle 1000 or more concurrent connections. The connections are mainly (70%) for small lookup tables (fewer than 100 rows) within our Talend jobs, for selecting and inserting a few thousand records from/to the database, and for updating tables that are a few million rows in size.
  • To store several hundred gigabytes of data across a few 'archive' tables. These tables will rarely be queried by users but will often have data inserted into them.
  • To store all the logs generated by our jobs.
  • To process data quickly, as we need to ingest data, process it and pass it on to our customer in the shortest time possible. The dataset here averages 10 000 rows and arrives every 30 seconds. Database processing would be minimal: we will mainly insert the data into our database, do some lookups on 1 or 2 fields, then select the data and send it off in our Talend job.
  • In the near future, to process datasets of 1000 or fewer rows every 5 seconds, in much the same way as in the point above.

 

Could you please suggest which database software will best serve us with Talend?


Thanks,
Jason


Accepted Solutions
Employee

Re: Talend Database Advice

Hi,

 

     You are right. The TAC should have its own database, so that it does not compete for database resources with an operational database. The recommended approach is therefore to have separate databases for the TAC and for your staging area (which will be used for your operational work).
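     As a rough illustration of that separation, the sketch below builds one JDBC URL for the TAC database and another for the staging database. The host names, ports and database names are hypothetical placeholders, and putting the two databases on separate hosts is only one possible reading of "separate databases"; nothing here is taken from a Talend configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbTarget:
    """Connection target for one database; all values below are placeholders."""
    engine: str  # "postgresql" or "sqlserver"
    host: str
    port: int
    database: str

    def jdbc_url(self) -> str:
        # Standard JDBC URL shapes for the two engines discussed in this thread.
        if self.engine == "postgresql":
            return f"jdbc:postgresql://{self.host}:{self.port}/{self.database}"
        if self.engine == "sqlserver":
            return (
                f"jdbc:sqlserver://{self.host}:{self.port};"
                f"databaseName={self.database}"
            )
        raise ValueError(f"unsupported engine: {self.engine}")


# Hypothetical targets: the TAC admin database and the staging area
# live on different hosts, so neither competes with the other for resources.
tac_db = DbTarget("sqlserver", "tac-db.example.internal", 1433, "talend_admin")
staging_db = DbTarget("sqlserver", "staging-db.example.internal", 1433, "staging")

# Guard against accidentally pointing both at the same server.
assert tac_db.host != staging_db.host

print("TAC:    ", tac_db.jdbc_url())
print("Staging:", staging_db.jdbc_url())
```

     Switching either target to PostgreSQL later would only mean changing the engine and port (5432 by default), which keeps a future migration a configuration change on the connection side.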

 

     From Talend's point of view, we support both MS SQL and PostgreSQL. We offer a choice of databases because teams at different companies have different levels of experience with different DBs. In your case, it sounds like your company is more proficient with MSSQL, so I would suggest starting with MSSQL, since you will already know how to tune it. Once your team is confident with PostgreSQL, you can plan the migration.

 

 

Warm Regards,

 

Nikhil Thampi



All Replies
Employee

Re: Talend Database Advice

Hi Jason,

 

     First, I would suggest isolating the DB used for TAC from other operational workloads such as the staging area. For TAC, you can use either SQL Server or PostgreSQL, as both are supported, although the recommended databases are MySQL 5.7 and Oracle 12c R1 (for version 7.0 of Talend).

 

    For the staging tables, both SQL Server and PostgreSQL should also work, provided you give them the right system resources. You will have to consult your DBA team to work out the system resources required for the staging area, based on your current volume metrics (as given in your post) plus the estimated growth over the next 3 years.
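    To give a feel for the numbers that would feed into such a sizing exercise, here is a minimal sketch that turns the batch sizes and intervals from the question into rows per second and rows per day, then compounds them over 3 years. The 25% annual growth rate is an assumed placeholder, not a figure from this thread, and the planned feed uses its stated upper bound of 1000 rows per batch.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Feed:
    """One recurring ingest stream, described by batch size and interval."""
    name: str
    rows_per_batch: int
    interval_seconds: int

    @property
    def rows_per_second(self) -> float:
        return self.rows_per_batch / self.interval_seconds

    @property
    def rows_per_day(self) -> int:
        batches_per_day = SECONDS_PER_DAY // self.interval_seconds
        return batches_per_day * self.rows_per_batch


def projected_rows_per_day(feed: Feed, annual_growth: float, years: int) -> float:
    """Compound today's daily volume forward at an assumed annual growth rate."""
    return feed.rows_per_day * (1 + annual_growth) ** years


# Batch sizes and intervals as stated in the question.
current_feed = Feed("current", rows_per_batch=10_000, interval_seconds=30)
planned_feed = Feed("planned", rows_per_batch=1_000, interval_seconds=5)

ASSUMED_ANNUAL_GROWTH = 0.25  # placeholder assumption, replace with your own estimate
YEARS = 3

for feed in (current_feed, planned_feed):
    future = projected_rows_per_day(feed, ASSUMED_ANNUAL_GROWTH, YEARS)
    print(
        f"{feed.name}: {feed.rows_per_second:.1f} rows/s, "
        f"{feed.rows_per_day:,} rows/day now, "
        f"about {future:,.0f} rows/day after {YEARS} years"
    )
```

    Row counts alone do not determine hardware; row width, indexing and retention on the archive tables matter just as much, so treat the output as one input to the sizing discussion.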

 

    If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome :-)

 

Warm regards,

 

Nikhil Thampi

Four Stars

Re: Talend Database Advice

Hi Nikhil

 

Thanks for the response.

 

Could you please tell me what you mean by isolating the TAC from the staging area? Do you mean separate databases on the same instance, or separate instances, for example on two separate database servers?

 

With regard to the staging area, do you not see any difference between SQL Server and Postgres? We don't want to end up in a situation where we need to throw system resources at the database to make it work for us.

We do not have a DBA team unfortunately.

 

We do not have any Postgres knowledge but are considering migrating to it.

So we are just making sure it will work as well as SQL Server (or better) on Talend for our Use Cases.

 

 

Does anyone have experience working with both databases in Talend?

Would there be any Pros and Cons to using either database or are they both fairly similar?

 

 

Thanks,

Jason


Four Stars

Re: Talend Database Advice

Thanks, much appreciated.
