Dedicated ETL Server??

We are in the process of evaluating Talend, as well as other ETL tools. One of the things we are trying to vet is the hardware setup that will give us optimal job performance. Our ETLs will have Oracle sources and Oracle targets, possibly on different database servers.
My understanding (and please correct me if I am wrong) is that if we leverage ELT components, the bulk of the work during execution of the job occurs on the two database servers. In this situation, where would Talend reside? Or is that irrelevant since the db servers are doing all of the work anyway?
If we leverage ETL components, the bulk of the work during execution of the job occurs on the machine where Talend resides, correct? Does this mean Talend should be installed on its own dedicated server to optimize performance?
Any response is greatly appreciated! If you can post your opinions, as well as links pointing to relevant documentation, that would be most helpful.
Thank you!
1 REPLY

Re: Dedicated ETL Server??

Overall, you will want a fast network, fast disks, and lots of memory. As long as your ETL server has sufficient spare I/O capacity and memory, there is no need to provision a dedicated server. On to the details:
1) When you hear ELT, think SQL. I'm not a fan of the ELT components in Talend; I find it easier and more maintainable to just write the queries myself. This decision will come down to the SQL expertise of your team, though: the components may be a better choice if you do not have good query jockeys. You cannot do ELT directly between physically separated DBs, because the work runs as SQL inside a single database (in your case, you can create DB links to accomplish this).
2) In general, Talend jobs are I/O and memory hungry but use relatively few CPU cycles. Because of this, your biggest concern when provisioning and designing the hardware is I/O. The faster and more efficient your full-cycle I/O (source -> ETL -> target), the faster and more efficient your jobs will run. Most DBAs will scream, but I have found that hosting the ETL on the same server as the target DB gives you the fastest and most efficient ETL. (They hate it for good reason: ETL uses DB I/O resources and can impact performance.) If you can't convince the DBAs to carve out some room in their kingdom, keep in mind that I/O throughput and latency are your biggest bottlenecks, and build your network and hardware to optimize for them.
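
To make the ELT vs. ETL distinction from point 1 concrete, here is a rough sketch rather than anything Talend-specific. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, and the table names (src_orders, tgt_orders) and sample rows are made up for illustration. The ELT version is a single INSERT ... SELECT, so the rows never leave the database engine. The ETL version pulls the rows into the client process, transforms them there, and writes them back, which is where the I/O and memory load on the ETL host comes from.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical source and target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER, amount REAL, status TEXT)")
    conn.execute("CREATE TABLE tgt_orders (id INTEGER, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, 10.50, "OK"), (2, 3.25, "CANCELLED"), (3, 7.00, "OK")],
    )
    return conn


def load_elt(conn):
    """ELT style: one SQL statement; filtering and conversion run inside the DB."""
    conn.execute(
        "INSERT INTO tgt_orders (id, amount_cents) "
        "SELECT id, CAST(ROUND(amount * 100) AS INTEGER) "
        "FROM src_orders WHERE status = 'OK'"
    )


def load_etl(conn):
    """ETL style: rows travel to the client, are transformed there, then sent back."""
    rows = conn.execute("SELECT id, amount, status FROM src_orders").fetchall()
    transformed = [
        (order_id, int(round(amount * 100)))
        for order_id, amount, status in rows
        if status == "OK"
    ]
    conn.executemany("INSERT INTO tgt_orders VALUES (?, ?)", transformed)


def target_rows(conn):
    """Return the loaded target rows in a stable order."""
    return conn.execute(
        "SELECT id, amount_cents FROM tgt_orders ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    elt_db, etl_db = make_db(), make_db()
    load_elt(elt_db)
    load_etl(etl_db)
    # Both approaches load the same rows; only the place where the work happens differs.
    print(target_rows(elt_db) == target_rows(etl_db))
```

Both functions end up with the same target rows. In this sketch source and target live in one database, which is why the single-statement form works; with two separate Oracle servers you would need something like the DB link mentioned in point 1 before the ELT style becomes possible.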