Talend Clustering

Does anyone in this forum know how Talend clustering works? Does Tomcat need to be clustered, or do you just need multiple Tomcat instances running on the same server, which would give us multiple job schedulers without Tomcat clustering? Is the clustering done by the Talend Administration Center software (the quartz.properties file), or do we need to configure Tomcat for clustering and Apache for load balancing?

Re: Talend Clustering

What exactly are you trying to balance: ETL execution, or access to the TAC web page?
If it's ETL execution, Tomcat clustering is not required. That is handled by installing multiple jobservers, coordinated by a single TAC instance.
If you just need a clustered TAC install for failover, standard Apache + Tomcat clustering will work. We run this configuration in some of our production environments.

Re: Talend Clustering

I am trying to understand how failover works. Is it session based? Are you using the AJP protocol to run multiple TAC instances in your environment? Is your clustering done on the same machine, or do you have it set up across multiple servers (horizontal Tomcat clustering)?

Re: Talend Clustering

Other than failover, do you see a performance increase when you use Tomcat clustering plus an Apache load balancer?

Re: Talend Clustering

We're using horizontal clustering + BigIP. This is for high availability of the TAC, not for performance.
Talend does load balancing like this:
Each Talend job can be exported as an executable script. The Talend client or the 'commandline' service takes the Talend metadata from SVN and generates executable code from it.
Once the script for the job has been generated, the TAC can publish it to a 'jobserver' service for execution. The jobservers can be installed locally or on remote hosts. They report their server load back to the TAC, which decides which jobserver to publish to based on that load.
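
To make the load-based dispatch described above a bit more concrete, here is a minimal Python sketch. It is only an illustration: the JobServer class, the load and alive fields, and pick_jobserver are hypothetical names, and "pick the least loaded reachable server" is an assumed rule, not necessarily how the TAC actually decides.

```python
from dataclasses import dataclass


@dataclass
class JobServer:
    """Hypothetical record of what a jobserver reports back (not a Talend API)."""
    name: str
    load: float          # assumed: reported load as a fraction between 0.0 and 1.0
    alive: bool = True   # assumed: whether the server answered the last status check


def pick_jobserver(servers):
    """Return the reachable jobserver with the lowest reported load."""
    candidates = [s for s in servers if s.alive]
    if not candidates:
        raise RuntimeError("no jobserver available")
    return min(candidates, key=lambda s: s.load)


if __name__ == "__main__":
    pool = [
        JobServer("etl1", load=0.75),
        JobServer("etl2", load=0.20),
        JobServer("etl3", load=0.05, alive=False),  # lowest load, but unreachable
    ]
    print(pick_jobserver(pool).name)
```

Note that the unreachable server is skipped even though it reports the lowest load, which is the behaviour you would want for execution failover as well.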

Re: Talend Clustering

Thanks for clarifying that, John. I assumed the Tomcat clustering was only for TAC failover. Are you using an SVN server or a database to store your projects? How are you backing up your SVN repository or project database: VCS, RAID, DRBD, or simply exporting the projects to removable storage? As for the Talend jobs, when the execution servers process a job, is the data held in memory, or is it processed row by row from the input query (input data extraction)? I need to make sure that the execution servers, not the application servers, do the work such as joins, transformations, and processing.

Re: Talend Clustering

Are there any restrictions on how BigIP can be set up?
Is the session state stored in the DB backend and shared across several TAC/Tomcat instances? Or do I have to ensure that the HTTP session is replicated by some other mechanism?
Thanks
Oli

Re: Talend Clustering

In our setup, we don't worry about the session so much, since our clustering is for HA only. Tomcat should replicate everything that is required when it is configured as a cluster.

Re: Talend Clustering

Does Talend recommend thresholds for monitoring disk space, memory, and CPU usage? For example, an alarm at 80% disk usage to alert the system admin to free up storage.

Re: Talend Clustering

Yes, you should monitor your ETL server ;)
Disk space is probably the most critical, especially if you are using disk caching or temp files.
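
As a rough illustration of the kind of disk check discussed above, here is a minimal Python sketch using only the standard library. The 80% threshold is just the example figure from the question, not a Talend recommendation, and the function names and the monitored path are assumptions to adapt to wherever your jobs write their temp or cache files.

```python
import shutil


def disk_usage_percent(path="."):
    """Return the used space of the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path=".", threshold=80.0):
    """Return (alarm, percent_used); alarm is True at or above the threshold."""
    pct = disk_usage_percent(path)
    return pct >= threshold, pct


if __name__ == "__main__":
    # Assumed path: replace "." with the temp/cache directory of the jobserver.
    alarm, pct = check_disk(".", threshold=80.0)
    status = "ALARM" if alarm else "ok"
    print(f"{status}: {pct:.1f}% of disk used")
```

Something like this could be run from cron or from your existing monitoring tool; memory and CPU checks would need a different data source.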

Re: Talend Clustering

Any updates on the earlier question about project storage and backups? I have the same question. How do you replicate the H2 or MySQL database and the SVN repository between the Tomcat servers?

Re: Talend Clustering

How do you cluster the TAC? Can you provide some insight on this, or do you have any documentation?