NoHostAvailableException: Instantiating Cassandra connection with DataStax cluster parameters with tJava or similar


Hi all:

 

I have been getting an intermittent "NoHostAvailableException" via the DataStax connector; it occurs about a third of the time I try to truncate and reload a Cassandra table.  The error usually resolves itself after one or two reruns, but I have so far been unable to avoid it completely, and I can see no pattern in which runs fail:

 

E.g., 

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /1.2.3.4:9042 (com.datastax.driver.core.exceptions.TransportException: [/1.2.3.4] Cannot connect)))

 

Even when I wait several hours between runs, the problem persists, so it does not seem attributable to load on the cluster (my test Cassandra cluster is used by no one but me).  I have also raised most of the timeout parameters in cassandra.yaml to egregiously high values with no net improvement (although raising the truncate timeout did resolve some earlier timeouts on table truncate), i.e.,

 

read_request_timeout_in_ms: 100000
write_request_timeout_in_ms: 100000
counter_write_request_timeout_in_ms: 100000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 100000
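One observation on the settings above: they are server-side request timeouts, whereas the trace reports a TransportException ("Cannot connect"), i.e. a failure while establishing the connection. As a rough diagnostic, a plain TCP probe run from the Talend machine can show whether the native port accepts connections within a given timeout, independent of Talend and the driver. This is only a sketch using the JDK; the class and method names (ConnectProbe, canConnect) are made up for illustration, and the host and port in main are simply the ones from the trace.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectProbe {

    /**
     * Returns true if a TCP connection to host:port is established within
     * timeoutMs milliseconds, false on timeout, refusal, or any other I/O error.
     */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // SocketTimeoutException and ConnectException are both IOExceptions
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port taken from the error message; replace with a real node.
        String host = args.length > 0 ? args[0] : "1.2.3.4";
        for (int i = 1; i <= 5; i++) {
            long start = System.nanoTime();
            boolean ok = canConnect(host, 9042, 2000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.out.println("attempt " + i + ": connected=" + ok + " in " + elapsedMs + " ms");
        }
    }
}
```

If the probe itself fails or is slow about as often as the job does, the problem is more likely in the network path (security groups, NAT, DNS) than in Cassandra's request handling.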

 

It seems the most promising avenue would be to tune TCP parameters at the DataStax driver level, SocketOptions.ConnectTimeoutMillis in particular, but I have so far been unable to do this.  Does anyone have an example of using tJava or similar to instantiate a Cassandra connection (not using tCassandraConnection) that would allow flexibility in defining the cluster connection, e.g., along the lines of the DataStax example below?  As I have just discovered, you can't just manually edit the Java code generated from the Talend job:

 

https://docs.datastax.com/en/developer/java-driver/3.6/manual/socket_options/

 

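import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

// Build a cluster handle with an explicit TCP connect timeout (milliseconds)
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withSocketOptions(
                new SocketOptions()
                        .setConnectTimeoutMillis(2000))
        .build();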

 

The only other "resolution" I can come up with is to build retry functionality into the Talend job so that failed connections are retried up to X times until one succeeds.  That wouldn't hurt to build in anyway, but it seems like it should be unnecessary given that the only problem appears to be that Talend can't get a connection in time.
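For what it's worth, the retry idea could look roughly like the following in plain Java. This is a minimal sketch, not a Talend feature: the names RetrySketch and withRetry are invented for illustration, the doubling delay between attempts is my own assumption, and the "connect" action in main is a stand-in that simulates two failures. In a real job the action would wrap the driver's connect call, and you would probably catch only NoHostAvailableException rather than every Exception.

```java
import java.util.concurrent.Callable;

final class RetrySketch {

    /**
     * Calls the action up to maxAttempts times. After each failure except the
     * last it sleeps, doubling the delay each time, then rethrows the last
     * exception if every attempt failed.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2; // simple exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for a connection attempt: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated connect failure");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "connected after 3 attempts"
    }
}
```

This only papers over the intermittent failure; it says nothing about why the first attempt fails.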

 

So to repeat my question: does anyone have experience either instantiating the DataStax connection *without* the tCassandraConnection component, and/or passing these additional SocketOptions/PoolingOptions/etc. parameters in at some other point in the flow?

 

BTW, I'm using Cassandra 3.3 and TOS BD 7.2.1 [Windows edition].

 

Cheers,

Jonathan

 

EDIT: For the benefit of other passers-by, some more potential links on DataStax Cassandra connection parameters:

 

https://stackoverflow.com/questions/35005734/constant-timeouts-in-cassandra-after-adding-second-node

 

https://dzone.com/articles/tuning-datastax-java-driver-for-cassandra


Re: NoHostAvailableException: Instantiating Cassandra connection with DataStax cluster parameters with tJava or similar

I've not been able to make any progress on this -- even a fairly significant upgrade to our AWS server hardware did not help the problem.

 

I've been doing some prototyping, and performing a DROP TABLE at the beginning of the ETL job followed by a table (re-)create near the end of the operation (as the INSERT takes place) appears to be a workaround for this problem, although it is not as elegant as using the Talend tCassandraOutput truncate functionality.

 

** EDIT 10/29/19 **

 

In hindsight (and I'm kind of kicking myself for not grasping this sooner), I believe this timeout issue comes down to the fact that most recently converted RDBMS users, and even those developing tools for NoSQL, don't really understand the implications of distributed systems.

 

Talend offers a "truncate and insert" option on its Cassandra output component (tCassandraOutput).  However, in a distributed system there is no guarantee that a truncate will have finished in the way it would in an ACID-compliant RDBMS.  Most likely, the truncate-and-reload option works fine on a tiny table of a few thousand rows over a couple of nodes, but it often times out and fails on even a "smaller" data warehouse type table (1.1M rows in my case), a volume that normally wouldn't raise eyebrows.

 

The low-rent "solution" is to issue a table truncate via a tCassandraRow component -> tSleep [15 seconds seems more than ample for ~1M rows in a 3-node cluster] -> tCassandraOutput [that loads the Cassandra table from a hash table].  However, this is not ideal for a host of reasons.  What I'm finding is that basic truncate/reload operations that are standard in an RDBMS DW should maybe be rethought with Cassandra: I can be 99%+ sure that a truncate will finish given enough sleep time, but it's the ~1% that will of course get you.
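One variation on the fixed tSleep, swapping in a different technique, is to poll for a condition with a deadline instead of sleeping a set 15 seconds. The sketch below is plain Java with invented names (PollSketch, waitUntil); the condition in main is a placeholder that becomes true on the third check. In a real job the condition might be something like "a SELECT with LIMIT 1 on the table returns no rows", which is my assumption and not something Talend provides, and an empty read from one coordinator would still not prove the truncate has completed on every node.

```java
import java.util.function.BooleanSupplier;

final class PollSketch {

    /**
     * Checks the condition every intervalMs milliseconds until it holds or
     * timeoutMs has elapsed. Returns true if the condition held, false if the
     * deadline passed first. The condition is always checked at least once.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] checks = {0};
        // Placeholder for "table looks empty": true from the third check on.
        boolean ready = waitUntil(() -> ++checks[0] >= 3, 15_000L, 50L);
        System.out.println("ready=" + ready + " after " + checks[0] + " checks");
    }
}
```

Compared with a fixed sleep, this returns as soon as the check passes and fails loudly when the deadline is hit, so the ~1% case at least becomes a visible error rather than a silent bad load.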

 

 
