Bulk loading into Cassandra

I need to move millions of rows from an MSSQL server to Cassandra, and I'm using the tCassandraOutputBulkExec component. It works fine for thousands of records, but as soon as the number of rows reaches the hundreds of thousands, the job starts to slow down, apparently because memory is not being garbage collected.
Is there a way to periodically generate the sstable and garbage collect the unneeded rows? Should I use tCassandraOutput instead?

Re: Bulk loading into Cassandra

Is there any error message printed on the console when your job starts to slow down? Could you please show us a screenshot of your tCassandraOutputBulkExec component settings?
Best regards

Re: Bulk loading into Cassandra

Hey! To put it simply, the job runs out of memory during garbage collection. My guess is that the component tries to create one massive sstable while holding all the data in memory. I should add that the table I'm importing has around 38 columns. I've included screenshots of a successful run, the error, and the component settings.
With a week's worth of data the job completes successfully, but it fails when loading more than one month.

Re: Bulk loading into Cassandra

Instead of loading millions of rows in one go, can you try breaking them into chunks and loading each chunk separately? You can create a subjob that processes one chunk at a time and request garbage collection explicitly from a tJava component.
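
A rough sketch of the chunking idea in plain Java, in case it helps. It is not Talend-generated code: the class `ChunkedLoad`, the `Window` type, and the `loadChunk` method are hypothetical names, and `loadChunk` only stands in for running the load subjob with the window bounds passed as context variables. The one-week window size is an assumption based on the report that a week of data loads fine, and the dates are arbitrary examples. Note that `System.gc()` is only a request to the JVM, so it does not guarantee that memory is reclaimed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class ChunkedLoad {

    /** Half-open date window [start, end) covering one chunk. */
    static final class Window {
        final LocalDate start;
        final LocalDate end;

        Window(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [from, to) into consecutive windows of at most seven days. */
    static List<Window> weeklyWindows(LocalDate from, LocalDate to) {
        List<Window> windows = new ArrayList<>();
        LocalDate start = from;
        while (start.isBefore(to)) {
            LocalDate end = start.plusWeeks(1);
            if (end.isAfter(to)) {
                end = to; // the last window may be shorter than a week
            }
            windows.add(new Window(start, end));
            start = end;
        }
        return windows;
    }

    /** Hypothetical placeholder for the subjob that reads and bulk loads one chunk. */
    static void loadChunk(Window w) {
        System.out.println("Loading rows with date >= " + w.start + " and < " + w.end);
    }

    public static void main(String[] args) {
        // Example range only; use the real bounds of the source table.
        for (Window w : weeklyWindows(LocalDate.of(2019, 1, 1), LocalDate.of(2019, 3, 1))) {
            loadChunk(w);
            System.gc(); // a hint to the JVM, not a guaranteed collection
        }
    }
}
```

In a Talend job the loop would more likely be an iterate link around the subjob, with the source query filtered by the window bounds, so that each run of the bulk component only holds one chunk in memory.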

Re: Bulk loading into Cassandra

Hey, I'll probably go that route.
Thanks for the answers.
