Five Stars

Bulk loading into Cassandra

Hello,
I need to move millions of rows from a MSSQL server to Cassandra and i'm using the tCassandraOutputBulkExec. It works fine for thousands of records, but as soon as the number of rows reaches the hundreds of thousands, the job will start to slow down due to no garbage collection. 
Is there a way to periodically generate the sstable and garbage collect the uneeded rows? Should I use tCassandraOutput? 
4 REPLIES
Moderator

Re: Bulk loading into Cassandra

Hi,
Is there any error message printed on the console when your job starts to slow down? Could you please show us a screenshot of your tCassandraOutputBulkExec component settings?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: Bulk loading into Cassandra

Hey! To put it simply, the garbage collector runs out of memory. My guess is that the component tries to create one massive SSTable while holding all the data in memory. It must be said that the table I'm importing has around 38 columns. I've included screenshots of a successful run, the error, and the component settings.
When using a week's worth of data, the job completes successfully, but it fails when loading more than one month.
One Star

Re: Bulk loading into Cassandra

Instead of loading millions of rows in one go, can you try breaking them into chunks and loading each chunk separately? You can create a subjob that processes one chunk at a time and issue an explicit GC call from a tJava component between chunks.
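A minimal sketch of what that chunked approach could look like in plain Java (the chunk size, method names, and the in-memory row list are illustrative assumptions, not Talend-generated code; in a real job the chunk boundaries would come from your MSSQL query and the write step would be the bulk-load component):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLoad {
    // Hypothetical chunk size; tune to what your heap can hold comfortably.
    static final int CHUNK_SIZE = 50_000;

    // Processes totalRows in fixed-size chunks instead of one giant batch,
    // releasing each chunk and hinting the GC before fetching the next one.
    static int loadInChunks(int totalRows) {
        int loaded = 0;
        for (int start = 0; start < totalRows; start += CHUNK_SIZE) {
            int end = Math.min(start + CHUNK_SIZE, totalRows);
            List<String> chunk = new ArrayList<>(end - start);
            for (int i = start; i < end; i++) {
                chunk.add("row-" + i); // stand-in for a row fetched from MSSQL
            }
            // ... write this chunk to its own SSTable / bulk-load step here ...
            loaded += chunk.size();
            chunk = null;    // drop the reference so the chunk is collectible
            System.gc();     // explicit GC hint, as a tJava step would issue
        }
        return loaded;
    }

    public static void main(String[] args) {
        int loaded = loadInChunks(1_200_000);
        if (loaded != 1_200_000) {
            throw new AssertionError("expected 1200000, got " + loaded);
        }
        System.out.println("loaded " + loaded + " rows in chunks of " + CHUNK_SIZE);
    }
}
```

Note that `System.gc()` is only a hint to the JVM, not a guarantee; the main win comes from each chunk going out of scope so its memory is actually reclaimable.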
Thanks,
Sankalp
+919811103231
Five Stars

Re: Bulk loading into Cassandra

Hey, I'll probably go that route.
Thanks for the answers.