Dear Experts,
OK, I finally got Talend Big Data to run a small job, using the instructions from this blog: http://lucidwebdreams.wordpress.com/2014/07/24/import-data-into-neo4j-from-ms-sql-server-directly-us...
Source: MySQL
Destination: Neo4j
Both are on the same server.
My load was even simpler than the one in the blog: it just reads a single table in MySQL and creates one node in Neo4j for each row. No relationships are created at all. See the picture.
The load was really slow: only 11.33 rows per second. Inserting just under 90K rows took about 128 minutes, or over two hours.
Using SQL*Loader with Oracle 7 in the 90s, I would typically see around 100 rows per second. Is 11 rows per second typical for this kind of load? If not, how many rows per second does everyone else here typically see?
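As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The only inputs are the numbers quoted in this post (11.33 rows per second, 128 minutes, and the 100 rows per second remembered from SQL*Loader); the variable names are just for illustration.

```python
# Figures quoted in the post.
observed_rate = 11.33      # rows per second seen in the Talend job
elapsed_minutes = 128      # wall-clock duration of the load
reference_rate = 100       # rows per second recalled from SQL*Loader

# Rows implied by the observed rate and duration ("just under 90K").
rows_loaded = observed_rate * elapsed_minutes * 60

# How long the same rows would take at the reference rate.
minutes_at_reference = rows_loaded / reference_rate / 60

print(f"rows loaded: about {rows_loaded:,.0f}")
print(f"time at {reference_rate} rows/s: about {minutes_at_reference:.1f} minutes")
```

So the same load at the older reference rate would finish in roughly a quarter of an hour instead of two hours.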
My Red Hat server has dual 4-core Xeon processors (8 cores altogether) and 16 GB of RAM. Before the load, I had already doubled the memory parameters found in the ini file:
    cat TOS_BD-linux-gtk-x86_64.ini
    -vmargs
    -Xms1024m
    -Xmx3072m
    -XX:MaxPermSize=1024m
    -Dfile.encoding=UTF-8
    -Dorg.eclipse.swt.browser.XULRunnerPath=/usr/local/lib/xulrunner
Are there any other parameters for Talend that can be adjusted to increase the insert speed? Thanks a lot!
I also noticed that the batches seemed to be about 10 rows in size, in the sense that the count of inserted rows was updated about once per second. If the batches were bigger and took longer, I would expect the row count to change at a longer interval. I know that in Oracle, committing every row, or every few rows, really slows inserts down, while increasing the commit and batch size to 1K or 10K rows gives faster throughput. Is there any way to adjust the batch commit interval in Talend? Thanks
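To make the commit-interval idea concrete, here is a minimal sketch of batched commits in Python. It is not Talend or Neo4j code: it uses an in-memory SQLite database from the standard library purely as a stand-in destination, and the `node` table, the `chunks` and `load` helpers, and the row data are all made up for illustration. It only shows how the batch size controls the number of commits for the same set of rows.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load(rows, batch_size):
    """Insert rows, committing once per batch; return (rows stored, commits)."""
    conn = sqlite3.connect(":memory:")  # stand-in destination, not Neo4j
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT)")
    commits = 0
    for batch in chunks(rows, batch_size):
        conn.executemany("INSERT INTO node (id, name) VALUES (?, ?)", batch)
        conn.commit()  # one commit per batch instead of one per row
        commits += 1
    total = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    conn.close()
    return total, commits


# Hypothetical source rows, standing in for the MySQL table.
rows = [(i, f"row-{i}") for i in range(25_000)]

small = load(rows, 10)        # commit every 10 rows
large = load(rows, 10_000)    # commit every 10,000 rows
print("batch of 10:    ", small)
print("batch of 10,000:", large)
```

With the same 25,000 rows, a batch size of 10 needs 2,500 commits while a batch size of 10,000 needs 3. Whether that translates into a similar speedup for a Neo4j load through Talend is an assumption that would need to be measured on the actual job.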