I am considering using ToS DI to transfer large amounts of data (1 million+ rows) into a production environment that serves thousands of clients. I am worried about server performance impact. Is there a way to throttle the tMSSqlOutput (or any other Output component for that matter) so I may limit the number of rows that are inserted/updated within a specific time frame? i.e. - 10,000 rows an hour.
Thanks in advance!
There are plenty of ways to achieve this, the best one really depends on what you are doing exactly. A REALLY simple method would be to just limit your job to processing 10,000 and then running it every hour. It is not really a good idea to have a job running for 100 hours in a single JVM. Don't get me wrong, you can do this....its just not all that bullet proof. You can also slow the actual load and commits down if you want to get a bit more technical....but it is probably best to start off with something a bit more basic which would suit your requirements.
On a related note, why are you concerned about 1 million+ rows on SQL Server? I don't know the size of your servers, but 1 millions records really shouldn't present you with much of a problem. I'm assuming that since 1 million rows seems large and you are talking about thousands of users, that it is for a limit number of time zones? In which case, could taking it down for a brief period make this even easier for you?
Thank you for the quick reply! I typically do take databases offline to do my ETLs (at the request of IT Operations). And I'm not really concerned with a 1M+ row bulk insert on a SQL Server (especially ours that is clustered with load balancing). I use ToS DI almost exclusively for my ETL work and have never had any performance issues with the bulk inserts. Nonetheless, there is another team in my company that is inquiring about my process. They are currently using an ETL app that they developed and they throttle it to about 10,000 records an hour (out of fear that anything more will bring their Production SQL Server to a grinding halt). So I just wanted to be able to tell them whether or not ToS can effectively throttle on output in case they wish to adopt ToS, but still want to throttle.
If you want you can send and commit one row per unit of time in milliseconds (assuming that the time to send and commit would be separated by that millisecond pause and not take place entirely within it), but that would be an incredibly inefficient way of going about things. Having said that, it sounds like this team you refer to are not concerned by that. If you want to do that all you need to do is define your per record pause time and iterate over your record set. To do that connect your DB input component to a tFlowToIterate. Each of your columns will have its own globalMap value. Then connect that to a tJava with ....
...for example. Then connect a tFixedFlowInput (with a column for each of your globalMap values), to your db output component.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Pick up some tips and tricks with Context Variables
Learn how media organizations have achieved success with Data Integration
Introduction to Talend Open Studio for Data Integration.