I have a large database sql server and I would like to process the lines in slices.
For example for every 100,000 lines, I apply a process using a python script that I call with the component t_system.
How can I do it?
same as you will do in any language (for example SQL)
1. detect your id column (or any column which you could use for partitioning ), count number of rows
2. run you script with parameters - id > n * 100000 and id < (n+1)* 100000, where n from 0 to (number of rows / 100000)
3 increment n, n = n+1
4 do this until number of rows > n * 100000
Thank you for your reply.
If I understand correctly, I should translate this code into SQL and write it in the tmssql_input component then I use t_system to apply my script?
there are many ways todo this
as variant - Input component with select count(*) from table_name
store this value to global variable and create loop with Talend components
But first how can i get the id row because i don't have an auto increment id that i can use it for the condition (id>n*100000).
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Part 2 of a series on Context Variables
Learn how to do cool things with Context Variables
Find out how to migrate from one database to another using the Dynamic schema