Is it possible to connect two components such as tFilter and tSort? What would the output be? 1) I mean: does tFilter wait until it has parsed all input rows and then start sending them to tSort, or does it send each row to tSort as soon as it has parsed it? If the latter, sorting this way would be useless. Or not? And what happens if I put a tOutputFile in between? Can a tOutputFile be linked to a component that needs input? Does tOutputFile wait until it has written all rows and then send them to the next component? 2) Does the converse work? I mean: what happens if I reverse the order and use tSort and then tFilter? I think this should work whatever the behaviour of the components: the first row output by tSort is (obviously) in the right order and can be filtered. Anyway, I would like to understand the behaviour of the components better.
1) tFilter doesn't wait; it works row by row. You can sort before or after, and it will make no difference (except if you use variables that change from one row to the next in the tFilterRow advanced box). Putting a tFileOutput in between works the best way you can imagine: it can be linked to another component, and it works row by row. 2) I've answered as best I can in 1. I don't know about the tSort component, which I never use, but I do have one question: why on earth don't you try it??
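The difference between a row-by-row component and a buffering one can be illustrated with a small plain-Java sketch. This is not Talend's generated code: the `Row` record, the `filter` and `passThroughOutput` helpers and the `SortBuffer` class are hypothetical stand-ins (a list plays the role of the output file), and records need Java 16 or later. The filter and the pass-through output hand each row on as soon as they receive it, while the sort only emits rows once the end of the input is signalled, which is why filtering then sorting and sorting then filtering give the same rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class RowFlowSketch {

    // Hypothetical row type: a key to sort on and a value to filter on.
    record Row(int key, String value) {}

    // Streaming filter: each incoming row is tested and, if it passes,
    // handed to the next component immediately (no buffering).
    static Consumer<Row> filter(Predicate<Row> condition, Consumer<Row> next) {
        return row -> {
            if (condition.test(row)) {
                next.accept(row);
            }
        };
    }

    // Pass-through "file output": records the row (a list stands in for the file)
    // and forwards it unchanged to the next component.
    static Consumer<Row> passThroughOutput(List<Row> file, Consumer<Row> next) {
        return row -> {
            file.add(row);
            next.accept(row);
        };
    }

    // Blocking sort: buffers every row it receives; only when the input is
    // finished (flush) does it sort the buffer and emit the rows downstream.
    static final class SortBuffer implements Consumer<Row> {
        private final List<Row> buffer = new ArrayList<>();
        private final Comparator<Row> order;
        private final Consumer<Row> next;

        SortBuffer(Comparator<Row> order, Consumer<Row> next) {
            this.order = order;
            this.next = next;
        }

        @Override
        public void accept(Row row) {
            buffer.add(row);
        }

        void flush() {
            buffer.sort(order);
            buffer.forEach(next);
            buffer.clear();
        }
    }

    // filter -> file output -> sort
    static List<Row> filterThenSort(List<Row> input, List<Row> file) {
        List<Row> out = new ArrayList<>();
        SortBuffer sort = new SortBuffer(Comparator.comparingInt(Row::key), out::add);
        Consumer<Row> pipeline =
                filter(r -> r.value().startsWith("a"), passThroughOutput(file, sort));
        input.forEach(pipeline);
        sort.flush(); // end of input: only now can the sort emit its rows
        return out;
    }

    // sort -> filter
    static List<Row> sortThenFilter(List<Row> input) {
        List<Row> out = new ArrayList<>();
        SortBuffer sort = new SortBuffer(Comparator.comparingInt(Row::key),
                filter(r -> r.value().startsWith("a"), out::add));
        input.forEach(sort);
        sort.flush();
        return out;
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
                new Row(3, "apple"), new Row(1, "banana"),
                new Row(2, "avocado"), new Row(5, "cherry"), new Row(4, "apricot"));
        List<Row> file = new ArrayList<>();
        System.out.println(filterThenSort(input, file));
        System.out.println(sortThenFilter(input));
        System.out.println(file);
    }
}
```

Running `main` should print the same three rows (keys 2, 3, 4) for both orders, while the "file" receives them in input order (keys 3, 2, 4), since it sits before the sort.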
>Why on earth don't you try ?? Ehm, I *am* doing a lot of trials. The reason I also want to know the theory behind it is that I believe I learn more quickly by asking than by deducing everything from the components' behaviour. Thank you for answering. So you said tFilter doesn't wait, and this does not affect tSort, which can be linked either before or after: the result is the same. So I realize that tSort must hold all rows in RAM to sort them. I was wondering whether that would be a problem for large amounts of data.
Hi, from my point of view the best way to learn more about the components is to take a look at the generated code ;-) If you run into problems with tSort (because of the amount of memory used), you can take a look at the advanced option "sort on disk". Generally, the best approach is to reduce the number of rows as early as possible in your job (especially if you have many rows). Bye Volker
tFileOutput* components do not modify the data flow, and you can have a row link as their output. By default, tSortRow stores all rows in memory, sorts them and outputs them in a row link. In a Java project, you can use the "sort on disk" option. You can also use tExternalSortRow; the memory usage will then be managed by the GNU sort program (and you can set the maximum amount of memory to use).
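The "sort on disk" idea mentioned in the posts above is usually some form of external merge sort: sort memory-sized chunks, spill each one to a temporary file, then merge the files. The plain-Java sketch below shows that general technique; it is an assumption about the approach, not the code Talend or GNU sort actually runs. `maxRowsInMemory` is a hypothetical parameter, and rows are assumed to be single-line strings compared in natural order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

public class ExternalSortSketch {

    // One open sorted run (temp file) together with the row currently at its head.
    private record Head(String row, BufferedReader reader) {}

    // Sorts `input` holding at most `maxRowsInMemory` rows in a chunk at a time,
    // and streams the sorted rows to `downstream`.
    static void sort(Iterable<String> input, int maxRowsInMemory, Consumer<String> downstream)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        try {
            for (String row : input) {
                chunk.add(row);
                if (chunk.size() == maxRowsInMemory) {
                    runs.add(writeSortedRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(writeSortedRun(chunk));
                chunk.clear();
            }
            merge(runs, downstream);
        } finally {
            for (Path run : runs) {
                Files.deleteIfExists(run); // clean up the temp files
            }
        }
    }

    // Phase 1: sort one memory-sized chunk and spill it to a temp file.
    private static Path writeSortedRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // Phase 2: k-way merge; only one head row per run is kept in memory.
    private static void merge(List<Path> runs, Consumer<String> downstream) throws IOException {
        PriorityQueue<Head> heads = new PriorityQueue<Head>(Comparator.comparing(Head::row));
        List<BufferedReader> readers = new ArrayList<>();
        try {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heads.add(new Head(first, reader));
                }
            }
            while (!heads.isEmpty()) {
                Head smallest = heads.poll();
                downstream.accept(smallest.row());
                String next = smallest.reader().readLine();
                if (next != null) {
                    heads.add(new Head(next, smallest.reader()));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

The trade-off is extra disk I/O in exchange for a bounded in-memory chunk, which is roughly what the "sort on disk" option and tExternalSortRow buy you for large inputs.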
Thank you. The data I have to migrate is stored in Excel files of about 30 MB each. I am still trying to find a good way to process them: most of the operations I have to do are filtering and sorting before inserting into the database. Is it OK if, instead of reducing the number of rows, I reduce the number of columns? I was thinking of splitting them with tFilter*Column* so that semantically related columns stay together, in order not to have too much trouble when merging them again in the database. Well, I will definitely have a look at the code.
Hi, as I said in my last post, every chance to reduce rows or columns could have a positive impact on speed and memory usage. But the details depend on your data and the job design. In the worst case, an additional component needs more resources than you save. If you have to filter and sort anyway, filtering first and then sorting should be best. But again, the best way to answer it is to test your job. Bye Volker
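The advice to filter before sorting can be checked with a rough experiment: count how many comparisons the sort performs in each order. The plain-Java sketch below uses made-up integer rows and a hypothetical filter that keeps about 10% of them. Both orders produce the same rows, but sorting only the survivors should need far fewer comparisons (and a smaller sort buffer), since sorting n rows costs on the order of n log n comparisons.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

public class FilterFirstSketch {

    // Sorts `rows` in place with a comparator that counts its own calls.
    static long countingSort(List<Integer> rows) {
        AtomicLong comparisons = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return Integer.compare(a, b);
        };
        rows.sort(counting);
        return comparisons.get();
    }

    // Row-by-row filter: keeps the rows that satisfy the condition, in order.
    static List<Integer> keep(List<Integer> rows, Predicate<Integer> condition) {
        List<Integer> kept = new ArrayList<>();
        for (Integer row : rows) {
            if (condition.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed: repeatable made-up data
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add(random.nextInt(1_000_000));
        }
        Predicate<Integer> condition = v -> v % 10 == 0; // keeps roughly 10% of rows

        // Filter first, then sort only the survivors.
        List<Integer> filtered = keep(rows, condition);
        long filterFirst = countingSort(filtered);

        // Sort everything, then filter.
        List<Integer> all = new ArrayList<>(rows);
        long sortFirst = countingSort(all);
        List<Integer> sortedThenFiltered = keep(all, condition);

        System.out.println("same result: " + filtered.equals(sortedThenFiltered));
        System.out.println("comparisons, filter first: " + filterFirst);
        System.out.println("comparisons, sort first:   " + sortFirst);
    }
}
```

The exact counts depend on the data and on the JDK's sort implementation, so treat them as an indication rather than a benchmark of a real job.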