Hi! I am trying to sort about 90K rows coming from a multi-sheet Excel file. If I use Excel Input and Sort (I am sorting on 4 keys out of about 10 columns), tSortRow runs very slowly (2 rows/sec). If I replace the Excel file with a CSV file created from the same Excel data, the performance improves many-fold. A workaround, of course, is to convert the Excel file to CSV and then use that as the input. I am just trying to figure out why tSortRow's performance would differ for different inputs. Note that I am using the new Excel enhancement that lets me read multiple worksheets without using tUnite. Thanks. Regards, Sean
Additional information for the above query: I tried running a job that extracted the Excel data to a CSV and, in a subjob (on job OK), read back from the CSV that was just created. The result is the same low sorter speed. When I use the CSV file in a standalone job, I get very fast speeds. My workaround right now is to have one job create the CSV and a second job read the CSV and do the processing. I was trying to keep everything in one Perl script, but this will result in at least two scripts. Thanks, Sean
Hi Sean, Have you tried tExternalSortRow? This component writes the incoming data flow to a temporary file before sorting it with GNU sort. It avoids buffering the data in memory and thus reduces memory consumption. Hope it helps. Richard
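For the curious, roughly what tExternalSortRow delegates to under the hood is a plain GNU sort invocation on the temporary file. A minimal sketch of a multi-key sort on a comma-delimited file (the file name and key columns here are made up for illustration, not taken from Sean's job):

```shell
# Write a tiny sample CSV to a temporary file (stand-in for the
# temp file tExternalSortRow produces from the incoming flow).
cat > /tmp/rows.csv <<'EOF'
beta,2,x
alpha,1,y
alpha,3,z
EOF

# Sort on column 1 (text, ascending) then column 2 (numeric).
# -t sets the field delimiter; -kM,N limits each key to one field.
sort -t, -k1,1 -k2,2n /tmp/rows.csv
```

Because GNU sort spills to temporary files on disk when the input exceeds its memory buffer, it handles 90K rows (or far more) without holding everything in memory, which is exactly the trade-off Richard describes.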
I think there is no problem other than the way the "real time statistics" performance rate (rows/s above each row link) is calculated. Don't take it into account; just read the total execution time and you'll see the input doesn't affect tSortRow's performance. Talend Open Studio's generated-code model was designed so that components are independent.
Well, you are right. I am doing something wrong that slows this job down. When I created a simple job with an Excel input, a sort, and then a CSV output, it was fast. So I need to look further into why that particular job is slowing down. I'll get back to you on that. Regards, Sean