Please see the screenshot. The job is taking far too long, processing 1 million rows at only 22 rows/second. What could be the reason? tr_incomingtransaction = main input, with three columns: ID, HostURL, RefererURL. tglobalvar = saves these three into global variables for later use (splitting, etc.). tExtractRegexFields = splits the URL into its domain name. It is more or less the same with the other components.
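For context, extracting the domain from a URL with a regex, roughly what a tExtractRegexFields component would be configured to do, can be sketched in plain Java. The pattern and class names here are assumptions for illustration, not the actual job configuration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DomainExtract {
    // Hypothetical pattern: optional scheme, optional "www.",
    // then capture everything up to the first /, :, ? or #.
    static final Pattern DOMAIN =
        Pattern.compile("^(?:https?://)?(?:www\\.)?([^/:?#]+)");

    static String extractDomain(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints example.com
        System.out.println(extractDomain("https://www.example.com/search?q=x"));
    }
}
```

If a pattern like this runs once per row, the regex itself is cheap; at 22 rows/second the bottleneck is almost certainly elsewhere in the flow (lookups, per-row iteration, or database round trips), which is what the answers below dig into.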
Try the approach below: first, find out which component is causing the issue. You can do this by deactivating individual components one at a time and writing the output to a tFileOutputDelimited instead. Once you know which component is the bottleneck, we can work on it.
Possibly the problem is that each row5 record is triggering the row8 lookup. I'd start by deactivating the second subjob. Your whole job design looks unusual, like it could do with some significant optimisation. Some comments:
1. The data_warehouse connection component should be connected to tr_incoming_transaction, not dim_searchengine.
2. tJava_2 should not have rows connected to it.
3. tSetGlobalVar_2 should be redundant, as tFlowToIterate_1 already populates global variables.
4. Similarly, what tSetGlobalVar_3 does could be done in tJavaRow_1.
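To illustrate point 4: a tJavaRow body can both forward the row and write values into globalMap, making a separate tSetGlobalVar component unnecessary. The stub below simulates this outside Talend; in a real job, globalMap and the row structs are provided by the generated code, so the class and field names here are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

public class JavaRowSketch {
    // Stand-in for Talend's job-wide globalMap, which the
    // generated job code normally supplies.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Roughly what a tJavaRow_1 body could do instead of a
    // downstream tSetGlobalVar: pass the columns through
    // unchanged while stashing them for later components.
    static String[] processRow(String id, String hostURL, String refererURL) {
        globalMap.put("ID", id);
        globalMap.put("HostURL", hostURL);
        globalMap.put("RefererURL", refererURL);
        return new String[] { id, hostURL, refererURL };
    }
}
```

Collapsing the two components removes one hop per row, which matters when every row in a million-row flow passes through it.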