One Star

Enable Parallel Execute is Greyed Out

Hi,
I am using Talend Enterprise Data Integration Version: 5.5.1 Build id: r118616 and wanted to process a delimited file in parallel. However, the "Enable parallel execution" option for the tFileInputDelimited component is greyed out. Is there anything I need to do so that I can enable this option?
Regards.
Allan
15 REPLIES
Six Stars

Re: Enable Parallel Execute is Greyed Out

I'm not sure I understand. What parallel execution? Can you post a screenshot of your job?
Moderator

Re: Enable Parallel Execute is Greyed Out

Hi,
So far, the "Enable parallel execution" feature is not available in 5.5.1.
Could you please take a look at the tParallelize component (TalendHelpCenter:tParallelize), which allows you to synchronize the execution of a subjob with the execution of other subjobs in your main Job? (It is available in the Talend Enterprise Subscription Version.)
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Enable Parallel Execute is Greyed Out

Hi jholman,
Please see below.


Hi Sabrina,
Is that enabled in other versions? If so, in which versions is it enabled? As much as I would like to use tParallelize, I have to admit that I am quite new to Talend and would rather not do the threading manually if I have the choice.
Thanks and warm regards.
Allan
One Star

Re: Enable Parallel Execute is Greyed Out

Hi jholman,
Apologies, it seems I am not allowed to post images or URLs. I know you will be able to make sense of the below.
//www.talendforge.org/forum/img/members/243775/mini_enable_parallel_greyed_out.png
Regards.
Allan
Six Stars

Re: Enable Parallel Execute is Greyed Out

Hi Allan, the image is too tiny!
One Star

Re: Enable Parallel Execute is Greyed Out

Apologies, here is the full-sized image. The checkbox is at the bottom, highlighted in yellow.
//www.talendforge.org/forum/img/members/243775/enable_parallel_greyed_out.png
Regards.
Allan
Moderator

Re: Enable Parallel Execute is Greyed Out

Hi,
So far, we don't support the "Enable parallel execution" function in the advanced settings of tFileInputDelimited in Talend.
Could you please give us a more detailed description of your job requirement? Is there any problem when you use tParallelize? Do you want to use multi-thread execution?
Best regards
Sabrina
One Star

Re: Enable Parallel Execute is Greyed Out

Hi Sabrina,
We need to process very large files (trades, orders, etc.) and I want to have multi-threaded data flows for specific steps in the job (i.e. enrichment, etc.). So unless I am missing something, tParallelize is not really a solution for me. What I am looking for is something like:
tFileInputDelimited -> tPartitioner -> tCollector -> (some random transformation component like tMap) -> tDepartitioner -> tRecollector -> (some additional transformation) -> (load to database)
I have read in one of Talend's articles (//help.talend.com/display/KB/How+to+automatically+enable+parallelization+of+data+flows+for+better+performance) that this is supposed to be possible. 
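For readers following the thread, here is a minimal plain-Java sketch of the partition -> transform -> departition pattern described above. It is not Talend's implementation and does not use any Talend API; the class and method names (PartitionedFlowSketch, partition, process) are hypothetical, and the round-robin split and fixed thread pool are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not generated by or taken from Talend.
class PartitionedFlowSketch {

    // "Partitioner" step: distribute rows round-robin over N partitions.
    static <T> List<List<T>> partition(List<T> rows, int partitions) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % partitions).add(rows.get(i));
        }
        return parts;
    }

    // Transform each partition on its own thread, then merge the outputs
    // ("departitioner" step). The original row order is not preserved.
    static <T, R> List<R> process(List<T> rows, int partitions, Function<T, R> transform) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> part : partition(rows, partitions)) {
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>(part.size());
                    for (T row : part) {
                        out.add(transform.apply(row));
                    }
                    return out;
                }));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a;1", "b;2", "c;3", "d;4", "e;5");
        List<String> out = process(rows, 2, row -> row.toUpperCase());
        System.out.println(out);
    }
}
```

Note that the merged output comes back grouped by partition, so a flow that depends on input order would need a sort key or an order-preserving merge.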
Regards.
Allan
Six Stars

Re: Enable Parallel Execute is Greyed Out

The last time I looked, the tFileInput components used a blocking I/O library in Java, so having multiple threads read the file is not really possible. You can always just split your files and then have multiple readers read each chunk in parallel. I'm not exactly sure how to implement what you're asking for, but I will try to summon RBaldwin to this thread. If anyone can answer, he can, since he wrote all the MPP components. :)
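As a rough illustration of the "multiple readers, one chunk each" idea without physically splitting the file first, the plain-Java sketch below divides a file into byte ranges aligned to line ends and lets each worker read its own range through its own file handle. This is not how tFileInputDelimited works; the class name ChunkedFileSum, the "id;value" row layout, and the sum-the-second-field job are all hypothetical. It also assumes that no field contains an embedded newline and that each chunk fits in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.IntStream;

// Hypothetical illustration only; not part of any Talend component.
class ChunkedFileSum {

    // Byte offsets delimiting each chunk. Every inner boundary is moved
    // forward to just after the next '\n' so that no row is cut in half.
    static long[] chunkBoundaries(Path file, int chunks) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            long[] bounds = new long[chunks + 1];
            bounds[chunks] = size;
            for (int i = 1; i < chunks; i++) {
                raf.seek(Math.max(size * i / chunks, bounds[i - 1]));
                int b;
                do {
                    b = raf.read();
                } while (b != -1 && b != '\n');
                bounds[i] = raf.getFilePointer();
            }
            return bounds;
        }
    }

    // Each worker opens its own handle and sums the second field of its rows.
    static long sumChunk(Path file, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) (end - start)]; // sketch: chunk must fit in memory
            raf.seek(start);
            raf.readFully(buf);
            long sum = 0;
            for (String line : new String(buf, StandardCharsets.UTF_8).split("\n")) {
                String row = line.trim();
                if (!row.isEmpty()) {
                    sum += Long.parseLong(row.split(";")[1]);
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long parallelSum(Path file, int chunks) throws IOException {
        long[] bounds = chunkBoundaries(file, chunks);
        return IntStream.range(0, chunks).parallel()
                .mapToLong(i -> sumChunk(file, bounds[i], bounds[i + 1]))
                .sum();
    }

    // Writes a temporary "id;value" file with values 1..rows and sums it in parallel.
    static long runDemo(int rows, int chunks) {
        try {
            Path file = Files.createTempFile("rows", ".csv");
            try {
                StringBuilder sb = new StringBuilder();
                for (int i = 1; i <= rows; i++) {
                    sb.append("id").append(i).append(';').append(i).append('\n');
                }
                Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
                return parallelSum(file, chunks);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000, 4));
    }
}
```

Whether this is actually faster than a single sequential reader depends on the storage: on a single spinning disk, concurrent seeks can hurt, while per-row parsing and transformation usually benefit from the extra threads.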
One Star

Re: Enable Parallel Execute is Greyed Out

Sorry for the delayed reply. I was in transit from the US to Singapore. :) I would really love to get rbaldwin's input on this. Just to give you a better idea of what we are working on: we are currently building a DWH, and some of the files are so large (20 - 30GB text files on a daily basis) that reading and processing them in a single thread is not going to be an option.
Moderator

Re: Enable Parallel Execute is Greyed Out

Hi,
The KB article: TalendHelpCenter:How to automatically enable parallelization of data flows for better performance
The "Set Parallelization" feature explained in this section is available only on the condition that you have subscribed to one of the Talend Platform solutions or Big Data solutions V5.3.1 or later.

Can you get this feature in your Talend Enterprise Data Integration Version: 5.5.1 Build id: r118616?
Could you please show us your current job design screenshot?
Best regards
Sabrina
One Star

Re: Enable Parallel Execute is Greyed Out

Hi Sabrina,
Please see below.

(www.talendforge.org/forum/img/members/243775/jobflow.png)
Unfortunately, the parallelization option is not available for my version. So I guess the next thing to ask is which specific product we should buy to get this option. I would also appreciate it if you could get someone to send a price matrix for what you are offering. I have contacted Drew James from your UK office but I am not getting any answers.
Thanks and warm regards.
Allan
Moderator

Re: Enable Parallel Execute is Greyed Out

Hi,
For the enterprise subscription product price matrix, you have to send an email to the Talend sales team.
We don't have any reference for that on our side.
What's your job rate (rows/s)? Does the job take a long time for you? Have you tried to break it into several subjobs running in multiple threads?
Best regards
Sabrina
One Star

Re: Enable Parallel Execute is Greyed Out

Hi Sabrina,
We are currently at the stage of selecting the tools that we are going to use and building PoCs for each of them. As such, the ETL routines are quite simple at the moment and the throughput is quite acceptable. However, putting several 15GB files through the actual series of transformations (i.e. data cleanup, lookups, joins, etc.) is going to be very different. To give everyone a better idea of what we are doing, we have a requirement to be able to do 250,000 inserts into a database per second. While that requirement did not require us to do a lot of complex transformation that would have required an ETL tool, that can definitely change in the future. As such, we are looking for an ETL tool that is able to provide the following:
1. Ability to read a file in parallel. A better explanation of this can be found at (doc.cloveretl.com/documentation/UserGuide/index.jsp?topic=/com.cloveretl.gui.docs/docs/parallelreader.html) (optional)
2. Ability to partition data and process them in parallel. A "proper" MPP support would be good but SMP would be fine as well. (must have)
3. In-memory lookup, in-memory aggregation. (must have)
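To make requirements 2 and 3 concrete, here is a small plain-Java sketch of an in-memory lookup followed by an in-memory aggregation, run across parallel workers. It only illustrates the kind of processing being asked for and says nothing about what any particular ETL tool does internally. The "symbol;quantity" row layout, the sector lookup, and the name LookupAggregateSketch are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; the row layout and lookup are invented.
class LookupAggregateSketch {

    // Enrich each "symbol;quantity" row with a sector taken from an
    // in-memory lookup map, then sum the quantities per sector. The
    // parallel stream spreads rows over worker threads and
    // groupingByConcurrent aggregates into one shared concurrent map.
    static Map<String, Long> quantityBySector(List<String> rows,
                                              Map<String, String> sectorBySymbol) {
        return rows.parallelStream()
                .map(row -> row.split(";"))
                .collect(Collectors.groupingByConcurrent(
                        fields -> sectorBySymbol.getOrDefault(fields[0], "UNKNOWN"),
                        Collectors.summingLong(fields -> Long.parseLong(fields[1]))));
    }

    public static void main(String[] args) {
        Map<String, String> lookup = Map.of("AAPL", "Tech", "MSFT", "Tech");
        List<String> rows = List.of("AAPL;10", "MSFT;5", "AAPL;7", "XYZ;1");
        System.out.println(quantityBySector(rows, lookup));
    }
}
```

The lookup map is read-only while the stream runs, which is what makes sharing it between threads safe here.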
While jholman's suggestion of splitting the files and then processing the pieces in parallel will definitely work, please note that splitting a file into several pieces takes time as well. I would also prefer not to keep manually designing the parallel handling for each file that we are going to process.
The problem is that when I do a search, I see Talend doing exactly what I am looking for, but for some reason I am not able to replicate it in my version. Take the article help.talend.com/pages/viewpage.action?pageId=3986800#Raa92445 and search for "Iterate connection settings". As you can see in the screenshot, there is an "Enable parallel execution" option and you are able to define the number of parallel executions that you want. You can also see in the second screenshot that each execution processed around 70k rows. But when I look at my tLoop, I don't even see the "Enable parallel execution" checkbox.
So my question is, does Talend support the features listed above? If yes, can you provide a link to the relevant documentation?
Thanks. I would really appreciate clarification in this regard.
Allan
One Star

Re: Enable Parallel Execute is Greyed Out

Some additional materials. Please note that I am not in any way leaning toward a specific product at the moment, nor do I have a reason to. But hopefully the links below will clarify some of what I require.
https://cloveretl.wordpress.com/2009/10/26/parallelreader-versus-competitors/
https://cloveretl.wordpress.com/2009/11/11/parallelreader-versus-competitors-part-2/
Thanks and warm regards.
Allan