Six Stars

Running a single job multiple times based on context variables

We have a situation here that requires input from you.
The requirement is that we have a Job A that needs to run without Talend Studio; we plan to build the job and run it using the Windows scheduler.
The job loads data into BigQuery for a single provider (say Provider A). Now, if another provider joins (say Provider B), what is the best way to run the same job again for Provider B? There will be no change in the requirement/job at all; the only thing that may change is the BigQuery dataset name, which is different for each provider.

Each provider is independent of the others, so running the same job for two different providers in parallel would also be most welcome (if possible). We want to put the dataset name into a flat file, use it as a parameter (to get the dataset name), and load into the BigQuery table based on that dataset name. Can any of you give me an idea of how to make this possible?

I know we can use context variables, but for my requirement I need to pass the dataset name via a parameter file, which will be different for each provider. How do I make the first job point to the first path and the second job point to the second path? If that is not possible, is it possible to put both datasets into a single parameter file and have the job run twice, once per dataset value?

Is there any other way I can run the same job many times based on a value provided as a parameter outside the job, so that the records are loaded into the database based on the value provided in the parameter (the dataset name)?
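Since an exported Talend job is plain Java, the per-provider parameter file idea can be sketched as a small program that reads a `.properties` file whose path is passed on the command line. This is only an illustration, not the generated Talend code: the class name, the file name, and the `dataset` key are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class DatasetParam {
    // Read the dataset name from a provider-specific parameter file,
    // e.g. a file containing the single line: dataset=Dataset_ProviderA
    static String readDataset(String paramFile) {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(paramFile)) {
            props.load(in);
        } catch (IOException e) {
            throw new RuntimeException("Cannot read parameter file: " + paramFile, e);
        }
        return props.getProperty("dataset");
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java DatasetParam <parameter-file>");
            return;
        }
        // Each scheduled run passes its own parameter file, so two providers
        // can run in parallel, each run pointing at a different file.
        String dataset = readDataset(args[0]);
        System.out.println("Loading into dataset: " + dataset);
    }
}
```

In an actual Talend job you would load such a file into context variables with tContextLoad, or pass values on the command line of the exported job with `--context_param`; the sketch above only shows the shape of the approach.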

  • Big Data
  • Data Integration
7 REPLIES
One Star

Re: Running a single job multiple times based on context variables

Hi

You can read context variables from an external file: pass a parameter file and parse the parameters inside the Talend job.

When you run the second instance, you can pass a second parameter file so that the job runs with the second set of parameters.

HTH

Thanks

Raghu

Community Manager

Re: Running a single job multiple times based on context variables

Hello
As raghumreddy suggested, read the parameter value from the flat file and pass it to the business job dynamically. For example, you have a flat file that contains the dataset names:
dataset1
dataset2

tFileInputDelimited--main(row1)--tFlowToIterate--iterate--other components--main-->tBigQueryOutput

tFileInputDelimited: reads the dataset names from the flat file; define one column called "dataset" with String type. In the later components, you can get the current dataset name with this expression:
(String)globalMap.get("row1.dataset")

tFlowToIterate: loops the business processing once for each dataset name.
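The read-then-iterate flow above can be sketched in plain Java to show what the tFileInputDelimited/tFlowToIterate pair does conceptually. The class and method names are illustrative assumptions; `loadDataset` stands in for the "other components --> tBigQueryOutput" part of the flow.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class IterateDatasets {
    // tFileInputDelimited equivalent: read one dataset name per line,
    // skipping blank lines.
    static List<String> readDatasets(String file) {
        try {
            List<String> names = new ArrayList<>();
            for (String line : Files.readAllLines(Paths.get(file))) {
                if (!line.trim().isEmpty()) {
                    names.add(line.trim());
                }
            }
            return names;
        } catch (IOException e) {
            throw new RuntimeException("Cannot read dataset list: " + file, e);
        }
    }

    // Stand-in for the business flow that ends in tBigQueryOutput.
    static void loadDataset(String dataset) {
        System.out.println("Running the load for dataset: " + dataset);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java IterateDatasets <dataset-list-file>");
            return;
        }
        // tFlowToIterate equivalent: run the business flow once per name.
        for (String dataset : readDatasets(args[0])) {
            loadDataset(dataset);
        }
    }
}
```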

Regards
Shong

----------------------------------------------------------
Talend | Data Agility for Modern Business
Six Stars

Re: Running a single job multiple times based on context variables

Here is what I have been asked to check from an implementation point of view.
Create a flat file with the list of datasets. If in future any dataset needs to be added (for another provider), the addition should be done here, with a number by which the dataset is identified.
The flat file will look like
1. Dataset1/Provider A
2. Dataset2/Provider B
3. Dataset3/Provider C
44. Dataset4/Provider D
....
So while running the Talend job, the job should ask which dataset the run is for. Talend should give us an option something like this:
Select the Dataset/Provider to be executed.
1. Dataset1/Provider A
2. Dataset2/Provider B
3. Dataset3/Provider C
44. Dataset4/Provider D
...
...
99. Exit

If any dataset is added to the flat file in future, it should automatically be displayed while running the job, and its number should be the one given in the flat file.

If 1 is pressed, then Dataset1 should be processed, and if 44 is pressed, then Dataset4.
The number is based on the number given in the flat file, and Exit should end the program without executing further.
Can you please suggest how to implement this scenario? It would be most useful for me now, and for everyone in future, if you can share a job relevant to this scenario.
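The numbered-menu behaviour described above can be sketched in plain Java: parse the flat file into a number-to-entry map, print the options plus "99. Exit", and read the user's choice. This is a standalone illustration, not a Talend job; the class name and file format details are assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class DatasetMenu {
    // Parse lines like "44. Dataset4/Provider D" into a number -> entry map;
    // lines that do not start with a number (e.g. "....") are ignored.
    static Map<Integer, String> parseMenu(List<String> lines) {
        Map<Integer, String> menu = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\.", 2);
            if (parts.length == 2 && parts[0].trim().matches("\\d+")) {
                menu.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
            }
        }
        return menu;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java DatasetMenu <menu-file>");
            return;
        }
        Map<Integer, String> menu = parseMenu(Files.readAllLines(Paths.get(args[0])));
        System.out.println("Select the Dataset/Provider to be executed:");
        menu.forEach((num, entry) -> System.out.println(num + ". " + entry));
        System.out.println("99. Exit");

        Scanner in = new Scanner(System.in);
        int choice = in.hasNextInt() ? in.nextInt() : 99;  // no input -> exit
        if (choice == 99 || !menu.containsKey(choice)) {
            System.out.println("Exiting without running any load.");
        } else {
            // Here the real job would set the dataset context variable and run the load.
            System.out.println("Would run the load for: " + menu.get(choice));
        }
    }
}
```

Because the menu is rebuilt from the flat file on every run, a dataset added to the file later appears automatically with the number given in the file.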

Six Stars

Re: Running a single job multiple times based on context variables


shong wrote:
Hello
As raghumreddy suggested, read the parameter value from the flat file and pass it to the business job dynamically. For example, you have a flat file that contains the dataset names:
dataset1
dataset2

tFileInputDelimited--main(row1)--tFlowToIterate--iterate--other components--main-->tBigQueryOutput

tFileInputDelimited: reads the dataset names from the flat file; define one column called "dataset" with String type. In the later components, you can get the current dataset name with this expression:
(String)globalMap.get("row1.dataset")

tFlowToIterate: loops the business processing once for each dataset name.

Regards
Shong


The following is the error I get while trying to load the file as you described:
400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Invalid dataset ID \"\"Dataset_Dev\"\". Dataset IDs must be alphanumeric (plus underscores, dashes, and colons) and must be at most 1024 characters long.",
"reason" : "invalid"
} ]
}

Six Stars

Re: Running a single job multiple times based on context variables


sreenathtr wrote:

shong wrote:
Hello
As raghumreddy suggested, read the parameter value from the flat file and pass it to the business job dynamically. For example, you have a flat file that contains the dataset names:
dataset1
dataset2

tFileInputDelimited--main(row1)--tFlowToIterate--iterate--other components--main-->tBigQueryOutput

tFileInputDelimited: reads the dataset names from the flat file; define one column called "dataset" with String type. In the later components, you can get the current dataset name with this expression:
(String)globalMap.get("row1.dataset")

tFlowToIterate: loops the business processing once for each dataset name.

Regards
Shong


The following is the error I get while trying to load the file as you described:
400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Invalid dataset ID \"\"Dataset_Dev\"\". Dataset IDs must be alphanumeric (plus underscores, dashes, and colons) and must be at most 1024 characters long.",
"reason" : "invalid"
} ]
}


Got this problem resolved: the CSV options have to be enabled for the job to load successfully.
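For anyone hitting the same error: judging from the doubled quotes in `"Invalid dataset ID \"\"Dataset_Dev\"\""`, the surrounding quote characters from the file were being read as part of the value, which fails BigQuery's rule that dataset IDs contain only alphanumerics, underscores, dashes, and colons. The CSV option makes Talend strip those quotes; the sketch below only illustrates what that stripping amounts to (class and method names are assumptions).

```java
public class StripQuotes {
    // With CSV handling disabled, the surrounding quotes in the file become
    // part of the value, so BigQuery sees "Dataset_Dev" quotes and all,
    // which violates its dataset-ID character rules.
    static String stripQuotes(String raw) {
        return raw.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"Dataset_Dev\""));  // prints Dataset_Dev
    }
}
```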

If anyone can advise me on the earlier implementation I asked about, it would be most useful. Thanks again.

Community Manager

Re: Running a single job multiple times based on context variables

Hello
You need to read the datasets from the flat file and create a drop-down list that allows the user to select a value at runtime; the value chosen by the user is then assigned to a context variable. There was a KB article about how to create a drop-down list, but it is missing from our new Talend Help Center portal. I have contacted our Doc team to restore it, but it might take some time.

Regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Six Stars

Re: Running a single job multiple times based on context variables

Hi Shong,

Thanks for the reply.

Instead, can you share a sample job that has this basic function working? It would be useful for everyone looking for a similar setup.

I am looking for something like

job 1    -----> job 2

(this job)     (runs based on the flat-file value)

There is not much to be done in Job 2; it just uses the value extracted from the drop-down selection in Job 1.

Can you share your ideas please?
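The Job 1 --> Job 2 hand-off amounts to one generated class calling another and passing the selected value along (in Talend, a tRunJob call with a context-parameter mapping). A minimal sketch, with hypothetical class and method names:

```java
public class TwoJobs {
    // "Job 2": receives the dataset name; in Talend this would arrive
    // as a context variable populated by the parent job.
    static String job2(String dataset) {
        return "Job 2 loading into dataset: " + dataset;
    }

    public static void main(String[] args) {
        // "Job 1": obtains the dataset name (e.g. from the drop-down
        // selection) and hands it to Job 2.
        String selected = "Dataset1";  // stand-in for the selected value
        System.out.println(job2(selected));
    }
}
```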