API URL null

Eight Stars


 Hi Talend experts


I have the job below, which keeps reading the next API URL for as long as it finds one, and also iterates over different course_ids. I have tweaked the design a bit to run in parallel (the idea is to use the parallel execution option of the iterate link) so that multiple calls execute at a time.

Screen Shot 2018-11-14 at 4.04.35 pm.png

 

StoreCourseID:
//globalMap.put("canvas_id", row1.canvas_id);
globalMap.put("V_API_URL" + row1.canvas_id, "https://swinburneonline.instructure.com/api/v1/courses/"+row1.canvas_id +"/analytics/student_summaries?per_page=100");
globalMap.put("V_LOOP"+ row1.canvas_id, true);

tLoop:
((Boolean) globalMap.get("V_LOOP"+ row1.canvas_id))

tRestClient:

((String) globalMap.get("V_API_URL" + row1.canvas_id))

GetNextUrl:
System.out.println("Current URL IS: " + globalMap.get("V_API_URL" + row1.canvas_id)); // prints correctly
System.out.println("Rest URL" + globalMap.get("tRESTClient_1_HEADERS")); // doesn't print and errors out to null
java.util.List<String> strList = ((java.util.Map<String, java.util.List<String>>) globalMap.get("tRESTClient_1_HEADERS")).get("Link");

 

SetNextURlToCurrUrl:

if ((Boolean) globalMap.get("V_LOOP" + row1.canvas_id)) {
    // Print the URL stored for this course id before and after the swap
    System.out.println("URL1 IS: " + globalMap.get("V_API_URL" + row1.canvas_id));
    globalMap.put("V_API_URL" + row1.canvas_id, globalMap.get("next_url" + row1.canvas_id));
    System.out.println("URL IS: " + globalMap.get("V_API_URL" + row1.canvas_id));
}

 

In doing so I have run into a problem: the REST headers value is always NULL when I read it in the GetNextUrl (tJava) component. I'm not sure what's wrong. Any help is really appreciated!
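For reference, the step that GetNextUrl is attempting (pulling the next page URL out of the "Link" header) can be sketched in plain Java as below. This is only a sketch under assumptions: it assumes the header values look like `<url>; rel="next"`, the class and method names are hypothetical, and the example.invalid URLs are placeholders. Splitting on commas is a simplification that would break on URLs containing commas. The null guard matters here because the header list itself may be missing.

```java
import java.util.Arrays;
import java.util.List;

public class LinkHeaderSketch {
    // Returns the URL tagged rel="next" in the given Link header values,
    // or null when the list is missing or has no "next" entry.
    static String findNextUrl(List<String> linkValues) {
        if (linkValues == null) {
            return null; // headers were not available at all
        }
        for (String value : linkValues) {
            // One header value may carry several comma-separated links
            for (String part : value.split(",")) {
                String[] pieces = part.split(";");
                if (pieces.length < 2) {
                    continue;
                }
                String target = pieces[0].trim();
                boolean isNext = false;
                for (int i = 1; i < pieces.length; i++) {
                    if (pieces[i].trim().equals("rel=\"next\"")) {
                        isNext = true;
                    }
                }
                if (isNext && target.startsWith("<") && target.endsWith(">")) {
                    return target.substring(1, target.length() - 1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String header = "<https://example.invalid/a?page=1>; rel=\"current\","
                + "<https://example.invalid/a?page=2>; rel=\"next\"";
        System.out.println(findNextUrl(Arrays.asList(header)));
        System.out.println(findNextUrl(null));
    }
}
```

In a tJava component the list passed in would be the "Link" entry of the headers map, and a null result would be the signal to set the loop flag to false.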

 

@rhall_2_0 and @gr44: your input is really appreciated!!


Thanks
Harshal.



All Replies
Sixteen Stars

Re: API URL null

If you are intending to use parallel execution you need to REALLY understand what you are doing. It won't gain you much here. I suggest you don't use it. Your problem is likely caused by the use of ....

globalMap.get("tRESTClient_1_HEADERS")

This is a single instance of an object that you are using in parallel. As such you have no idea which parallel flow the result will correspond to. 
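To illustrate the point, here is a small plain-Java sketch (not Talend-generated code). A HashMap stands in for globalMap, and the interleaving of two parallel iterations is simulated sequentially so the effect is repeatable. The class name, the "HEADERS_101" style keys and the header strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedKeyDemo {
    // Two iterations write to the SAME key before either one reads it back.
    static String[] readWithSharedKey() {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("tRESTClient_1_HEADERS", "headers for course 101"); // iteration A writes
        globalMap.put("tRESTClient_1_HEADERS", "headers for course 202"); // iteration B overwrites
        String seenByA = (String) globalMap.get("tRESTClient_1_HEADERS");
        String seenByB = (String) globalMap.get("tRESTClient_1_HEADERS");
        return new String[] { seenByA, seenByB };
    }

    // Same interleaving, but each iteration uses a key suffixed with its own course id.
    static String[] readWithPerCourseKeys() {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("HEADERS_101", "headers for course 101");
        globalMap.put("HEADERS_202", "headers for course 202");
        String seenByA = (String) globalMap.get("HEADERS_101");
        String seenByB = (String) globalMap.get("HEADERS_202");
        return new String[] { seenByA, seenByB };
    }

    public static void main(String[] args) {
        String[] shared = readWithSharedKey();
        System.out.println("shared key, A sees: " + shared[0]);
        String[] perCourse = readWithPerCourseKeys();
        System.out.println("per-course key, A sees: " + perCourse[0]);
    }
}
```

With the shared key, iteration A ends up reading the headers that belong to iteration B. The per-course suffix works for keys you write yourself (such as "V_API_URL" + canvas_id in the original job), but the headers key is written by the component under one fixed name, so that trick does not carry over to it.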

Eight Stars

Re: API URL null

@rhall_2_0: How do I use parallel execution to make the entire flow faster? Imagine I have 5k course ids read from fb, each iterated into a REST API URL, with pagination on top of that. This makes the flow really slow. I can only get through 2k course ids in almost 2.5 hours, because a call happens for each course id and for each page inside it, let alone the remaining ~3k records.

There must be some way of improving the performance of the current flow. How do I parallelise it across multiple course ids, then?
Sixteen Stars

Re: API URL null

OK. Here is a way that might help. Put the part of the job which iterates over the web service calls in a separate job. That job should receive a context variable which holds the course_id. Keep in your current job the part of the process which iterates over the course_ids. Then have the iterate link call your new child job, supplying the course_id. You can then try to execute that child job in parallel.

By doing this you are keeping the individual course queries in the same process. Therefore your "next_url" functionality will carry on the way it is.
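The shape of that design can be sketched in plain Java as follows. This is an analogy under assumptions, not what Talend generates: runChildJob plays the child job that receives one course_id, runParent plays the parent job with its parallel iterate, and fakeNextUrl is a hypothetical stand-in for the REST call that pretends every course has 3 pages. The example.invalid URL is a placeholder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChildJobSketch {
    // Hypothetical stand-in for the REST call: next page URL, or null after page 3.
    static String fakeNextUrl(String url) {
        int idx = url.lastIndexOf("page=");
        int page = Integer.parseInt(url.substring(idx + 5));
        return page < 3 ? url.substring(0, idx + 5) + (page + 1) : null;
    }

    // Child job: gets one course id and keeps its paging state in a private map.
    static int runChildJob(String courseId) {
        Map<String, Object> localMap = new HashMap<>(); // not shared with other children
        localMap.put("V_API_URL", "https://example.invalid/courses/" + courseId + "?page=1");
        int pages = 0;
        while (localMap.get("V_API_URL") != null) {
            String current = (String) localMap.get("V_API_URL");
            pages++;
            localMap.put("V_API_URL", fakeNextUrl(current)); // next URL becomes current URL
        }
        return pages;
    }

    // Parent job: iterates over course ids and runs up to `parallel` children at once.
    static Map<String, Integer> runParent(List<String> courseIds, int parallel) {
        ExecutorService pool = Executors.newFixedThreadPool(parallel);
        try {
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (String id : courseIds) {
                Callable<Integer> child = () -> runChildJob(id);
                pending.put(id, pool.submit(child));
            }
            Map<String, Integer> pagesPerCourse = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> e : pending.entrySet()) {
                pagesPerCourse.put(e.getKey(), e.getValue().get());
            }
            return pagesPerCourse;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runParent(Arrays.asList("101", "202", "303"), 2));
    }
}
```

Because each child keeps its own "V_API_URL", the paging loop no longer needs the course id appended to every key, and the `parallel` argument is the one knob to lower if the API throttles requests.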

Eight Stars

Re: API URL null

@rhall_2_0: Thanks for your reply. I'm not sure I understood it, so if you could show the design steps here that would be fantastic. I tried the context approach, but it didn't work, because a context variable can store only one value at a time and I want parallelism (many ids being passed to many flows) happening at the same time.
Eight Stars

Re: API URL null

@rhall_2_0: Sorry for the earlier post; I had not quite understood. I have now implemented it the way you described and it is working.

 

Screen Shot 2018-11-16 at 10.36.28 am.png

 

It works fine.

 

I'm not sure what the best degree of parallelism is. I am going to test with 50 and see how it goes.

Sixteen Stars

Re: API URL null

Make sure you test this thoroughly. You *may* find some timing issues, but this is a better way of attempting this.

Eight Stars

Re: API URL null

@rhall_2_0: I found out that there is a throttle value set at the source, so I can't use parallelism freely: each token has a certain resource limit. This is going to slow everything down 😕. However, I have asked the application team whether I can use parallelism at all and what the best value would be.

Yes, you are right: I had timing issues when I was testing with 50 or 100 parallel executions. How do I tackle them?
Sixteen Stars

Re: API URL null

I've just carried out a quick test and I don't see this issue in v6.5.1. What version are you using?

Eight Stars

Re: API URL null

@rhall_2_0: I’m using 6.4.1. How can I avoid that type of situation?
Sixteen Stars

Re: API URL null

I'm not sure. This *might* be a bug. I built a quick job like this....

 

tRowGenerator ----------> tFlowToIterate ----------> tJava

tRowGenerator: generating a single integer sequence
tFlowToIterate: running 10 in parallel
tJava: printing the numbers

 

This produced an output of .....

 

3

9

1

4

10

7

8

2

6

5

 

I extended the test to 100 with 100 in parallel and it still looked OK, although I didn't spend too much time checking.

But I did this in v6.5.1. Try something similar and see if you get problems. If you do, it sounds like a bug in v6.4
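A plain-Java analogue of that check is sketched below, in case it helps to see what "looked ok" means in practice: the numbers may come out in any order, but each one should appear exactly once. This is not the Talend job itself; the class and method names are hypothetical, and a thread pool stands in for the parallel iterate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelIterateCheck {
    // Runs `count` tiny tasks on `parallel` threads; each task records its own number.
    static List<Integer> runInParallel(int count, int parallel) {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>(); // safe for concurrent adds
        ExecutorService pool = Executors.newFixedThreadPool(parallel);
        for (int i = 1; i <= count; i++) {
            final int n = i;
            pool.execute(() -> seen.add(n));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return new ArrayList<>(seen);
    }

    // True when every number from 1 to count was recorded exactly once, in any order.
    static boolean isComplete(List<Integer> seen, int count) {
        Set<Integer> expected = new HashSet<>();
        for (int i = 1; i <= count; i++) {
            expected.add(i);
        }
        return seen.size() == count && new HashSet<>(seen).equals(expected);
    }

    public static void main(String[] args) {
        List<Integer> seen = runInParallel(10, 10);
        System.out.println(seen + " complete=" + isComplete(seen, 10));
    }
}
```

Missing or duplicated numbers in the equivalent Talend test would point to values being overwritten between parallel iterations.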
