Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Six Stars


Hi All,

I need your expert help.

 

The requirement is to pull all chat data from a REST API (a one-time full data dump) and then pull chats on a daily basis. The output is spread across 180K pages, with each page giving the URL of the next and previous pages (except the first page, which has only 'next_url', and the last page, which has only 'prev_url').

 

So far I have been able to use the API URL to extract information from the first page:

 

tRestClient -> tJavaRow -> tJsonExtract -> tOracleOut

 

How do I modify the job to

1) Pull all data for the one-time data dump (180K pages)

2) Pull data on a daily basis for the current day, or extract data until the timestamp is the current day.

 

Example output from API

 

Page 1 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "next_url": "next_url_here"
}


Page 2 gives

{
    "chats": [
        all chat-related attributes that need to be imported
    ],
    "count": 179451,
    "prev_url": "previous_url_here",
    "next_url": "next_url_here"
}

Page 3 gives ......next page 

 

 


All Replies
Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

My assumption is that the "next_url" element will not be supplied if there are no other pages after that one. If that is the case, you can do it like this.....

 

1) Set up a tLoop using the "while" loop functionality. Use a globalMap variable holding your initial URL (set in a tJava preceding the tLoop) as your test on your while clause. "While globalMap value is not null" for example.

2) Use the globalMap value in your tRestClient

3) Retrieve your data for each service call and also retrieve the next_url. Set the globalMap value to be that of the next_url. If it is not present, then this will be null.

 

The tLoop will fire for each url supplied and will stop when the next_url value is not supplied.
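
The three steps above can be sketched in plain Java, outside Talend. This is only an illustration under assumptions: `globalMap` is simulated with a `HashMap`, `fetchNextUrl` is a hypothetical stand-in for the tRestClient call plus the JSON extraction, and the three fake pages are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NextUrlLoopSketch {

    // Hypothetical stand-in for one REST call: maps a page URL to the
    // next_url found in its response. The last page has no next_url (null).
    static String fetchNextUrl(String url) {
        Map<String, String> fakeApi = new HashMap<>();
        fakeApi.put("page1", "page2");
        fakeApi.put("page2", "page3");
        fakeApi.put("page3", null);
        return fakeApi.get(url);
    }

    // Walks the pages the way the tLoop would: while the stored URL is not null.
    static List<String> crawl(String firstUrl) {
        Map<String, Object> globalMap = new HashMap<>(); // simulated Talend globalMap
        List<String> visited = new ArrayList<>();

        globalMap.put("V_API_URL", firstUrl);                  // step 1: initial URL
        while (globalMap.get("V_API_URL") != null) {           // the "while" condition
            String url = (String) globalMap.get("V_API_URL");  // step 2: URL used for the call
            visited.add(url);
            globalMap.put("V_API_URL", fetchNextUrl(url));     // step 3: store next_url (or null)
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(crawl("page1")); // prints [page1, page2, page3]
    }
}
```

The loop ends on its own because the last page yields no next_url, so the stored value becomes null.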

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

I am trying to do what you suggested. I am quite new to Talend, so sometimes it is a bit difficult to achieve even small and simple things.

I used tSetGlobalVar to set the initial URL and then passed it to tRestClient. I then extract 'next_url' in tExtractJson, and up to here things are good: I looked at the result in tLogRow and can see the next_url. However, I am not able to figure out how to assign the next_url from tExtractJson to a global variable in tJava.

 


Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

You need to have this sort of layout...

layout.png

You set the initial globalMap value in the "Set initial globalMap" tJava, then set the "While" condition logic in the tLoop component. The "Dummy" component is just there to allow you to link to the tRestClient. I've included the "Modify JSON" tJavaFlex, following on from your last question. Then you can set the next URL in the "Set globalMap" tJavaFlex.

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi rhall_2_0, thank you for replying.

I am unable to get the attribute value from tExtractJson into tJava. I am extracting "next_url" in tExtractJson, but when I link it to tJava I have no clue how to assign it to a globalMap variable. Below is the code in the tJava; it gives the error 'input_row cannot be resolved'.

globalMap.put("next_url",input_row.next_url);
System.out.println("Value Of GlobalVar: "+globalMap.get("next_url"));
Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Link the tExtractJson component to a tJavaFlex, not a tJava. The code for the tJavaFlex should be in the Main Code section.

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Thank you for replying. The code below in tJavaFlex gives an error:

globalMap.put("v_next_url",input_row.next_url);

'input_row cannot be resolved'

Going from tExtractJson to tJavaFlex, what code do I have to write in tJavaFlex to get the next_url value from tExtractJson into a global variable? The code I managed to find on the internet doesn't work :-(
Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Resolved it. I had to use
globalMap.put("V_API_URL",row2.next_url);
instead of
globalMap.put("v_next_url",input_row.next_url);

row2 is the name of the output row from tJsonExtract.
Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Ah yes, sorry, I have been away from my machine all day. The tJavaRow uses the input_row and output_row row names (for some strange reason), whereas the tJavaFlex uses the actual row names.

 

Did this work for you?

 

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Yes this did work for me, the final solution looks like this.

image.png

Global variables

"V_API_URL" - Holds the initial URL and is then updated with the next URL that I get from tExtractJson.

"V_LOOP" - A Boolean that is true to start with and is set to false when V_API_URL is null.

 

tLoop

Used a While loop, without Declaration and Iteration.

image.png

tRestClient

image.png

tJavaFlex

On each loop iteration, the global variable V_API_URL is set to the next_url extracted by tJsonExtract.

The code then checks whether V_API_URL holds a URL. If it does, the loop continues because V_LOOP is still true; if there is no URL, V_LOOP is set to false, which exits the loop once the last page has been read.
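
Since the screenshot of the tJavaFlex code is not available here, the logic described above might look roughly like the sketch below. It is not the exact code from the job: `Row2` is a stand-in for the row that Talend passes in from the JSON extraction, `globalMap` is simulated with a `HashMap`, and treating an empty string like a missing URL is an extra assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class LoopFlagSketch {

    // Stand-in for the row coming out of the JSON extraction (row2 in the job above).
    static class Row2 {
        String next_url;
        Row2(String nextUrl) { this.next_url = nextUrl; }
    }

    // Mirrors what the tJavaFlex Main code would do for each incoming row.
    static void updateLoopState(Map<String, Object> globalMap, Row2 row2) {
        globalMap.put("V_API_URL", row2.next_url);
        String next = (String) globalMap.get("V_API_URL");
        // Assumption: an empty string is treated like a missing next_url.
        if (next == null || next.trim().isEmpty()) {
            globalMap.put("V_LOOP", false);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("V_LOOP", true);                 // set before the loop starts

        updateLoopState(globalMap, new Row2("next_url_here"));
        System.out.println(globalMap.get("V_LOOP"));   // prints true

        updateLoopState(globalMap, new Row2(null));    // last page: no next_url
        System.out.println(globalMap.get("V_LOOP"));   // prints false
    }
}
```

With this flag, the While condition in the tLoop would presumably be something like ((Boolean)globalMap.get("V_LOOP")); that expression is an assumption, as the original screenshot is not shown.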

image.png

Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Glad you got it to work and a good explanation of the steps you took. However, it is good form to award the "Accepted Solution" to those who have provided you with the answer. In this case, to use the tLoop and a globalMap to hold the value of the next url. You can award several "Accepted Solutions" if you feel multiple people have contributed towards it (you may feel that you deserve some credit for writing this up, and I would agree), but it is a little frustrating when you spend time building an example solution only to have it used but not accepted. 

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

My apologies rhall_2_0, accepting my own post as the solution was not intended to take any credit. The intent was to summarize everything and accept it as the solution, so that others could quickly find the answer rather than reading the whole conversation. I wouldn't have had a solution to my problem if you hadn't helped. This is the second problem in a row you have helped me with, and I am truly grateful to get help and support from an expert like yourself.
Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Also, I wasn't aware that multiple answers can be selected as accepted solutions 😕
Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

No problem at all and thanks for sorting it. There are plenty of people who just leave after having their questions answered and it can get a little irritating when all it takes is a click on "Solution Accepted". I happen to use this forum as a method of promoting my business. So "Kudos" and "Accepted Solutions" are of value. Once again, thanks for sorting this and thanks for writing your solution up to demonstrate to others who might have the same question :-) 

Five Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Just wanted to thank both of you. I had a similar problem, and after modifying your solution it works for me to some extent. I will post if there is something I need help with.

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi @gr44 and @rhall_2_0:

 

I am trying to follow exactly what you did, but I seem to be stuck at tRestClient, where it keeps giving me a Java NullPointerException.

I am not sure the problem is with this component, though: I have a println in the tJava component, but nothing is printed there.

 

Screen Shot 2018-11-02 at 9.28.10 am.png

 

Is there anything I should be looking at?

 

In tJava I have the code below:

globalMap.put("VApiUrl",row4.NextUrl);

System.out.println("Value of Global Var: "+globalMap.get("VApiUrl"));

In tRestClient I have the URL configured like this:

((String)globalMap.get("VApiUrl"))

 

Can you please share your job code if possible?

 

Thanks

Harshal.

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi guys

 

I would really appreciate it if you could respond to this or share some more details on your solution, please.

 

Thanks

Harshal.

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi @Parikhharshal,

 

As far as I can see from the screenshot you posted, you have not assigned any value to the global variable. If you are using a global variable in the URL for tRestClient_1, make sure that the value is assigned in tSetGlobalVar_1.

Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

I suspect that @gr44 is correct. The process should be.....

 

1) Set the first URL in the tSetGlobalVar_1

2) The first iteration of the loop will use the URL set in step 1

3) The JSON returned is then inspected for your data and another URL

4) The tJavaFlex at the end will set a new value to the globalMap variable for the URL

5) The loop goes to the next iteration, back to step 2
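
If step 1 is skipped, the value read from the globalMap in step 2 is null, which fits the NullPointerException reported above. A small guard like the sketch below can make that failure obvious. This is an illustration only: `requireUrl` is a hypothetical helper, not a Talend function, and `globalMap` is simulated with a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

public class UrlGuardSketch {

    // Returns the URL stored under key, or fails with a readable message
    // instead of letting a null reach the REST call.
    static String requireUrl(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalStateException(
                "globalMap key '" + key + "' is not set; assign the first URL before the loop starts");
        }
        return (String) value;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        try {
            requireUrl(globalMap, "VApiUrl");           // nothing assigned yet
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        globalMap.put("VApiUrl", "first_url_here");     // what step 1 does
        System.out.println(requireUrl(globalMap, "VApiUrl"));
    }
}
```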

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Hi @rhall_2_0 and @gr44

 

I am following the way you did it, but I am not sure what I have missed.

This is what I have set in the global variable:

Screen Shot 2018-11-02 at 11.30.05 pm.png

This is what is set in tLoop:

Screen Shot 2018-11-02 at 11.31.18 pm.png

This is what is in tJava:

Screen Shot 2018-11-02 at 11.32.06 pm.png

This is what is in tRestClient:

Screen Shot 2018-11-02 at 11.32.57 pm.png

Am I missing anything or doing anything wrong here?

As mentioned, if you could share screenshots of these values, that would be fantastic!

P.S.: You guys are awesome as always, and thanks a lot for replying :-)

 

Thanks

Harshal.

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@Parikhharshal

 

A few points:

1) Regarding "if you could share screenshot of these values that would be fantastic": the code and screenshots I uploaded are the actual values that I am using.

https://community.talend.com/t5/Design-and-Development/Iterative-Data-extraction-Pagination-and-Poll...

2) Break down the job and execute components standalone to identify whether they are working as expected. For example:

   A) Deactivate all components except tRestClient_1, manually put the URL that you set in the global variable into the URL field of tRestClient, and check whether the component connects to the source successfully.

   B) Deactivate everything except tSetGlobalVar_1 and tRestClient_1 and see whether this works. This is one of the most important steps, as it validates whether the URL passed from tSetGlobalVar_1 is causing any issue for tRestClient_1.

   and so on...

3) I see that you have put the code for assigning the next URL in tJava_1. You should move it to tJavaFlex_1.

Break down the solution and run individual components or combinations of components to identify what is working correctly and what is not.

 

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@gr44: I think there is a disconnect here. What is the tJava component doing in your job? Is it just a dummy one?

If that's the case, then I could add the code below to the tJavaFlex, as in your solution:

globalMap.put("VApiUrl", row4.NextUrl);

if (globalMap.get("VApiUrl") == null) {
    globalMap.put("VLoop", false);
}

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@gr44: I made this change and was able to run the job, but it just ran and didn't return anything at all. A bit weird.

 

Screen Shot 2018-11-03 at 12.44.51 am.png

 

 

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@Parikhharshal did you configure the tExtractJSONFields properly? You should have a key/attribute in your JSON output that contains the URL for the next page. For me the key/attribute/property name is "next_url", so I extracted that using tExtractJSONFields and used it further on. The solution will not work if the fields are not properly configured in tExtractJSONFields.

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@gr44: I was about to reply about the same thing. It seems I am not configuring them properly?

This is what I have in the field mapping; it does not have the fields that are required for the output. Do you manually add fields to the mapping here?

 

Screen Shot 2018-11-03 at 12.55.10 am.png 

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

 

The fields in the JSON are not directly available as columns here. You have to use a JSON query to extract the required information from the incoming JSON string.

1) Set the JSON field to 'String'.

2) Edit the schema and add the column names that you want to extract. Column names can be whatever you want; you will be using them later.

3) Once you create columns in the schema editor, they will appear in the mapping panel, as shown below.

4) Use the proper JSON query for each column; the JSON query defines what value the column will hold.
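
To make step 4 concrete: for the sample output at the top of this thread, the query for the next_url column only has to select the top-level "next_url" property (in JSONPath notation that would presumably be $.next_url; the exact query used in the job is not shown here). The sketch below imitates that selection in plain Java. It uses a regular expression as a dependency-free stand-in, not a JSON parser and not what Talend runs internally, and the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextUrlExtractSketch {

    // Regex stand-in for a query that selects the "next_url" property.
    // Limitation: it would also match a "next_url" key nested inside a chat.
    private static final Pattern NEXT_URL =
        Pattern.compile("\"next_url\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the next_url value, or null when the page has none (last page).
    static String extractNextUrl(String json) {
        Matcher m = NEXT_URL.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "{\"chats\": [], \"count\": 179451, \"next_url\": \"next_url_here\"}";
        String last = "{\"chats\": [], \"count\": 179451, \"prev_url\": \"previous_url_here\"}";
        System.out.println(extractNextUrl(page)); // prints next_url_here
        System.out.println(extractNextUrl(last)); // prints null
    }
}
```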

 

image.png

If you are not familiar with JSON structure and JSON queries, I suggest you read

https://goessner.net/articles/JsonPath/

I used this to understand the structure of my JSON output:

http://jsonpath.com/

At the very beginning I used the Postman app in Google Chrome to see the output of my REST call, understood the output, and identified the properties I was interested in. I then created the JSON queries there and tested them in Talend.

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@gr44: Thanks a lot for your explanation. One last question, hopefully: in my case I have many fields to extract from the JSON input. However, I still need to keep next_url as a column, right? Otherwise there is no way I can pass on next_url. Maybe I did not understand it well, even though you explained the use of next_url in your case.

Six Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

Yes, you need to extract the attribute containing the URL for the next page from the JSON output. In my JSON output the URL of the next page was called "next_url", and I created a column with the same name.

Eight Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

@gr44: In my case, the next URL is determined by the per_page entry (100), the page number (page=2 in this example), and also the student_id, which I need to populate dynamically: https://abc.com/api/v1/courses/160/analytics/student_summaries?per_page=100&page=2

Apart from these, there are other attributes that are not for the URL but for writing data to the DB. Does it mean I will have to capture the number of pages and the student id as columns?

What about the other columns that exist at the source and need to be written to the DB?
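
For a page-numbered URL like the one above, the loop can build each URL from a counter instead of reading a next_url. The sketch below is only an illustration under assumptions: it assumes that a page holding fewer than per_page rows is the last one (the API in question may signal the end differently), and `rowsOnPage` is a made-up stand-in for the real REST call.

```java
import java.util.ArrayList;
import java.util.List;

public class PageNumberSketch {

    // Builds the URL for one page; the parameter names follow the example URL above.
    static String pageUrl(String base, int perPage, int page) {
        return base + "?per_page=" + perPage + "&page=" + page;
    }

    // Hypothetical stand-in for the REST call: how many rows a given page holds
    // when the source has totalRows rows in total.
    static int rowsOnPage(int totalRows, int perPage, int page) {
        int remaining = totalRows - (page - 1) * perPage;
        return Math.max(0, Math.min(perPage, remaining));
    }

    // Assumption: a page with fewer than perPage rows is the last page.
    static List<String> urlsToFetch(String base, int perPage, int totalRows) {
        List<String> urls = new ArrayList<>();
        int page = 1;
        while (true) {
            urls.add(pageUrl(base, perPage, page));
            if (rowsOnPage(totalRows, perPage, page) < perPage) {
                break; // short (or empty) page: stop looping
            }
            page++;
        }
        return urls;
    }

    public static void main(String[] args) {
        String base = "https://abc.com/api/v1/courses/160/analytics/student_summaries";
        for (String url : urlsToFetch(base, 100, 250)) {
            System.out.println(url); // three URLs: page=1, page=2, page=3
        }
    }
}
```

In a Talend job the page counter would take the place of the URL in the globalMap, with the loop flag set to false once a short page comes back.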

Sixteen Stars

Re: Iterative Data extraction (Pagination and Polling) from REST API using tRestClient

I'm not sure I understand the parameters. What is the difference between "per_page" and "page"?

 

To write other attributes back to a database, you have several options. You can dump the entire JSON to a tHashOutput (set to append mode) and then read it all back and interrogate it from a tHashInput component linked to it. This would be done in a subsequent subjob and may be easier for a future developer who inherits the job to understand.

 

Alternatively, you could use the tExtractJSONFields to extract all of the data you wish to extract and write this straight to your DB or (similar to the above) write it to a tHashOutput and deal with loading it to your DB in a later subjob.
