Using OAuth 2.0 with Talend to Access Google APIs

Community Manager


This tutorial was originally written in 2014 for my website. Although the Google screenshots may be slightly out of date and I wrote this using version 5.5.1 of Talend, you should still be able to make use of this. I have attached a copy of the job and a copy of the parameters file I used for this.

 

Google make use of the OAuth 2.0 protocol for authentication for (I think, but am happy to be corrected) all of their services. They do a pretty good job of describing the protocol. Once you have read through the documentation you should have a better idea of what you are doing, which should make this easier. The Google documentation is here.

 

The first thing that needs to be done is to create a Google Project. This is described below....

 

Create a Google Project

 

Creating a Google Project is pretty simple. First of all, you need to go to the Google Developers Console.

Once you have logged in, you should see a page with content as below....

GoogleOAuth1.png

To create a project, click on the "Create Project" button circled in red. A popup will appear for you to fill out the "Project Name" and "Project Id". It is a good idea to leave the "Project Id" as the randomly generated value you are given, as none of the values I have tried changing it to has ever worked.

 

This tutorial will be built as if it is going to be used to access Google Drive. Accessing other Google tools is done in the same way. You need to specify which APIs you require when you register a project.

GoogleOAuth2.png

Ensure that the "I have read and agree..." tick box is ticked.....oh and remember to read it.

 

Then press the "Create" button. 

 

After a few seconds you will see the following screen appear....

GoogleOAuth3.png

Next we need to give the project access to certain APIs. In this case we will give the project access to Google Drive. To do this, click on the "APIs & auth" link (circled in red). When the tree expands, click on the "APIs" link. The next screen will appear...

GoogleOAuth4.png

You may see some other APIs automatically selected. You can leave those or (as I have done) remove them. Then select the "Drive API" and "Drive SDK".....or whichever Google products you wish to use. As I said earlier, this tutorial is an example of how to access Google Drive. But it can be followed for giving access to any of the Google applications.

 

Once the APIs that are required have been selected, click on the "Credentials" link to see the screen below....

GoogleOAuth5.png

Here is where we create our OAuth 2.0 Client ID. To start the process click on the "Create new Client ID" button (circled in red). This will reveal the screen below....

GoogleOAuth6.png

In this example we are using the "Web application" application type. While this isn't necessarily the best type to choose for Talend, it doesn't have any limitations as to what can be used with it. It does mean that a user will need to log in the first time, but a "refresh token" can be used to ensure that future "access tokens" can be created from that. This is explained later when describing the Talend Job.

 

You can see that the "AUTHORIZED JAVASCRIPT ORIGINS" and "AUTHORIZED REDIRECT URI" both contain "http://localhost". We are not using any JavaScript, so we don't need to worry too much about the "AUTHORIZED JAVASCRIPT ORIGINS". The "AUTHORIZED REDIRECT URI" is the URI that Google redirects the browser to, with the authorisation code attached, once the user has granted access. This is described later.

 

Once you click on the "Create Client ID" button, you will see the following section appear on the screen. This holds all of the details you will need for your Talend Job...

GoogleOAuth7.png

The "CLIENT ID" and "CLIENT SECRET" are required by the Talend Job in order to get refresh tokens and access tokens.

Now we can look at building the Talend Job.

 

The "Retrieve Google Access Code" Job

 

This Job is intended to be used as a child job by other Talend Jobs that require access to Google products. The purpose of this Job is simply to return an access token.

 

This Job isn't terribly complicated, but there are lots of IF Conditions to control the data flow. This is to accommodate several scenarios that might be hit when retrieving an access token. A screenshot of the Job can be seen below....

GoogleOAuth8.png

This tutorial will not go into much detail about configuring components. Each of the numbers in the screenshot above corresponds to an area that needs a bit of detail. If there is anything that you feel is not described adequately, please feel free to leave a comment or question below and I will get back to you.

 

1) Reading Context Variables

This subjob is used to read Context variables in from a flat file. It makes use of a tFileInputDelimited component and a tContextLoad component.

 

The tFileInputDelimited component makes use of a Context variable called "context_file" to point to the correct file location and requires a schema of two columns; "key" and "value".

 

This enables the population of the following Context variables which have been created for this Job....

Name            Type      Default Value
access_token    String
client_id       String
client_secret   String
context_file    String    "C:/GoogleDrive/Config/contextGoogle.csv"
redirect_uri    String
refresh_token   String
scope           String

 

The only Context variable that needs a default value is the "context_file" variable. The rest are handled within the file referenced by that variable.

 

2) "Access Token Empty And Refresh Token Not Empty" (Run If)

This "Run If" link tests to see if the "access_token" Context variable is empty and the "refresh_token" is not empty. If this test is "true", then the next phase is to generate an access token from the refresh token. The code used is below...

(context.access_token==null||context.access_token.compareToIgnoreCase("")==0)&&(context.refresh_token!=null&&context.refresh_token.compareToIgnoreCase("")!=0)

 

3) "Access Token Not Empty And Refresh Token Not Empty" (Run If)

This "Run If" link tests to see if the "access_token" Context variable is not empty and the "refresh_token" is not empty. If this test is "true", then the next phase is to test the access token. The code used is below...

(context.access_token!=null&&context.access_token.compareToIgnoreCase("")!=0)&&(context.refresh_token!=null&&context.refresh_token.compareToIgnoreCase("")!=0)

 

4) "Access Token Empty And Refresh Token Empty" (Run If)

This "Run If" link tests to see if the "access_token" Context variable is empty and the "refresh_token" is empty. If this test is "true", then the next phase is to generate a new refresh token and access token. The code used is below...

(context.access_token==null||context.access_token.compareToIgnoreCase("")==0)&&(context.refresh_token==null||context.refresh_token.compareToIgnoreCase("")==0)
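The three "Run If" conditions above can be read as one decision on the two tokens. Below is a rough standalone sketch of that decision, written outside of Talend. The class, enum and method names are made up for illustration and are not part of the Job. Note the fourth case (an access token without a refresh token), which none of the three links covers.

```java
// Hypothetical helper: these names are illustrative and do not exist in the Job.
final class TokenPathSketch {

    enum Path { REFRESH_ACCESS_TOKEN, TEST_ACCESS_TOKEN, REQUEST_NEW_TOKENS, NONE }

    // Treats null and "" as empty, as the Run If expressions do.
    static boolean isEmpty(String value) {
        return value == null || value.isEmpty();
    }

    static Path choose(String accessToken, String refreshToken) {
        if (isEmpty(accessToken) && !isEmpty(refreshToken)) {
            return Path.REFRESH_ACCESS_TOKEN;  // link 2
        }
        if (!isEmpty(accessToken) && !isEmpty(refreshToken)) {
            return Path.TEST_ACCESS_TOKEN;     // link 3
        }
        if (isEmpty(accessToken) && isEmpty(refreshToken)) {
            return Path.REQUEST_NEW_TOKENS;    // link 4
        }
        // Access token present but no refresh token: none of the three links fires.
        return Path.NONE;
    }

    public static void main(String[] args) {
        System.out.println(choose("", "some-refresh-token"));
        System.out.println(choose("some-access-token", "some-refresh-token"));
        System.out.println(choose(null, null));
    }
}
```

Checking for null before calling any method on the value matters here, as a null Context variable would otherwise cause a NullPointerException.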

 

5) "Get Access Token using Refresh Token" (tRESTClient)

This component is used to retrieve an access token using an existing refresh token. This will only be carried out if the "Access Token Empty And Refresh Token Not Empty" Run If link condition is true.

 

To configure this component copy the configuration shown below. Ensure that the sections circled in red are set correctly. To add the "Query parameters" use the green plus symbol circled in red.

GoogleOAuth9.png

The values required can be seen above, but you can find them below so that you can copy and paste them.....

URL: "https://accounts.google.com/o/oauth2/token"

Name Value
"refresh_token" context.refresh_token
"client_id" context.client_id
"client_secret"  context.client_secret
"grant_type"  "refresh_token"

 

It should be noted that although we are receiving JSON back, this component will automatically convert it to a DOM document with the JSON wrapped with a "ROOT" element by default. 
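For anybody wanting to see what the four parameters above look like outside of the tRESTClient component, here is a minimal sketch that URL-encodes them into a single parameter string. It assumes Java 10 or later (for `URLEncoder.encode(String, Charset)`), the class and method names are hypothetical, and the token values are dummies. It does not send anything to Google.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Job: builds the refresh-token parameters.
final class RefreshRequestSketch {

    // Joins key=value pairs with "&", URL-encoding both sides.
    static String encodeParams(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // Same four parameters as in the table above, in the same order.
    static String refreshParams(String refreshToken, String clientId, String clientSecret) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("refresh_token", refreshToken);
        params.put("client_id", clientId);
        params.put("client_secret", clientSecret);
        params.put("grant_type", "refresh_token");
        return encodeParams(params);
    }

    public static void main(String[] args) {
        // Dummy values for illustration only.
        System.out.println(refreshParams("1/abc", "my-client-id", "my-secret"));
    }
}
```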

 

6) "tExtractXMLField_2" (tExtractXMLField)

This component is used to retrieve the access token from the returned JSON string which has been converted to an XML document. The configuration of this component can be seen below.....

 GoogleOAuth10.png

An output schema is required. To set this up click on the "Edit schema" button circled in red. A single column called "access_token" is required.

Ensure that the areas circled in red are configured as seen above. 

 

The "XPath query" required for the column that is being output is "./access_token". 
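To illustrate what the XPath extraction is doing, the sketch below runs an XPath query against a small hand-written XML document using the standard Java XML APIs. The sample document, the lower-case "root" element name and the class name are assumptions made for the example. Check the actual output of your tRESTClient to see how the wrapper element is named.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical standalone example, not the code generated by tExtractXMLField.
final class AccessTokenXPathSketch {

    // Evaluates an XPath expression against an XML string and returns the text result.
    static String extract(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(xpathExpr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Assumed shape of the converted response; the token value is a dummy.
        String xml = "<root><access_token>dummy-token</access_token>"
                + "<token_type>Bearer</token_type><expires_in>3600</expires_in></root>";
        System.out.println(extract(xml, "/root/access_token"));
    }
}
```

In the component itself the loop XPath selects the wrapper element, which is why the column query is the relative "./access_token".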

 

7) "IF more than 0 rows from tLogRow 4" (Run If)

This "Run If" link tests to see if any rows have come from the tLogRow_4 component. This is done to prevent the code from following this path if no rows were output. The code used is below...

((Integer)globalMap.get("tLogRow_4_NB_LINE"))>0

 

8) "tJavaRow_2" (tJavaRow)

This component is used to take the "access_token" column from the previous component and set it as the current value of the Context variable "access_token". The code to do this is below....

String atoken = input_row.access_token;

context.setProperty("access_token", atoken);

 

The "input_row.access_token" bit of code represents the value coming in. The "context.setProperty(..." section assigns the "access_token" value.

 

9) "IF more than 0 rows from tLogRow 2" (Run If)

This "Run If" link tests to see if any rows have come from the tLogRow_2 component. This is done to prevent the code from following this path if no rows were output. The code used is below...

((Integer)globalMap.get("tLogRow_2_NB_LINE"))>0

 

10) "tJava_3" (tJava)

This component is used to reset the "access_token" and "refresh_token" to an empty string if trying to acquire an access token from the refresh token fails. It also points the user to where to go to revoke access to the Talend Job so that the process can be started again from scratch. This should not be a common situation, but needs to be handled. The code for this can be seen below....

context.setProperty("refresh_token", "");
context.setProperty("access_token", "");
System.out.println("The tokens do not exist. Revoke access using this URL https://accounts.google.com/b/0/IssuedAuthSubTokens and then run the job again");

 

11+18) Writing the Context variables to file

These subjobs are used to take the current values held by the Context variables and output them to the file that holds those values. The tContextDump component needs no configuration. The tFileOutputDelimited component needs basic configuration which can be seen below...

GoogleOAuth11.png

The "File Name" value is set to the "context_file" Context variable. This is the only Context variable with a default set in the Job.

The schema needs to be a copy of the tContextDump. This is achieved by clicking on the "Edit schema" button and copying the input schema to the output schema. 

 

12) "tJava_1" (tJava)

This component is used to build a URI to be sent to the user to place in a web browser. It is made up of several Context variables which must be set in the Context variable file. This URI is described by Google here.

String uri = "https://accounts.google.com/o/oauth2/auth?";
uri = uri + "scope="+ context.scope + "&";
uri = uri + "state=123456789qwertyui&";
uri = uri + "redirect_uri="+ context.redirect_uri + "&";
uri = uri + "response_type=code&";
uri = uri + "client_id=" + context.client_id + "&";
uri = uri + "approval_prompt=auto&";
uri = uri + "include_granted_scopes=true&";
uri = uri + "access_type=offline";

System.out.println(uri);
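The code above relies on the values in the Context variable file already being safe to place in a URL. As a possible alternative, here is a minimal sketch that URL-encodes each value while building the same URI. It assumes Java 10 or later and that the values are stored unencoded (for example "http://localhost" rather than "http%3A%2F%2Flocalhost"), otherwise they would be encoded twice. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Job: builds the authorisation URI.
final class AuthUriSketch {

    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Same parameters, in the same order, as the tJava_1 code.
    static String build(String scope, String redirectUri, String clientId, String state) {
        return "https://accounts.google.com/o/oauth2/auth"
                + "?scope=" + enc(scope)
                + "&state=" + enc(state)
                + "&redirect_uri=" + enc(redirectUri)
                + "&response_type=code"
                + "&client_id=" + enc(clientId)
                + "&approval_prompt=auto"
                + "&include_granted_scopes=true"
                + "&access_type=offline";
    }

    public static void main(String[] args) {
        // The client id below is a dummy value.
        System.out.println(build("https://www.googleapis.com/auth/drive",
                "http://localhost", "dummy.apps.googleusercontent.com", "123456789qwertyui"));
    }
}
```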

 

13) "tMsgBox_1" (tMsgBox)

This component is used to retrieve the value of the redirect URL that is returned after a successful authorisation via a web browser. This is demonstrated later. The configuration of this component can be seen below...

GoogleOAuth12.png

Ensure that the "Buttons" drop down is set as "Question".

 

14) "tJava_2" (tJava)

This component is used to receive the result from the tMsgBox component and extract the authorization code from it. This is used by the next component to authorise the request for an access token. This process is described by Google here.

The code used in this component is below....

String code = ((String)globalMap.get("tMsgBox_1_RESULT"));
code = code.substring(code.indexOf("code=")+5);
//the code may be the last parameter, in which case no "&" follows it
int end = code.indexOf("&");
if(end>=0){
    code = code.substring(0,end);
}
System.out.println(code); //can be removed if an output is not required
globalMap.put("code", code);
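If you would rather not depend on where the "code" parameter sits in the redirect URL, the query string can be split into its parameters first. The sketch below is one way of doing that with the standard library, assuming Java 10 or later. The class and method names are hypothetical and the URLs in the example are made up.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Job: reads parameters from the redirect URL.
final class RedirectCodeSketch {

    // Splits the query string into URL-decoded key/value pairs.
    static Map<String, String> queryParams(String redirectUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(redirectUrl.trim()).getRawQuery();
        if (query == null) {
            return params; // no query string at all
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Made-up redirect URL with a dummy code.
        Map<String, String> params =
                queryParams("http://localhost/?state=123456789qwertyui&code=4%2Fdummy-code");
        System.out.println(params.get("code"));
    }
}
```

A lookup such as `params.get("error")` can also be used to spot a refused authorisation before requesting a token.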

 
Due to post size limitations, this tutorial will continue in the comments section....

 

Community Manager

Re: Using OAuth 2.0 with Talend to Access Google APIs

...Continued

15) "Get Access Token and Refresh Token" (tRESTClient)

This component is used to retrieve an access token and refresh token using the authorisation code retrieved from the component before. 

To configure this component copy the configuration shown below. Ensure that the sections circled in red are set correctly. To add the "Query parameters" use the green plus symbol circled in red.

GoogleOAuth13.png

The values required can be seen above, but you can find them below so that you can copy and paste them.....

 

URL: "https://accounts.google.com/o/oauth2/token"

 

Name Value
"code" ((String)globalMap.get("code"))
"client_id" context.client_id
"client_secret" context.client_secret
"redirect_uri" "http://localhost"
"grant_type" "authorization_code"

 

It should be noted that although we are receiving JSON back, this component will automatically convert it to a DOM document with the JSON wrapped with a "ROOT" element by default. 

 

16) "tExtractXMLField_1" (tExtractXMLField)

This component is used to retrieve the access token and refresh token from the returned JSON string which has been converted to an XML document. The configuration of this component can be seen below.....

 GoogleOAuth14.png

An output schema is required. To set this up click on the "Edit schema" button circled in red. Two columns called "access_token" and "refresh_token" are required.

Ensure that the areas circled in red are configured as seen above. 

The "XPath query" required for the access_token column is "./access_token".

The "XPath query" required for the refresh_token column is "./refresh_token". 

 

17) "tJavaRow_1" (tJavaRow)

This component is used to take the "access_token" and "refresh_token" column values from the previous component and set them as the current values of the Context variables "access_token" and "refresh_token". The code to do this is below....

String atoken = input_row.access_token;
String rtoken = input_row.refresh_token;

context.setProperty("access_token", atoken);
if(rtoken!=null){
    context.setProperty("refresh_token", rtoken);
}

 

The "input_row...." bits of code represent the values coming in. The "context.setProperty(..." sections assign the "access_token" and "refresh_token" values. 

 

An "IF Condition" is used to cover situations where a "refresh_token" is not received. This should not happen, but this code prevents the Job from falling over if it does. 

 

19) Read the newly set Context variables into the Job and output just the Access Token

This subjob is run at the end of the Job. It will always run, no matter which path the code has taken. It is used to return the access token that has been retrieved/generated. As it has no idea where the access token has come from, it reads the latest value from the Context variable file. As ALL Context variables will be returned from this file, a tMap component is used to filter the return values.

The tFileInputDelimited component points to the Context variable file using the context_file Context variable. It also has the schema that can be seen in the tMap "row14" table. This needs to be configured.

 

The tMap component can be seen below....

GoogleOAuth15.png

The filter that is used in the "access_token_return" table can be seen below...

row14.key.compareToIgnoreCase("access_token")==0

 

Remember that "row14" might be named differently in a version you write. If you have errors here, check the input row name.

 

20) "Test List Files Services" (tRESTClient) 

This component simply tests the access_token currently held in the Context variable. If it tests successfully, the Job will end. If it fails, the error trigger will be used and the Job will attempt to generate a new one.

 

To configure this component copy the configuration shown below. Ensure that the sections circled in red are set correctly. To add the "Query parameters" use the green plus symbol circled in red.

GoogleOAuth16.png

The values required can be seen above, but you can find them below so that you can copy and paste them.....

URL: "https://www.googleapis.com/drive/v2/files"

Name Value
"corpus" "DEFAULT"
"q" "modifiedDate < '2000-01-01T00:00:00'"

 

This HTTP request is described by Google here. I have used a query to search for files with a modified date less than 2000/01/01. This has been done so that a successful response will return no data.

 

In order for the HTTP request to work, we need to provide the access token. This is done via the "Advanced Settings" tab as can be seen below....

GoogleOAuth17.png

The access token is provided by the HTTP header "Authorization". Its value must be a combination of the word "Bearer " (with a space) and the access token that has been supplied. 
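As an illustration of the same request outside of Talend, the sketch below builds (but does not send) the test request with the "Authorization" header set. It assumes Java 11 or later for the `java.net.http` classes. The class name is hypothetical and the token is a dummy value.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone example, not the code generated by tRESTClient.
final class DriveTestRequestSketch {

    static HttpRequest build(String accessToken) {
        // Same query parameters as in the table above; "q" must be URL-encoded.
        String q = URLEncoder.encode("modifiedDate < '2000-01-01T00:00:00'",
                StandardCharsets.UTF_8);
        URI uri = URI.create("https://www.googleapis.com/drive/v2/files?corpus=DEFAULT&q=" + q);
        return HttpRequest.newBuilder(uri)
                // The word "Bearer", a single space, then the access token.
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("dummy-access-token");
        System.out.println(request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```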

 

The Context Variable File

Below we can see an example of what the Context variable file will need to look like when the Job is first run. The variables that are assigned values here must be assigned values in your version. When the Job has been run for the first time, all of the values will be populated.

client_secret;YIMgcQ24ghjt65GHy8wTtiSpn8
refresh_token;
redirect_uri;http%3A%2F%2Flocalhost
scope;https://www.googleapis.com/auth/drive
context_file;C:/Talend/OpenSource/5.5.1/Studio/workspace/contextGoogle.csv
client_id;689878354248.apps.googleusercontent.com
access_token;

 

Notice that the "context_file" variable has been set. It MUST point to its own location.

The "redirect_uri" variable is "http://localhost" where the value has been URL encoded. This could be done inside the Job if you prefer to leave this as a natural value.

The "scope" variable is described by Google here.
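To make the file layout explicit, here is a minimal sketch that reads "key;value" lines like those above into a map. It is only an illustration of the format (in the Job this is done by tFileInputDelimited and tContextLoad), it assumes Java 9 or later for `List.of`, and the class name and sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Job: parses "key;value" lines.
final class ContextFileSketch {

    static Map<String, String> parse(List<String> lines) {
        Map<String, String> context = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the first ";" only, so values may themselves contain ";".
            int sep = line.indexOf(';');
            String key = sep < 0 ? line : line.substring(0, sep);
            String value = sep < 0 ? "" : line.substring(sep + 1);
            context.put(key.trim(), value.trim());
        }
        return context;
    }

    public static void main(String[] args) {
        // Dummy values; refresh_token and access_token start out empty.
        Map<String, String> context = parse(List.of(
                "client_id;dummy.apps.googleusercontent.com",
                "refresh_token;",
                "scope;https://www.googleapis.com/auth/drive",
                "access_token;"));
        System.out.println(context);
    }
}
```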

 

Running the Job for the first time

This Job can be run on its own to demonstrate that it works. It will print the access token to the System.out. It can also be used as a child Job that returns a key/value pair holding the access_token to be used by the parent. This section will demonstrate the Job being run as a standalone Job.

 

1) Running the Job

When running for the first time, we need to make sure that the Context variable file is fully configured minus values for the refresh_token and access_token (as seen above). Once that is sorted, load the Job and click on the "Run" button (circled in red).

GoogleOAuth18.png

This will produce a string in the System.out. Copy this string (circled in red) and paste it into a web browser.

 

2) Authorise the Talend Job with Google

When the Google authentication page loads, click on the "Accept" button. As below....

GoogleOAuth19.png

3) Copy the Authorisation Redirect URL

If the authorisation has worked, a redirect URL (like below) will be returned. Copy it.

GoogleOAuth20.png

 

4) Pass the Redirect URL back to the Talend Job

Paste the value copied from the web browser address bar into the message box and click "OK". The Job will then continue.

GoogleOAuth21.png

 

5) The Access Token is Generated

As can be seen below, the Access Token will be displayed at the bottom of the System.out (circled in red). It will also be added to the Context variable file along with the Refresh Token.

 GoogleOAuth22.png

 

Refreshing the Access Token from the Refresh Token

After the Refresh Token and Access Token have been generated for the first time, there should be no need for future human interaction unless the Refresh Token has been lost. To show this, open the Context variable file and add a few random characters to the access_token variable. Then run the Job as above. You will notice that there is no user interaction required and that a new access_token is generated.

 

Resetting the Refresh Token

You may find that for whatever reason the Refresh Token is not working or has been lost. If this is the case then the Talend Job will need its authentication revoked before the Job can be run again from scratch. This is an unusual situation, but needs to be covered. To emulate this, open the Context variable file and alter some of the characters of the access_token and refresh_token. Then run the Job. You will see a screen like below informing you to revoke the access and giving you a URL to use....

GoogleOAuth23.png

Open the URL in a web browser and revoke access to your Talend Job (using the name you specified when you created the Google Project). Then start from scratch. 

 

Running the Job as a child Job

This Job can be run as any child Job in Talend. Ensure that you remember to configure a schema for the child Job that returns exactly what is output by the tBufferOutput component.

 

 

Six Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Hi,

 

I followed the different steps, but I receive the following message : 

https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive&state=12345678...
s://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive
400|{
"error": "invalid_grant",
"error_description": "Malformed auth code."
}

 

Any idea on what I can be doing wrong??

Community Manager

Re: Using OAuth 2.0 with Talend to Access Google APIs

I can see that your redirect_uri is incorrect. I used "http://localhost" (or "http%3A%2F%2Flocalhost" with the address url encoded)

 

https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive&state=123456789qwertyui&redirect_uri=https://developers.google.com/oauthplayground&response_type=code&client_id=XXXX-t701hme370kfagcvo1052id7e8jblo8b.apps.googleusercontent.com&approval_prompt=auto&include_granted_scopes=true&access_type=offline
Six Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Hi,

 

I've corrected this.

I now have the following url :

https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive&state=12345678...

 

When I do this and put the correct access-token it works.

 

But unfortunately after an hour this doesn't work anymore. I've also included the refresh token, but this doesn't seem to work.

I presume this is something that needs to be set up on the google account??

Community Manager

Re: Using OAuth 2.0 with Talend to Access Google APIs

It has been a while since I wrote this job (I believe I did it in 2014), so the flow may have changed slightly. But the point of this job is that once the first token is received (access token), the refresh token is saved into a CSV file. The next time the job fails and requires a new access token, the refresh token will automatically be used to create one.

 

Using the refresh token to retrieve a new access token has a slightly different flow from the original flow to generate both a refresh token and access token. This may be what is tripping you up. However, if you have received a refresh token with your first access token, you should be able to use it to generate a new access token.

 

Have you tried downloading the job?

Six Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

I was using the job. Probably I have forgotten this step.
Schermafbeelding 2019-02-19 om 11.54.49.png
Four Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Not able to view/download the image

Four Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Hi,

Just tried your scenario. It's great and well documented.

 

I made some changes to retrieve the code

 

in the context file

replace : redirect_uri;http%3A%2F%2Flocalhost
with : redirect_uri;urn:ietf:wg:oauth:2.0:oob

 

 

In the steps
13.

replace : redirect URL
with : code

14.
comment/remove the substring part
15.
in query parameters

replace : http://localhost
with : context.redirect_uri

 

 

Result :
In the browser you can directly copy-paste the code
It's less disturbing than the error page

 

Now I need to figure out how to retrieve data from Google with TOS and use it

Community Manager

Re: Using OAuth 2.0 with Talend to Access Google APIs

Nice additions. This was written a long time ago and things have moved on a bit with Google OAuth 2.0 functionality. I'm glad this still works and that you have been able to improve upon it @powerchip 

Four Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

I just found this in the palette: tGoogleDriveConnection

Maybe this can be easier

It's a Drive connection but I think it will work with other apis from Google, I'll need time to test it

Six Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Hi @rhall_2_0 

Thanks for this helpful tutorial,

I have a problem: once I get the bearer token from tRESTClient, I want to pass that token along and send a JSON file (containing the information) to the API, but I don't know how.

You will find attached a screenshot of my job.

Thanks in advance.

 

 

Community Manager

Re: Using OAuth 2.0 with Talend to Access Google APIs

Hi @shima,

 

I am not sure that this is directly related to this tutorial. Could I ask you to post a new question and give a bit more information as to what you have done and what you wish to achieve?

 

Regards

 

Richard

Five Stars

Re: Using OAuth 2.0 with Talend to Access Google APIs

Actually, we are trying to make a POST (JSON) API call using OAuth 2.0 authentication. Below are the steps we have carried out with curl commands. Exactly the same thing needs to be implemented in Talend.

Step1:-
Passed curl command to fetch access token below:-

curl -v -H "Content-Type: application/x-www-form-urlencoded" -X POST --data "client_id=8BYA26jcZ4cYcivjX0oUABvefmmeqErN&client_secret=lWRoPehoFCmypPls&grant_type=client_credentials&scope=DHub" https://slot1.org009.t-dev.corp.nutty.com/v2/oauth/token --cert /usr/local/certs/nutty.com.crt --key /usr/local/certs/nutty.com.key -v -k

Step2:-
From first step, we will be getting access token:-

{"access_token":"k420G8LGaNQSUXNhih0T3xgAIwyE","token_type":"Bearer","expires_in":"3599"}

Step3:-
Using this access token in curl command we have to make another POST API call:-

curl -v -X POST -H "Content-Type: application/json" -H "Authorization: Bearer k420G8LGaNQSUXNhih0T3xgAIwyE" -H "Correlation-Id: b8905354-a164-b480-9fb2-75c2d1a8498x" -H "Source-System: Dhub" --data @usage2.json https://slot1.org009.t-dev.corp.nutty.com/application/b2b-bds-dev/v1.0/billing-events/700001276364/s... --key /usr/local/certs/nutty.com.key --cert /usr/local/certs/nutty.com.crt -v -k

Below is a snapshot of the data we are trying to post through the usage2.json file:-

{
"instanceId": "b4ee9fc9-7f50-49f4-a90f-32b0ffcf3c73",
"units": 120,
"Source-System": "D-Hub",
"effectiveDate": "2019-07-05T06:00:16.000Z",
"billingSpecId": "GSS_01",
"eventType": "GMM"
}

Final Output, which we expect:-

{"code":201,"status":201,"message":"Created"}

This is what we are trying to achieve in Talend. Is it possible to first authenticate using OAuth 2.0 authentication and then pass the access token to make a POST API call?
