The Twitter components from this “How to” can be found here.
Data gathered from Twitter can yield many insights, so being able to bring Twitter data into your Talend jobs can be a real competitive advantage.
First, you will need four credentials for these components to work: the Consumer Key, Consumer Secret, Access Token, and Access Token Secret linked to the Twitter account you will be using. You can obtain them by going to the Twitter Application site and logging into your Twitter account.
Once you log in, click “Create New App.” Fill out the required information, making sure your application name is unique. After you create the application, go to the “Keys and Access Tokens” tab, where you can view your consumer keys and create your access tokens.
Back in Talend Studio, create a new job. The main components for this simple job are tTwitterOAuth, tTwitterInput, and tTwitterOAuthClose. To showcase the capabilities of the components, the job pulls three pieces of information from Twitter: the tweet id, the text of the tweet, and the creation date of the tweet. To configure tTwitterOAuth, enter your Twitter credentials in their respective fields.
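A blank or whitespace-only credential is an easy mistake to make when pasting values into the connection fields. The sketch below shows one way to check the four values before running the job. It is a standalone illustration, not part of Talend or the Twitter components: the class name `TwitterCredentialCheck`, the method `missingFields`, and the map keys are all hypothetical, and the values are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Talend or the Twitter components.
final class TwitterCredentialCheck {

    // Returns the names of any required credentials that are missing or blank.
    static List<String> missingFields(Map<String, String> credentials) {
        // Key names are illustrative labels for the four values from the article.
        String[] required = {
            "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"
        };
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = credentials.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> credentials = new LinkedHashMap<>();
        // Placeholder values; real ones come from the "Keys and Access Tokens" tab.
        credentials.put("consumerKey", "my-consumer-key");
        credentials.put("consumerSecret", "my-consumer-secret");
        credentials.put("accessToken", "my-access-token");
        credentials.put("accessTokenSecret", "   "); // whitespace pasted by mistake

        System.out.println("Missing: " + missingFields(credentials));
        // prints "Missing: [accessTokenSecret]"
    }
}
```

A check like this only catches empty values; it cannot tell whether a key is actually valid for your Twitter account.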
Once the connection is set up, you can configure the tTwitterInput component. First, edit the schema to include the columns we want to display: tweetId, Text, and CreationDate.
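To make the three-column schema concrete, here is a minimal sketch of what one row might look like as a plain Java object. The class `TweetRow`, its method `toLogLine`, and the Java types chosen for each column (a `long` id, a `String` text, a `java.util.Date` creation date) are assumptions for illustration; the types you pick in the schema editor may differ. The pipe-separated output is also just an illustrative console format.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical row type mirroring the tweetId, Text, and CreationDate columns.
final class TweetRow {
    final long tweetId;        // assumed type for the tweetId column
    final String text;         // assumed type for the Text column
    final Date creationDate;   // assumed type for the CreationDate column

    TweetRow(long tweetId, String text, Date creationDate) {
        this.tweetId = tweetId;
        this.text = text;
        this.creationDate = creationDate;
    }

    // Joins the three columns with "|" so a row fits on one console line.
    String toLogLine() {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for stable output
        return tweetId + "|" + text + "|" + format.format(creationDate);
    }

    public static void main(String[] args) {
        // Made-up sample data, not a real tweet.
        TweetRow row = new TweetRow(1001L, "Trying out talend", new Date(0L));
        System.out.println(row.toLogLine());
        // prints "1001|Trying out talend|1970-01-01 00:00:00"
    }
}
```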
Once the schema is configured, you can work on the basic settings of the component. Make sure the connection we just created is selected, then map the columns created in the schema to their corresponding operations, as shown below. You can also narrow your search by adding an “operator.” In this scenario we want to find tweets that include the term “talend”.
The advanced settings let us specify a date range to pull tweets from, set lower and upper tweet id limits, and cap the number of tweets to pull. Using the lower and upper tweet id limits helps avoid downloading duplicate tweets.
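The reason id limits prevent duplicates is that each run can start where the previous one stopped: the highest tweet id already downloaded becomes the lower limit for the next run. The sketch below illustrates that bookkeeping in plain Java. The class `TweetIdWindow` and its methods are hypothetical, the ids are made up, and the boundary handling (lower limit exclusive, upper limit inclusive) is an assumption; check how the component treats the boundaries before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bookkeeping for tweet id limits; not part of the Talend components.
final class TweetIdWindow {

    // The highest id seen so far becomes the lower limit for the next run.
    static long nextLowerLimit(List<Long> fetchedIds, long currentLowerLimit) {
        long highest = currentLowerLimit;
        for (long id : fetchedIds) {
            highest = Math.max(highest, id);
        }
        return highest;
    }

    // Keeps ids strictly above the lower limit and at or below the upper limit
    // (assumed boundary semantics).
    static List<Long> withinWindow(List<Long> ids, long lowerLimit, long upperLimit) {
        List<Long> kept = new ArrayList<>();
        for (long id : ids) {
            if (id > lowerLimit && id <= upperLimit) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // First run: no lower limit yet, three tweets downloaded.
        List<Long> firstRun = Arrays.asList(101L, 102L, 103L);
        long lowerLimit = nextLowerLimit(firstRun, 0L);
        System.out.println("Next lower limit: " + lowerLimit);
        // prints "Next lower limit: 103"

        // Second run: the search matches old and new tweets alike.
        List<Long> candidates = Arrays.asList(102L, 103L, 104L, 105L);
        System.out.println("New tweets: " + withinWindow(candidates, lowerLimit, Long.MAX_VALUE));
        // prints "New tweets: [104, 105]"
    }
}
```

Storing the returned lower limit between runs (for example in a file or a context variable) is what lets a scheduled job pick up only tweets it has not seen before.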
After tTwitterInput is configured, add a tLogRow to output the data to the run console so we can see the tweets we pulled, then close the connection using the tTwitterOAuthClose component. The end result should look something like the picture below.