
Loading data from SAP ECC to Apache Hadoop


SAP is a popular ERP system used by thousands of companies to store transactional and master data. Companies investing in Big Data and Apache Hadoop technologies want to extract data from legacy systems such as SAP and load it into Hadoop, making raw or transformed data available to their analytics teams so they can draw insights from it.





Prerequisites

  • Talend Studio 6.2.1
  • SAP ECC 6.0 EhP6
  • Hortonworks 2.6.4


1. Set up Kerberos and get a ticket:

  • Install the Kerberos client from the MIT site
  • Update your security policies
  • Configure the krb5.ini file
  • Add big data nodes to the hosts file on the local system
  • Get a ticket using the kinit command
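As an illustration of the krb5.ini step, a minimal configuration sketch follows; the realm and KDC host names are placeholders you would replace with your cluster's actual values:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
```

With the file in place, a ticket can be obtained with `kinit user@EXAMPLE.COM` and verified with `klist`.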


 2. Configure SAP connectivity:

  • Install the Talend function module on your SAP system
  • Install the sapjco jar files from SAP on the Studio computer
  • Create SAP connection metadata


  •  In the SAP Connection window, click Check to make sure the connection works. If successful, a confirmation is displayed.


  •  Log in to SAP ECC and make sure the MARA table has data. To do this, use transaction SE16.



3. Configure Hadoop connectivity:

  • In Talend Studio, log in to your project and select the Metadata menu.
  • Right-click your Hadoop Cluster and click Create Hadoop Cluster.
  • Select the distribution and version of your Hadoop cluster and select one of the options to load the configuration.
  • Click Next.


  • Enter the Ambari information and click Next.
  • The system retrieves your cluster information and populates the remaining data. 


  •  In the Hadoop Cluster Connection window, make sure the services are running by clicking Check Services.




Build job

  1. Open Talend Studio, log in to your project, and navigate to the Metadata menu.
  2. Right-click the SAP connection and select Retrieve SAP table.
  3. Enter the name of the SAP table you want to extract data from and click Search.


  4. Select the SAP table and click Next to review the schema, then click Finish.
  5. The table MARA appears in the list of SAP tables.
  6. Right-click Job Designs and click Create a standard Job.
  7. Give your job a name.
  8. Drag the MARA table onto the canvas; the Studio automatically creates a tSAPTableInput component labeled MARA.
  9. In the Component tab, enter a filter condition if needed.
  10. Drag a tHDFSOutput component from the palette onto the canvas.
  11. Connect the two components using a row1 (Main) link.
  12. Select the subjob in the canvas and click Basic settings in the Component tab.
  13. Select Show subjob title and enter a title.
  14. Drag a tHDFSConnection component from the palette and connect it to the subjob using an OnSubjobOk link.
  15. Select the tHDFSConnection component and, in the Component tab, change the Property Type to Repository.
  16. Select the HDFS connection you created.
  17. Click the tHDFSOutput component. In the Component tab, select Use Existing connection and choose the connection from the drop-down list.
  18. Enter a filename for the HDFS output file.

  19. Drag a tSAPConnection component from the palette and connect it to the tHDFSConnection component using an OnSubjobOk link.
  20. Click the tSAPConnection component and select the SAP connection from the repository.




Run Job

  1. From Advanced Settings on the Run tab, set the minimum and maximum memory settings.
  2. Click Run.
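For reference, the minimum and maximum memory settings on the Run tab correspond to standard JVM arguments; the values below are only an example and should be adjusted for your data volume:

```
-Xms256M
-Xmx2048M
```

If the job fails with an OutOfMemoryError while extracting a large table, raise the -Xmx value.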



When the job completes, the data is written to the output file in the Hadoop environment.
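To cross-check the result from a cluster node, you can use the standard `hdfs dfs` shell commands; the path below is a hypothetical example that assumes `/user/talend/mara.csv` was entered as the output filename earlier:

```shell
# List the output file to confirm the job created it (path is an example)
hdfs dfs -ls /user/talend/mara.csv

# Preview the first few records extracted from the MARA table
hdfs dfs -cat /user/talend/mara.csv | head -n 5
```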











