
Architecture, Best Practices, and How To's

Overview
SAP is a popular ERP system that thousands of companies use to store their transaction data and master data. Companies investing in big data and Apache Hadoop technologies want to extract data from legacy systems such as SAP and load it into Hadoop, so that their analytics teams can work with the raw or transformed data and draw insights from it.

Environment
- Talend Studio 6.2.1
- SAP ECC 6.0 EhP6
- Hortonworks 2.6.4

Prerequisites
1. Set up Kerberos and get a ticket:
   - Install the Kerberos client from the MIT site.
   - Update your security policies.
   - Configure the krb5.ini file.
   - Add the big data nodes to the hosts file on the local system.
   - Get a ticket, as shown in the following image.
2. Configure SAP connectivity:
   - Install the Talend function module on your SAP system.
   - Install the SAP JCo (sapjco) JAR files, obtained from SAP, on the computer running Talend Studio.
   - Create the SAP connection metadata.
   - In the SAP Connection window, click Check to make sure the connection works. If it succeeds, the following image is shown.
   - Log in to SAP ECC and use transaction SE16 to make sure the MARA table contains data.
3. Configure Hadoop connectivity:
   - In Talend Studio, log in to your project and open the Metadata menu.
   - Right-click Hadoop Cluster and click Create Hadoop Cluster.
   - Select the distribution and version of your Hadoop cluster, select one of the options for loading the configuration, and click Next.
   - Enter the Ambari information and click Next. The system retrieves your cluster information and populates the remaining fields.
   - In the Hadoop Cluster Connection window, click Check Services to make sure the services are running.

Build Job
1. Open Talend Studio, log in to your project, and open the Metadata menu.
2. Right-click the SAP connection and select Retrieve SAP table.
3. Enter the name of the SAP table you want to extract data from and click Search.
4. Select the SAP table and click Next to review the schema, then click Finish.
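Prerequisite step 1 asks you to configure krb5.ini and add the cluster nodes to the local hosts file. The Python sketch below renders both as text, assuming a single realm served by one KDC. The realm EXAMPLE.COM, the host names, and the IP addresses are hypothetical placeholders; the Kerberos settings your cluster administrator gives you take precedence.

```python
# Minimal sketch: render a single-realm krb5.ini and hosts-file entries.
# EXAMPLE.COM, the host names, and the IP addresses are hypothetical placeholders.


def render_krb5_ini(realm: str, kdc: str, admin_server: str, domain: str) -> str:
    """Return a minimal MIT Kerberos configuration for one realm."""
    lines = [
        "[libdefaults]",
        f"  default_realm = {realm}",
        "",
        "[realms]",
        f"  {realm} = {{",
        f"    kdc = {kdc}",
        f"    admin_server = {admin_server}",
        "  }",
        "",
        "[domain_realm]",
        # Map both the domain's subdomains and the domain itself to the realm.
        f"  .{domain} = {realm}",
        f"  {domain} = {realm}",
    ]
    return "\n".join(lines) + "\n"


def render_hosts_entries(nodes: dict[str, str]) -> str:
    """Return one tab-separated 'ip fqdn short-name' line per node, sorted by FQDN."""
    lines = []
    for fqdn, ip in sorted(nodes.items()):
        short_name = fqdn.split(".")[0]
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; replace them with your own cluster details.
    print(render_krb5_ini("EXAMPLE.COM", "kdc.example.com",
                          "kdc.example.com", "example.com"))
    print(render_hosts_entries({
        "master1.example.com": "10.0.0.11",
        "worker1.example.com": "10.0.0.21",
    }))
```

Write the first string to your krb5.ini and append the second to the local hosts file; editing the hosts file usually requires administrator rights.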
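Getting the Kerberos ticket in prerequisite step 1 can also be scripted with the MIT command-line tools. This sketch assumes MIT kinit and klist are on the PATH; the principal and keytab file names are hypothetical, and other klist implementations (such as the one shipped with Windows) may not support the -s option used here.

```python
import shutil
import subprocess
from typing import Optional


def kinit_command(principal: str, keytab: Optional[str] = None) -> list[str]:
    """Build an MIT kinit command line; a keytab avoids the password prompt."""
    if keytab:
        return ["kinit", "-k", "-t", keytab, principal]
    return ["kinit", principal]


def has_valid_ticket() -> bool:
    """True if MIT 'klist -s' finds a readable, unexpired credentials cache."""
    if shutil.which("klist") is None:
        return False
    return subprocess.run(["klist", "-s"]).returncode == 0


def ensure_ticket(principal: str, keytab: Optional[str] = None) -> None:
    """Request a new ticket only when no valid one is cached."""
    if not has_valid_ticket():
        # Without a keytab, kinit prompts for the password on the terminal.
        subprocess.run(kinit_command(principal, keytab), check=True)
```

For example, ensure_ticket("talend@EXAMPLE.COM") asks for a password only when no valid ticket is already cached.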
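Check Services in prerequisite step 3 confirms that the cluster services are running. A similar check can be made directly against the Ambari REST API. This sketch assumes Ambari is reachable at a hypothetical base URL such as http://ambari.example.com:8080, accepts HTTP basic authentication, and returns its usual {"items": [{"ServiceInfo": {...}}]} structure for the services resource.

```python
import base64
import json
import urllib.request


def services_url(ambari_base: str, cluster: str) -> str:
    """Ambari REST endpoint listing each service with its state."""
    return (f"{ambari_base.rstrip('/')}/api/v1/clusters/{cluster}"
            "/services?fields=ServiceInfo/state")


def parse_service_states(payload: dict) -> dict[str, str]:
    """Map service name -> state from an Ambari /services response."""
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }


def not_started(states: dict[str, str]) -> list[str]:
    """Names of services whose state is anything other than STARTED."""
    return sorted(name for name, state in states.items() if state != "STARTED")


def fetch_service_states(ambari_base: str, cluster: str,
                         user: str, password: str) -> dict[str, str]:
    """Query Ambari over HTTP basic authentication and parse the result."""
    request = urllib.request.Request(services_url(ambari_base, cluster))
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_service_states(json.load(response))
```

Client-only services (for example Pig or Tez) typically report INSTALLED because they have no running daemons, so review the names returned by not_started rather than treating each one as a failure.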
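Prerequisite step 2 checks in SE16 that MARA contains data. If you prefer to script that check, one option is the standard RFC_READ_TABLE function module called through the PyRFC package. This is a different route from Talend's own, which uses SAP JCo and the Talend function module. The sketch assumes PyRFC and the SAP NW RFC SDK are installed and that your user is authorized to call RFC_READ_TABLE; the connection parameters are placeholders you supply.

```python
# Sketch: check over RFC that MARA has rows, as an alternative to SE16.
# Assumes the third-party PyRFC package and the SAP NW RFC SDK it depends on;
# Talend itself connects through SAP JCo, not through this code.


def parse_read_table(result: dict, delimiter: str = "|") -> list[dict[str, str]]:
    """Turn an RFC_READ_TABLE result into a list of {field: value} dicts."""
    names = [field["FIELDNAME"] for field in result["FIELDS"]]
    rows = []
    for line in result["DATA"]:
        # Each WA string holds one row, with fields joined by the delimiter.
        values = [value.strip() for value in line["WA"].split(delimiter)]
        rows.append(dict(zip(names, values)))
    return rows


def sample_mara(conn_params: dict, rowcount: int = 5) -> list[dict[str, str]]:
    """Read up to `rowcount` MARA rows; an empty list means no data."""
    from pyrfc import Connection  # imported lazily: optional dependency

    conn = Connection(**conn_params)  # e.g. ashost, sysnr, client, user, passwd
    try:
        result = conn.call(
            "RFC_READ_TABLE",
            QUERY_TABLE="MARA",
            DELIMITER="|",
            ROWCOUNT=rowcount,
            FIELDS=[{"FIELDNAME": "MATNR"}, {"FIELDNAME": "MTART"}],
        )
    finally:
        conn.close()
    return parse_read_table(result)
```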
After you click Finish, the MARA table appears in the list of SAP Tables. To create the Job:
1. Right-click Job Designs, click Create a standard Job, and give your Job a name.
2. Drag the MARA table onto the canvas. The Studio automatically creates a tSAPTableInput component labeled MARA. In the Component tab, enter a filter condition if needed.
3. Drag a tHDFSOutput component from the Palette onto the canvas and connect the two components with the row1 (Main) connection.
4. Select the subjob in the canvas and click Basic settings in the Component tab. Select Show subjob title and enter a title.
5. Drag a tHDFSConnection component from the Palette and connect it to the subjob with an OnSubjobOk link, so that the HDFS connection is opened before the subjob runs.
6. Select the tHDFSConnection component and, in the Component tab, change Property Type to Repository. Select the HDFS connection you created.
7. Click the tHDFSOutput component. In the Component tab, select Use an existing connection and select the connection from the drop-down list. Enter a file name for the HDFS output file.
8. Drag a tSAPConnection component from the Palette and connect it to the tHDFSConnection component with an OnSubjobOk link.
9. Click the tSAPConnection component and select the SAP connection from the repository.

Run Job
On the Run tab, open Advanced settings, set the minimum and maximum memory settings, and click Run.
The following message is then displayed in the Hadoop environment.
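The memory settings in the Run step correspond to JVM heap options: -Xms for the minimum and -Xmx for the maximum. The helper below builds both and rejects a minimum larger than the maximum, a combination the JVM refuses at startup. The values 256M and 1024M are illustrative only, not a recommendation from this article.

```python
import re

# Heap sizes in the JVM's <number><unit> form, e.g. 256M or 2G.
_SIZE = re.compile(r"^(\d+)([KMG])$", re.IGNORECASE)
_FACTOR = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a heap size such as '1024M' to bytes."""
    match = _SIZE.match(size)
    if match is None:
        raise ValueError(f"unsupported heap size: {size!r}")
    return int(match.group(1)) * _FACTOR[match.group(2).upper()]


def jvm_memory_args(minimum: str, maximum: str) -> list[str]:
    """Return ['-Xms<min>', '-Xmx<max>'], rejecting min > max."""
    if to_bytes(minimum) > to_bytes(maximum):
        raise ValueError("-Xms must not exceed -Xmx")
    return [f"-Xms{minimum}", f"-Xmx{maximum}"]
```

For example, jvm_memory_args("256M", "1024M") returns ["-Xms256M", "-Xmx1024M"], which you would enter as the JVM arguments in Advanced settings.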
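To inspect the file the Job wrote, you can read it back with the Hadoop command-line client. This sketch assumes the hdfs client is installed on the machine, a valid Kerberos ticket is cached, and the Field separator of tHDFSOutput is a semicolon (check the component's Basic settings); the path /user/talend/mara.csv is hypothetical.

```python
import subprocess


def hdfs_cat_command(path: str) -> list[str]:
    """Build the 'hdfs dfs -cat' command for the Job's output file."""
    return ["hdfs", "dfs", "-cat", path]


def parse_delimited(text: str, field_sep: str = ";") -> list[list[str]]:
    """Split delimited text into rows of fields, skipping empty lines."""
    return [line.split(field_sep) for line in text.splitlines() if line]


def read_hdfs_output(path: str, field_sep: str = ";") -> list[list[str]]:
    """Read the output file from HDFS (needs the Hadoop client and a ticket)."""
    completed = subprocess.run(
        hdfs_cat_command(path), capture_output=True, text=True, check=True
    )
    return parse_delimited(completed.stdout, field_sep)
```

A plain split is enough here only because the sketch assumes unquoted fields; if you enable CSV quoting in the component, parse the output with a CSV reader instead.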