This article shows you how to create a sample Spark Job and run it on a Microsoft Azure Databricks cluster.
Powered by Apache Spark, Databricks is one of the first platforms to provide serverless computing. It provides automated cluster management that scales according to the load.
Open Talend Studio.
In the Repository view, expand Job Designs, right-click Big Data Batch, then select Create Big Data Batch Job.
In the pop-up window, enter Databricks_Sample in the Name text box. Fill in the Purpose and Description text boxes. Click Finish.
Search for the tRowGenerator component in the Palette on the right, then drag it to the Designer.
In the Basic settings view of the tRowGenerator component, clear the Define a storage configuration component check box.
Double-click the tRowGenerator component. In the pop-up window, click the green + sign three times to add three columns.
Rename newColumn to ID, change the Type to Integer, then select Numeric.sequence(String,int,int) from the Functions drop-down menu.
Rename newColumn1 to FirstName, leave the Type as String, then select TalendDataGenerator.getFirstName() from the Functions drop-down menu. Similarly, rename newColumn2 to LastName, leave the Type as String, then select TalendDataGenerator.getLastName() from the Functions drop-down menu. Click OK.
Search for the tLogRow component in the Palette on the right, then drag it to the Designer.
Right-click the tRowGenerator component, select Row > Main, then click the tLogRow component to connect the two.
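To make the data flow of this Job easier to picture, the following is a minimal Python sketch of what the two components do together: generate rows with an ID, a first name, and a last name, then log each row as one delimited line. It is not the code that Talend Studio generates. The name pools, the start value and step of the sequence, and the `|` separator are assumptions chosen for illustration, and `generate_rows` and `log_rows` are hypothetical helper names.

```python
import random

# Hypothetical name pools; TalendDataGenerator draws from its own built-in lists.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds"]


def generate_rows(count, start=1, step=1, seed=None):
    """Yield (ID, FirstName, LastName) tuples, loosely mirroring the tRowGenerator schema.

    The start and step values are assumptions; the article does not state
    the arguments passed to Numeric.sequence.
    """
    rng = random.Random(seed)
    for i in range(count):
        yield (start + i * step, rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES))


def log_rows(rows, separator="|"):
    """Format each row as one delimited line, similar in spirit to tLogRow output."""
    return [separator.join(str(field) for field in row) for row in rows]


if __name__ == "__main__":
    for line in log_rows(generate_rows(5, seed=42)):
        print(line)
```

Running the script prints five lines, each holding a sequential ID followed by a randomly chosen first and last name.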
In the Run view of the Job, switch to the Spark Configuration tab. Clear the Use local mode check box, then select Databricks from the Distribution drop-down menu.
Configure the Endpoint, Cluster ID, and Token using your Microsoft Azure Databricks cluster registration settings.
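If you want to sanity-check these three values outside Talend Studio, one option is to query the Databricks REST API for the cluster's details. The sketch below assumes the Endpoint is the base URL of your workspace and that the Token is a personal access token sent as a Bearer token. It uses the `clusters/get` endpoint of the Clusters API. `build_cluster_request` and `fetch_cluster_state` are hypothetical helper names, and the values in the usage comment are placeholders, not real settings.

```python
import json
import urllib.parse
import urllib.request


def build_cluster_request(endpoint, cluster_id, token):
    """Build (but do not send) a GET request for the cluster's details."""
    base = endpoint.rstrip("/")
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{base}/api/2.0/clusters/get?{query}"
    # The token is passed as a Bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_cluster_state(endpoint, cluster_id, token, timeout=30):
    """Send the request and return the 'state' field of the JSON response, if present."""
    request = build_cluster_request(endpoint, cluster_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("state")


# Example usage with placeholder values (requires network access and valid settings):
# print(fetch_cluster_state("https://<your-workspace-url>", "<cluster-id>", "<token>"))
```

An HTTP error here (for example, an authentication failure) suggests that one of the three settings is wrong, which is easier to diagnose before you run the Job.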
Select the Basic Run tab. Click Run.
After the Job completes successfully, review the output in the Spark Driver Logs in the Azure Databricks portal.
This article showed you how to build a sample Spark Job in Talend Studio and run it on the Spark engine of an Azure Databricks cluster.