Creating a CRC32C-enabled custom component for Talend Studio based on out-of-the-box components

Disclaimer

The method of creating a custom component described in this document has been deprecated in favor of the new Talend Component Kit.

Overview

Talend Professional Services does not recommend extending out-of-the-box Studio components with new features. Although it is possible, doing so can break the upgrade path to a new Talend release or make it more difficult. A better approach is to copy an existing component, use it as the base for a new custom component, and add the extended functions there.

Follow the steps in this document to create a CRC32C-enabled custom component in Talend Studio. The new component is based on the out-of-the-box tAddCRCRow Data Quality component. That component supports the CRC8, CRC16, and CRC32 checksum methods, but not CRC32C.

CRC32C (a.k.a. Castagnoli) is a newer variant of the CRC32 checksum algorithm. It uses a different generator polynomial than CRC32 but is otherwise computed the same way. Intel CPUs that support SSE4.2 include a hardware instruction for CRC32C, which has given it a reputation for fast computation.
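
The following minimal sketch illustrates "different polynomial, otherwise the same". It computes both checksums with one bit-at-a-time routine, not the table-driven Google class used later, and changes only the reflected polynomial: 0xEDB88320 for CRC32 and 0x82F63B78 for CRC32C. The class and method names are hypothetical. The JDK's java.util.zip.CRC32 is used only to cross-check the CRC32 result.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class: one bitwise CRC routine, two polynomials.
class CrcPolynomials {
    // Reflected (LSB-first) forms of the two generator polynomials.
    static final long CRC32_POLY = 0xEDB88320L;  // CRC32 (as used by java.util.zip.CRC32)
    static final long CRC32C_POLY = 0x82F63B78L; // CRC32C (Castagnoli)

    // Same initial value, bit order, and final XOR for both variants;
    // only the polynomial argument changes.
    static long crc(byte[] data, long reflectedPoly) {
        long crc = 0xFFFFFFFFL;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 1) != 0) ? (crc >>> 1) ^ reflectedPoly : crc >>> 1;
            }
        }
        return crc ^ 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        CRC32 jdk = new CRC32();
        jdk.update(data);

        System.out.printf("CRC32  = %08x (JDK: %08x)%n", crc(data, CRC32_POLY), jdk.getValue());
        System.out.printf("CRC32C = %08x%n", crc(data, CRC32C_POLY));
    }
}
```

For the standard check string "123456789", this should print cbf43926 for CRC32 (the same as the JDK class) and e3069283 for CRC32C. These are the published check values for the two algorithms.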

Enable CRC32C in Talend Studio

Create a CRC32C Java Class

  1. Find, or create, a Java class that implements the CRC32C checksum. A short internet search turns up Google's implementation, which is the one used here. For convenience, add a method to the class that returns the checksum of a string in a single call:

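    /**
     * tdye -- Added compute method.
     * Returns the CRC32C checksum of the string's bytes (platform default
     * charset, matching the existing CRC32 option in tAddCRCRow).
     */
    public long compute(String s) {
        reset();                        // start from a fresh checksum on every call
        byte[] bytes = s.getBytes();    // encode the string once...
        update(bytes, 0, bytes.length); // ...and hash every byte, not just s.length() of them
        return getValue();
    }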

    See the Appendix for a complete Java code listing of the Crc32c class.

  2. Because this class is declared in the com.google.cloud package, place the source file in a matching directory structure (com\google\cloud\Crc32c.java):

    hierarchy.png

  3. Compile the class however you prefer, for example from a Windows command prompt. Talend Studio needs a Java Development Kit, so the compiler should already be installed. Run javac from the root of the package hierarchy, the directory that contains the com folder:

    C:\> "c:\Program Files\Java\jdk1.7.0_67"\bin\javac com\google\cloud\Crc32c.java

    compile.png

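  4. From the root of the package hierarchy, create the JAR file. Then list its contents to confirm that the class is stored as com/google/cloud/Crc32c.class:

    C:\> "c:\Program Files\Java\jdk1.7.0_67"\bin\jar -cvf Crc32c.jar com\google\cloud\Crc32c.class
    C:\> "c:\Program Files\Java\jdk1.7.0_67"\bin\jar -tf Crc32c.jar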

    createjar.png

    You can set the JAR file aside for now.
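
Because Crc32c is declared in the com.google.cloud package, the class loader can only find it if the JAR stores it as com/google/cloud/Crc32c.class. If you want an extra check beyond jar -tf, the following sketch looks up that entry with java.util.jar.JarFile. The class and method names are hypothetical. The sketch assumes the JAR is in the current directory unless a path is passed as an argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: does a JAR contain a class at the path its package requires?
class JarCheck {
    // "com.google.cloud.Crc32c" -> "com/google/cloud/Crc32c.class"
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static boolean containsClass(File jar, String className) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args.length > 0 ? args[0] : "Crc32c.jar");
        if (!jar.isFile()) {
            System.out.println("No such file: " + jar);
            return;
        }
        System.out.println(containsClass(jar, "com.google.cloud.Crc32c")
                ? "OK: com/google/cloud/Crc32c.class found"
                : "Missing: rebuild the JAR from the root of the package hierarchy");
    }
}
```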

Copy the out-of-the-box component

  1. Create a directory on your Talend Studio workstation where you will create and deploy the custom component. For example, the path could be:

    C:/data/Development/Custom_Components

  2. Copy the tAddCRCRow component directory into your Custom_Components directory. It is typically located in the following directory. The exact path, including the version suffix, varies by installation:

    Studio Install Directory\plugins\org.talend.designer.components.localprovider_version\components

    tAddCRCRow.png

  3. Rename the copied directory to the name of the new custom component, which is tCustomAddCRCRow in this example. Then rename every file in the directory so that its name starts with tCustomAddCRCRow instead of tAddCRCRow. For example, tAddCRCRow_java.xml becomes tCustomAddCRCRow_java.xml.

    rename.png
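
If you prefer to script the file renaming, a small java.nio.file program can rename every file whose name starts with the old component name. This is a hypothetical helper, not a Talend tool. It assumes the copied directory is passed as the first argument, renames only regular files, and does not edit file contents.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rename tAddCRCRow* files to tCustomAddCRCRow* in one directory.
class RenameComponentFiles {
    static int renamePrefix(Path dir, String oldName, String newName) throws IOException {
        // Collect matches first so files are not renamed while the stream is being read.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, oldName + "*")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    matches.add(file);
                }
            }
        }
        for (Path file : matches) {
            String name = file.getFileName().toString();
            // Swap only the leading component name, keeping suffixes such as _java.xml.
            Files.move(file, file.resolveSibling(newName + name.substring(oldName.length())));
        }
        return matches.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]); // e.g. C:/data/Development/Custom_Components/tCustomAddCRCRow
        int n = renamePrefix(dir, "tAddCRCRow", "tCustomAddCRCRow");
        System.out.println("Renamed " + n + " files");
    }
}
```

Collecting the matches before moving them means no file is renamed while the directory stream is still being iterated.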

Edit the XML file

Edit the XML file for the new component (tCustomAddCRCRow_java.xml). For convenience, the entire file is reproduced in the Appendix. Update the following sections:

  • HEADER
  • FAMILIES
  • ADVANCED_PARAMETERS
  • CODEGENERATION

  1. At minimum, change the author to a new value:

    <HEADER PLATEFORM="ALL" SERIAL="" VERSION="0.101" STATUS="ALPHA"
    		COMPATIBILITY="ALL" AUTHOR="tdye" RELEASE_DATE="20050320A"
    		STARTABLE="false" DATA_AUTO_PROPAGATE="false" PARTITIONING="AUTO">
    		<SIGNATURE></SIGNATURE>
    </HEADER>

  2. Change the component family to a new value. The component appears under this family name in the Studio palette:

    <FAMILIES>
        <FAMILY>TDye Components</FAMILY>
    </FAMILIES>

  3. Add a CRC32C item to the existing CRC_TYPE closed list. Its VALUE, crc32c, is the string that the JavaJet template checks for:

    <ADVANCED_PARAMETERS>
    	<PARAMETER NAME="CRC_TYPE" FIELD="CLOSED_LIST" NUM_ROW="30">
    		<ITEMS DEFAULT="crc32">
    			<ITEM NAME="CRC8" VALUE="crc8" />
    			<ITEM NAME="CRC16" VALUE="crc16" />
    			<ITEM NAME="CRC32" VALUE="crc32" />
    			<ITEM NAME="CRC32C" VALUE="crc32c" />
    		</ITEMS>
    	</PARAMETER>
    </ADVANCED_PARAMETERS>

  4. Add an IMPORT entry for the new Crc32c.jar file, so that Studio requires the JAR and makes it available to generated Jobs:

    <CODEGENERATION>
    <IMPORTS>
    	<IMPORT NAME="Java-CRC" MODULE="OneWireAPI.jar" MVN="mvn:org.talend.libraries/OneWireAPI/6.0.0" 
    				REQUIRED="true" />
    	<IMPORT NAME="Crc32c" MODULE="Crc32c.jar" MVN="mvn:org.talend.libraries/Crc32c/6.0.0" 
    				REQUIRED="true" />
    </IMPORTS>
    </CODEGENERATION>
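
To confirm that the edited descriptor still parses and contains both changes, you could read it with the JDK's built-in DOM parser. The following sketch is an optional sanity check and is not part of the Talend workflow. The class name is hypothetical. It assumes tCustomAddCRCRow_java.xml is in the current directory unless a path is passed as an argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker for the edited component descriptor.
class DescriptorCheck {
    // VALUE attributes of all ITEM elements under PARAMETER NAME="CRC_TYPE".
    static List<String> crcTypes(Document doc) {
        List<String> values = new ArrayList<>();
        NodeList params = doc.getElementsByTagName("PARAMETER");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if (!"CRC_TYPE".equals(p.getAttribute("NAME"))) {
                continue; // skip SCHEMA, IMPLICATION, and any other parameters
            }
            NodeList items = p.getElementsByTagName("ITEM");
            for (int j = 0; j < items.getLength(); j++) {
                values.add(((Element) items.item(j)).getAttribute("VALUE"));
            }
        }
        return values;
    }

    // MODULE attributes of all IMPORT elements.
    static List<String> importedModules(Document doc) {
        List<String> modules = new ArrayList<>();
        NodeList imports = doc.getElementsByTagName("IMPORT");
        for (int i = 0; i < imports.getLength(); i++) {
            modules.add(((Element) imports.item(i)).getAttribute("MODULE"));
        }
        return modules;
    }

    public static void main(String[] args) throws Exception {
        File xml = new File(args.length > 0 ? args[0] : "tCustomAddCRCRow_java.xml");
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        System.out.println("CRC_TYPE values: " + crcTypes(doc));
        System.out.println("Has crc32c option: " + crcTypes(doc).contains("crc32c"));
        System.out.println("Imports Crc32c.jar: " + importedModules(doc).contains("Crc32c.jar"));
    }
}
```

With the descriptor from the Appendix, the CRC_TYPE values should be [crc8, crc16, crc32, crc32c], and both checks should print true.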

Edit the JavaJet Main file

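Edit the tCustomAddCRCRow_main.javajet file to add JavaJet code for the new CRC32C option. Place the new branch after the loop that appends the selected columns to strBuffer_<%=cid%>, next to the existing crc32, crc8, and crc16 branches. For each row, the generated code instantiates the Crc32c class and calls the new compute method on the concatenated column values.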

if(("crc32c").equals(crcType)){
%>          
    com.google.cloud.Crc32c crc32c<%=cid%> = new com.google.cloud.Crc32c();  
    crcComputedValue<%=cid %> = new Long(crc32c<%=cid%>.compute(strBuffer_<%=cid%>.toString()));
<%          
}

See the Appendix for a complete file listing.
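
The following sketch re-creates the template's per-row logic in plain Java. This can help when cross-checking a value the Job writes to the CRC column. The sketch appends each selected column to a StringBuilder, applying stripTrailingZeros() to BigDecimal values and String.valueOf to everything else, then checksums the string's bytes. The class and method names are hypothetical. A bitwise CRC32C stands in for the Crc32c class so that the example is self-contained; for the same bytes, it should produce the same value.

```java
import java.math.BigDecimal;

// Hypothetical re-creation of the per-row logic emitted by tCustomAddCRCRow_main.javajet.
class RowChecksum {
    // Concatenate the selected column values the way the template does.
    static String rowKey(Object... selectedColumns) {
        StringBuilder sb = new StringBuilder();
        for (Object value : selectedColumns) {
            if (value instanceof BigDecimal) {
                sb.append(((BigDecimal) value).stripTrailingZeros()); // 1.50 and 1.5 give the same text
            } else {
                sb.append(String.valueOf(value)); // null becomes the text "null"
            }
        }
        return sb.toString();
    }

    // Bitwise CRC32C with the reflected Castagnoli polynomial 0x82F63B78.
    static long crc32c(byte[] data) {
        long crc = 0xFFFFFFFFL;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 1) != 0) ? (crc >>> 1) ^ 0x82F63B78L : crc >>> 1;
            }
        }
        return crc ^ 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        String key = rowKey("ACME", new BigDecimal("1.50"), null);
        System.out.println(key);                    // ACME1.5null
        System.out.println(crc32c(key.getBytes())); // default-charset bytes, as in the template
    }
}
```

Because of stripTrailingZeros(), a BigDecimal of 100 is appended as 1E+2. It therefore does not produce the same key as the string "100".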

Enable custom components in Talend Studio

  1. To enable Studio to find your custom component, add the Custom_Components directory created earlier to the Window > Preferences > Talend > Components dialog:

    dialog.png

  2. Your custom component should appear in the palette under the family name you set above. If it does not, restart Studio.

Test CRC32C

  1. Create a small Job to test the new custom component.

    job.png

    Because Crc32c.jar has not been installed yet, the component icon is decorated with a red exclamation point, and the component's configuration view shows an orange banner stating that an external JAR is missing.

    missingjar.png

  2. To install the JAR file, click Install, then click the JAR icon to display the Open File dialog.

    jar.png

  3. Navigate to the directory that contains Crc32c.jar and select it. Click Open to upload the JAR to Studio and, if configured, to Nexus.

    openjar.png

  4. If you have not already done so, select the columns to include in the CRC checksum.

  5. After the JAR file is loaded, continue to configure the component. On the Advanced Settings tab, select CRC32C:

    crctype.png

  6. Run the Job. If you did not previously load the JAR, you will be prompted to install the missing Crc32c.jar file. Click Download and install, then follow the prompts to find the JAR and load it into Studio.

    loadjar.png

  7. When the Job runs, the custom component fills the CRC column of each output row with the CRC32C checksum of the selected columns.

    run.png

Share the custom component

  1. To share the custom component with other developers, open the File > Edit Project Properties > Custom Component dialog in Studio.

  2. Move the component you want to share to the Shared Components pane.

  3. Click Apply or OK to complete the action.

    share.png

Appendix

Crc32c.java

package com.google.cloud;

import java.util.zip.Checksum;

/**
* This class generates a CRC32C checksum, defined by rfc3720 section B.4.
*
*
*/
public final class Crc32c implements Checksum {

private static final long[] CRC_TABLE = {
   0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4,
   0xc79a971f, 0x35f1141c, 0x26a1e7e8, 0xd4ca64eb,
   0x8ad958cf, 0x78b2dbcc, 0x6be22838, 0x9989ab3b,
   0x4d43cfd0, 0xbf284cd3, 0xac78bf27, 0x5e133c24,
   0x105ec76f, 0xe235446c, 0xf165b798, 0x030e349b,
   0xd7c45070, 0x25afd373, 0x36ff2087, 0xc494a384,
   0x9a879fa0, 0x68ec1ca3, 0x7bbcef57, 0x89d76c54,
   0x5d1d08bf, 0xaf768bbc, 0xbc267848, 0x4e4dfb4b,
   0x20bd8ede, 0xd2d60ddd, 0xc186fe29, 0x33ed7d2a,
   0xe72719c1, 0x154c9ac2, 0x061c6936, 0xf477ea35,
   0xaa64d611, 0x580f5512, 0x4b5fa6e6, 0xb93425e5,
   0x6dfe410e, 0x9f95c20d, 0x8cc531f9, 0x7eaeb2fa,
   0x30e349b1, 0xc288cab2, 0xd1d83946, 0x23b3ba45,
   0xf779deae, 0x05125dad, 0x1642ae59, 0xe4292d5a,
   0xba3a117e, 0x4851927d, 0x5b016189, 0xa96ae28a,
   0x7da08661, 0x8fcb0562, 0x9c9bf696, 0x6ef07595,
   0x417b1dbc, 0xb3109ebf, 0xa0406d4b, 0x522bee48,
   0x86e18aa3, 0x748a09a0, 0x67dafa54, 0x95b17957,
   0xcba24573, 0x39c9c670, 0x2a993584, 0xd8f2b687,
   0x0c38d26c, 0xfe53516f, 0xed03a29b, 0x1f682198,
   0x5125dad3, 0xa34e59d0, 0xb01eaa24, 0x42752927,
   0x96bf4dcc, 0x64d4cecf, 0x77843d3b, 0x85efbe38,
   0xdbfc821c, 0x2997011f, 0x3ac7f2eb, 0xc8ac71e8,
   0x1c661503, 0xee0d9600, 0xfd5d65f4, 0x0f36e6f7,
   0x61c69362, 0x93ad1061, 0x80fde395, 0x72966096,
   0xa65c047d, 0x5437877e, 0x4767748a, 0xb50cf789,
   0xeb1fcbad, 0x197448ae, 0x0a24bb5a, 0xf84f3859,
   0x2c855cb2, 0xdeeedfb1, 0xcdbe2c45, 0x3fd5af46,
   0x7198540d, 0x83f3d70e, 0x90a324fa, 0x62c8a7f9,
   0xb602c312, 0x44694011, 0x5739b3e5, 0xa55230e6,
   0xfb410cc2, 0x092a8fc1, 0x1a7a7c35, 0xe811ff36,
   0x3cdb9bdd, 0xceb018de, 0xdde0eb2a, 0x2f8b6829,
   0x82f63b78, 0x709db87b, 0x63cd4b8f, 0x91a6c88c,
   0x456cac67, 0xb7072f64, 0xa457dc90, 0x563c5f93,
   0x082f63b7, 0xfa44e0b4, 0xe9141340, 0x1b7f9043,
   0xcfb5f4a8, 0x3dde77ab, 0x2e8e845f, 0xdce5075c,
   0x92a8fc17, 0x60c37f14, 0x73938ce0, 0x81f80fe3,
   0x55326b08, 0xa759e80b, 0xb4091bff, 0x466298fc,
   0x1871a4d8, 0xea1a27db, 0xf94ad42f, 0x0b21572c,
   0xdfeb33c7, 0x2d80b0c4, 0x3ed04330, 0xccbbc033,
   0xa24bb5a6, 0x502036a5, 0x4370c551, 0xb11b4652,
   0x65d122b9, 0x97baa1ba, 0x84ea524e, 0x7681d14d,
   0x2892ed69, 0xdaf96e6a, 0xc9a99d9e, 0x3bc21e9d,
   0xef087a76, 0x1d63f975, 0x0e330a81, 0xfc588982,
   0xb21572c9, 0x407ef1ca, 0x532e023e, 0xa145813d,
   0x758fe5d6, 0x87e466d5, 0x94b49521, 0x66df1622,
   0x38cc2a06, 0xcaa7a905, 0xd9f75af1, 0x2b9cd9f2,
   0xff56bd19, 0x0d3d3e1a, 0x1e6dcdee, 0xec064eed,
   0xc38d26c4, 0x31e6a5c7, 0x22b65633, 0xd0ddd530,
   0x0417b1db, 0xf67c32d8, 0xe52cc12c, 0x1747422f,
   0x49547e0b, 0xbb3ffd08, 0xa86f0efc, 0x5a048dff,
   0x8ecee914, 0x7ca56a17, 0x6ff599e3, 0x9d9e1ae0,
   0xd3d3e1ab, 0x21b862a8, 0x32e8915c, 0xc083125f,
   0x144976b4, 0xe622f5b7, 0xf5720643, 0x07198540,
   0x590ab964, 0xab613a67, 0xb831c993, 0x4a5a4a90,
   0x9e902e7b, 0x6cfbad78, 0x7fab5e8c, 0x8dc0dd8f,
   0xe330a81a, 0x115b2b19, 0x020bd8ed, 0xf0605bee,
   0x24aa3f05, 0xd6c1bc06, 0xc5914ff2, 0x37faccf1,
   0x69e9f0d5, 0x9b8273d6, 0x88d28022, 0x7ab90321,
   0xae7367ca, 0x5c18e4c9, 0x4f48173d, 0xbd23943e,
   0xf36e6f75, 0x0105ec76, 0x12551f82, 0xe03e9c81,
   0x34f4f86a, 0xc69f7b69, 0xd5cf889d, 0x27a40b9e,
   0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
   0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351
};

private static final long LONG_MASK = 0xffffffffL;
private static final long BYTE_MASK = 0xff;

private long crc;

public Crc32c() {
 crc = 0;
}

/**
* Updates the checksum with a new byte.
* @param b the new byte.
*/
@Override
public void update(int b) {
 long newCrc = crc ^ LONG_MASK;
 newCrc = updateByte((byte) b, newCrc);
 crc = newCrc ^ LONG_MASK;
}

/**
* Updates the checksum with an array of bytes.
* @param bArray the array of bytes.
* @param off the offset into the array where the update should begin.
* @param len the length of data to examine.
*/
@Override
public void update(byte[] bArray, int off, int len) {
 long newCrc = crc ^ LONG_MASK;
 for (int i = off; i < off + len; i++) {
   newCrc = updateByte(bArray[i], newCrc);
 }
 crc = newCrc ^ LONG_MASK;
}

/**
* Returns the value of the checksum.
* @return the long representation of the checksum (high bits set to zero).
*/
@Override
public long getValue() {
 return crc;
}

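/**
* tdye -- Added compute method.
* Returns the CRC32C checksum of the string's bytes (platform default charset).
*/
public long compute(String s) {
 reset();                        // start from a fresh checksum on every call
 byte[] bytes = s.getBytes();    // encode the string once...
 update(bytes, 0, bytes.length); // ...and hash every byte, not just s.length() of them
 return getValue();
}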

/**
* Returns the value of the checksum.
* @return the 4-byte array representation of the checksum in network byte order (big endian).
*/
public byte[] getValueAsBytes() {
 long value = crc;
 byte[] result = new byte[4];
 for (int i = 3; i >= 0; i--) {
   result[i] = (byte) (value & 0xffL);
   value >>= 8;
 }
 return result;
}

/**
* Resets the crc.
*/
@Override
public void reset() {
 crc = 0;
}

private long updateByte(byte newByte, long crc) {
 byte b = (byte) (newByte & BYTE_MASK);
 int index = (int) ((crc ^ b) & BYTE_MASK);
 return (CRC_TABLE[index] ^ (crc >> 8)) & LONG_MASK;
}
}
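
The 256 CRC_TABLE constants are not arbitrary. Entry n is the result of running the byte value n through eight bitwise CRC steps with the reflected Castagnoli polynomial 0x82F63B78. The following sketch regenerates the table, so that you can check the listing or replace the literals with a computed table. The class name is hypothetical. Entry 1 comes out as 0xf26b8303 and entry 128 as 0x82f63b78, as in the listing.

```java
// Hypothetical generator for the CRC32C lookup table used by Crc32c.
class Crc32cTable {
    static final long POLY = 0x82F63B78L; // reflected Castagnoli polynomial

    static long[] build() {
        long[] table = new long[256];
        for (int n = 0; n < 256; n++) {
            long c = n;
            for (int bit = 0; bit < 8; bit++) {
                c = ((c & 1) != 0) ? (c >>> 1) ^ POLY : c >>> 1;
            }
            table[n] = c; // always fits in the low 32 bits
        }
        return table;
    }

    public static void main(String[] args) {
        long[] t = build();
        // Print in the same four-entries-per-row layout as the listing above.
        for (int i = 0; i < t.length; i += 4) {
            System.out.printf("   0x%08x, 0x%08x, 0x%08x, 0x%08x,%n", t[i], t[i + 1], t[i + 2], t[i + 3]);
        }
    }
}
```

The hexadecimal literals in the listing are int literals, so values with the high bit set are sign-extended when stored in the long[] array. Crc32c masks the result with LONG_MASK in updateByte, so the sign-extended table and a generated one yield the same checksums.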

tCustomAddCRCRow_java.xml

<COMPONENT>
	<HEADER PLATEFORM="ALL" SERIAL="" VERSION="0.101" STATUS="ALPHA"
		COMPATIBILITY="ALL" AUTHOR="tdye" RELEASE_DATE="20050320A"
		STARTABLE="false" DATA_AUTO_PROPAGATE="false" PARTITIONING="AUTO">
		<SIGNATURE></SIGNATURE>
	</HEADER>

	<FAMILIES>
		<FAMILY>TDye Components</FAMILY>
	</FAMILIES>

	<DOCUMENTATION>
		<URL />
	</DOCUMENTATION>

	<CONNECTORS>
		<CONNECTOR CTYPE="FLOW" MAX_INPUT="1" MAX_OUTPUT="1"/>
		<CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="0" MAX_INPUT="0" />
		<CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
		<CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
		<CONNECTOR CTYPE="COMPONENT_OK" />
		<CONNECTOR CTYPE="COMPONENT_ERROR" />
		<CONNECTOR CTYPE="RUN_IF" />
	</CONNECTORS>

	<PARAMETERS>

		<PARAMETER NAME="SCHEMA" FIELD="SCHEMA_TYPE" READONLY="true" REQUIRED="true"
			NUM_ROW="10">
			<TABLE READONLY="true">
				<COLUMN NAME="CRC" TYPE="id_Long" LENGTH="255"
					READONLY="false" CUSTOM="true" />
			</TABLE>
		</PARAMETER>
		<PARAMETER NAME="IMPLICATION" FIELD="TABLE" REQUIRED="true"
			NUM_ROW="20" NB_LINES="5">
			<ITEMS BASED_ON_SCHEMA="true">
				<ITEM NAME="USE_IN_CRC" FIELD="CHECK" VALUE="false" />
			</ITEMS>
		</PARAMETER>
	</PARAMETERS>
	<ADVANCED_PARAMETERS>
		<PARAMETER NAME="CRC_TYPE" FIELD="CLOSED_LIST" NUM_ROW="30">
			<ITEMS DEFAULT="crc32">
				<ITEM NAME="CRC8" VALUE="crc8" />
				<ITEM NAME="CRC16" VALUE="crc16" />
				<ITEM NAME="CRC32" VALUE="crc32" />
				<ITEM NAME="CRC32C" VALUE="crc32c" />
			</ITEMS>
		</PARAMETER>
	</ADVANCED_PARAMETERS>

	<CODEGENERATION>
		<IMPORTS>
			<IMPORT NAME="Java-CRC" MODULE="OneWireAPI.jar" MVN="mvn:org.talend.libraries/OneWireAPI/6.0.0" 
				REQUIRED="true" />
			<IMPORT NAME="Crc32c" MODULE="Crc32c.jar" MVN="mvn:org.talend.libraries/Crc32c/6.0.0" 
				REQUIRED="true" />
		</IMPORTS>
	</CODEGENERATION>

	<RETURNS>
		<RETURN NAME="NB_LINE" TYPE="id_Integer" AVAILABILITY="AFTER" />
	</RETURNS>
</COMPONENT>

tCustomAddCRCRow_main.javajet

<%@ jet
imports="
    org.talend.core.model.process.INode
    org.talend.core.model.process.ElementParameterParser
    org.talend.core.model.metadata.IMetadataTable 
    org.talend.designer.codegen.config.CodeGeneratorArgument
    java.util.List
    java.util.Map
    java.util.ArrayList
    org.talend.core.model.process.IConnection
    org.talend.core.model.metadata.IMetadataColumn
    org.talend.core.model.process.IConnectionCategory
    org.talend.core.model.metadata.types.JavaType
	org.talend.core.model.metadata.types.JavaTypesManager
"
%>
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String crcType = ElementParameterParser.getValue(node,"__CRC_TYPE__");
List<IMetadataTable> metadatas = node.getMetadataList();
if ((metadatas!=null)&&(metadatas.size()>0)) {
    IMetadataTable metadata = metadatas.get(0);
    if (metadata!=null)
     {
        String cid = node.getUniqueName();
        List<Map<String, String>> implication =(List<Map<String,String>>)ElementParameterParser.getObjectValue(node,"__IMPLICATION__");
        List keyCols = new ArrayList();
        for (int i=0; i<implication.size(); i++) {
            Map<String, String> col = implication.get(i);
            if (("true").equals(col.get("USE_IN_CRC"))) {
                keyCols.add(i);
            }
        }
       List< ? extends IConnection> conns = node.getIncomingConnections();
       %>
       Long crcComputedValue<%=cid %> = null;
       <%
       if(conns!=null){
           if (conns.size()>0){
       
                IConnection conn =conns.get(0);
                List<IMetadataColumn> columns = metadata.getListColumns();
                %>
                StringBuilder strBuffer_<%=cid%> = new StringBuilder();
                <%
                for (int i = 0; i < columns.size()-1; i++) {
                    IMetadataColumn column = columns.get(i);
                    JavaType javaType = JavaTypesManager.getJavaTypeFromId(column.getTalendType());
                    if(keyCols.contains(i))
                    {
                %>
                       strBuffer_<%=cid%>.append(
				<%
    				if (javaType == JavaTypesManager.BIGDECIMAL) {
				%>
								(<%=conn.getName() %>.<%=column.getLabel() %>==null)?null:(<%=conn.getName() %>.<%=column.getLabel() %>.stripTrailingZeros())
				<%
					}  else {
				%>
				                String.valueOf(<%=conn.getName() %>.<%=column.getLabel() %>)			
				<%				
					}
				%>				
				);
                		
                        <%  
                    }
         
                }
				if(("crc32c").equals(crcType)){
                    %>
                    
                    com.google.cloud.Crc32c crc32c<%=cid%> = new com.google.cloud.Crc32c();  
                    crcComputedValue<%=cid %> = new Long(crc32c<%=cid%>.compute(strBuffer_<%=cid%>.toString()));
                    <%          
                }   
                if(("crc32").equals(crcType)){
                    %>
                    
                    java.util.zip.CRC32 crc32<%=cid%> = new java.util.zip.CRC32();  
                    crc32<%=cid%>.update(strBuffer_<%=cid%>.toString().getBytes());
                    crcComputedValue<%=cid %> = new Long(crc32<%=cid%>.getValue());
                    <%          
                }   
                if(("crc8").equals(crcType)){
          
                    %>
        
                    crcComputedValue<%=cid %> = new Long(com.dalsemi.onewire.utils.CRC8.compute(strBuffer_<%=cid%>.toString().getBytes())); 
                    <%       
                }   
                if(("crc16").equals(crcType)){
          
                    %>      
                    crcComputedValue<%=cid %> = new Long(com.dalsemi.onewire.utils.CRC16.compute(strBuffer_<%=cid%>.toString().getBytes()));        
                    <%      
                }
				
                List< ? extends IConnection> connsout = node.getOutgoingConnections(); 
                if (connsout!=null) {
                    List<IMetadataColumn> columnsout = metadata.getListColumns();
                    for(int i=0;i<connsout.size();i++) {
                        IConnection connout = connsout.get(i);
                        if(connout.getLineStyle().hasConnectionCategory(IConnectionCategory.DATA))
                        {
                            int columnSize=columnsout.size()-1;
                            for (int j = 0; j < columnSize; j++) {
                                IMetadataColumn columnout=columnsout.get(j);
                   
                                %>
                                <%=connout.getName() %>.<%=columnout.getLabel() %>=<%=conn.getName() %>.<%=columnout.getLabel() %>;
                                <%            
                            }  
                            IMetadataColumn columnout=columnsout.get(columnSize); 
                            %>
                            <%=connout.getName() %>.<%=columnout.getLabel() %>=crcComputedValue<%=cid %>;
                            <%                  
                        }
                    }
                }
                %>
                nb_line_<%=cid %>++;
                <%
           }   
       } 
                
    }
}
%>

Version history: revision 6 of 6, last updated 07-06-2018 04:00 PM