Split source based on byte size


You may need to split a large source into multiple target files of a specific byte size. Talend file output components don't come with an option to create files based on specific KB/MB sizes.



Use the tJavaFlex component with FileOutputStream to achieve this.

  1. Create a Standard Job in Talend Studio, and define the source on which the split needs to be done.

  2. Configure the Basic settings of the source with column fields. Here, file input is used as the source, and the component is renamed Read Source.



  3. Create a tJavaFlex component in the workspace, and connect the source to the tJavaFlex component with a Main row. Use the Sync columns option to make sure all the fields from the source are captured, and rename the component Split_Based_On_Size for clarity.



  4. Define two context variables: the target path (TgtFilePath) where the files will be created, and the split size (SplitByte), which is the number of bytes after which a new target file is started.



    Here the goal is to split into 1 MB target files, so the SplitByte value was derived with:

             1 MB = 1024 KB = 1024 * 1024 bytes = 1048576


  5. A tJavaFlex component has three code sections: Start code, Main code, and End code. Start code initializes and defines variables, Main code runs the required logic for each incoming row, and End code performs the final cleanup.

    1. In Start code, define two integer variables: iterator to keep the count of files generated, and ByteCount to count the number of bytes read. Define a FileOutputStream that will be used to write target files:

      // start part of your Java code
      Integer iterator = 1;
      Integer ByteCount = 0;
      FileOutputStream fos = new

    2. In Main Code, build each output line from the source fields, convert it to a byte array, and add its length to the running byte count:

      String tmpReadLine= row1.FirstName+","+
      row1.State+"\n"; //Read input fields
      byte[] contentInBytes = tmpReadLine.getBytes();//Convert them to Byte array
      ByteCount=ByteCount+contentInBytes.length; // Summation of line bytes read
      if ( ByteCount > context.SplitByte ) {
          // Threshold crossed: reset the counter and switch to a new file
          ByteCount = 0;
          iterator = iterator+ 1;
          fos = new
      } else {
          fos.write(contentInBytes); // else write to same file
      }

      Note: When you are initializing the tmpReadLine variable, choose the delimiter and row separator you want before writing to the file. If you're not sure, configure them through context variables and use them here.


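      The tmpReadLine variable should hold the entire output line, built from all of the source fields (in this example: FirstName, Lastname, Age, City, and State). The snippet above shows only FirstName and State, so change the code to match your own source fields.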


    3. In the End Code, use fos.close() to close the FileOutputStream.

      To use FileOutputStream, you need to import the java.io.FileOutputStream class. Add the import statement in the Advanced settings of the tJavaFlex component.



    4. Execute the Job, and check the target file location configured in the Job.




In this design, there will never be partial or broken records written to the target, as you either write the entire line or move to a new target file.
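
The design above can also be tried outside Talend as plain Java. The following is a minimal sketch under stated assumptions, not the code of the Job itself: the class name SizeBasedSplitter, the method split, and the target_<n>.csv file naming are hypothetical (the article does not show the file name pattern). It assumes UTF-8 when converting lines to bytes, and it checks the threshold before writing, so a record that would cross the threshold goes whole into the next file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Standalone sketch of the Start / Main / End logic, outside Talend.
// Class, method, and file names are hypothetical.
class SizeBasedSplitter {

    // Hypothetical naming scheme: <tgtFilePath>/target_<n>.csv
    static File target(String tgtFilePath, int n) {
        return new File(tgtFilePath, "target_" + n + ".csv");
    }

    // Writes every record whole and returns the number of files created.
    static int split(List<String> records, String tgtFilePath, long splitByte) {
        // "Start code": counters and the first output stream
        int iterator = 1;
        long byteCount = 0;
        try {
            FileOutputStream fos = new FileOutputStream(target(tgtFilePath, iterator));
            try {
                // "Main code": runs once per record
                for (String record : records) {
                    byte[] contentInBytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
                    // Roll over before writing if this record would cross the threshold.
                    // The byteCount > 0 guard avoids leaving an empty file behind
                    // when a single record is larger than the threshold.
                    if (byteCount > 0 && byteCount + contentInBytes.length > splitByte) {
                        fos.close();
                        iterator++;
                        byteCount = 0;
                        fos = new FileOutputStream(target(tgtFilePath, iterator));
                    }
                    fos.write(contentInBytes);
                    byteCount += contentInBytes.length;
                }
            } finally {
                // "End code": close the last stream
                fos.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return iterator;
    }
}
```

With three 5-byte records and a threshold of 10 bytes, this sketch would place the first two records in the first file and the third in a second file. Passing an explicit charset matters because the byte length of a line depends on the encoding; getBytes() without an argument uses the platform default.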

Version history
Revision #: 7 of 7
Last update: 09-29-2018 12:14 AM

Thanks for sharing the article, it really helped me.


I found an issue with the above code 

if ( ByteCount > context.SplitByte ) {

If ByteCount is greater than SplitByte, the current record is dropped instead of being written to a file.

    fos.write(contentInBytes); // else write to same file 

So we just have to remove the else part, so that fos.write(contentInBytes) always runs and the threshold record is written to the new file instead of being dropped.
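
The effect described in this comment can be reproduced without Talend. The following is a minimal in-memory sketch, assuming ByteArrayOutputStream objects stand in for the target files; the class name RolloverDemo, the method bytesWritten, and the writeAfterRollover flag are hypothetical. With the flag off it mirrors the if/else from the article; with the flag on it writes the threshold record to the new file. Counting that record toward the new file's size is an extra assumption here, since the comment does not mention the counter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// In-memory model of the Main code; each ByteArrayOutputStream plays the role of one target file.
class RolloverDemo {

    // Returns the total number of bytes that end up across all "files".
    static int bytesWritten(List<String> lines, int splitByte, boolean writeAfterRollover) {
        List<ByteArrayOutputStream> files = new ArrayList<>();
        ByteArrayOutputStream fos = new ByteArrayOutputStream();
        files.add(fos);
        int byteCount = 0;

        for (String line : lines) {
            byte[] contentInBytes = line.getBytes(StandardCharsets.UTF_8);
            byteCount = byteCount + contentInBytes.length;
            if (byteCount > splitByte) {
                // Threshold crossed: start a new "file"
                byteCount = 0;
                fos = new ByteArrayOutputStream();
                files.add(fos);
                if (writeAfterRollover) {
                    // Fixed behavior: the threshold record goes to the new file
                    fos.write(contentInBytes, 0, contentInBytes.length);
                    // Assumption: count the record toward the new file's size
                    byteCount = contentInBytes.length;
                }
                // With writeAfterRollover == false the record is silently dropped here
            } else {
                fos.write(contentInBytes, 0, contentInBytes.length);
            }
        }

        int total = 0;
        for (ByteArrayOutputStream f : files) {
            total += f.size();
        }
        return total;
    }
}
```

Feeding three 5-byte lines with a 10-byte threshold into this sketch gives 10 bytes in total with the original if/else (the third line is lost) and 15 bytes with the fix, which matches the behavior the comment describes.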