Binary File as Input


I am a newbie to TOS and Java, but I am learning quickly. I have been trying to figure out how to use a binary file as an input and then parse out its different fields. I have found a couple of posts about binary files, but none of them answered my question. I don't see any file input component that would work for a binary file. I looked at tFileInputFullRow, but the file has no distinguishable row delimiter, so that was out. I was given a Java program that parses the file, but I am not sure how to incorporate it into an input. Is there a way to modify this program so that it serves as an input source that pulls in the binary file, or do I need to convert the file outside of Talend into a "temporary" file and then pull that file in using Talend? Any thoughts would be greatly appreciated. Thank you.

Re: Binary File as Input

Hi,
Welcome to the Talend Community!
I'm afraid there isn't currently a component that extracts fields directly from a binary file.
The solution to your request may be a little complex.
As a first step, create a routine and put the Java code in it.
The Java code below is just an example; you may have to change some of the logic.
1010111 1101000 1100001 1110100 100000 1100001 
100000 1101110 1101001 1100011 1100101 100000 1100100
1100001 1111001 100001
//TESTED BINARY DATA

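// Converts a string of whitespace-separated binary codes to text
public static String toStr(String binStr){
    String[] tempStr = StrToStrArray(binStr);
    char[] tempChar = new char[tempStr.length];
    for(int i = 0; i < tempStr.length; i++){
        tempChar[i] = toChar(tempStr[i]);
    }
    return String.valueOf(tempChar);
}

// Splits the input into one binary code per character (spaces or line breaks between codes)
private static String[] StrToStrArray(String str){
    return str.trim().split("\\s+");
}

// Converts one binary code, most significant bit first, to its character
private static char toChar(String binStr){
    int[] temp = binStrToIntArray(binStr);
    int sum = 0;
    for(int i = 0; i < temp.length; i++){
        sum += temp[temp.length - 1 - i] << i;
    }
    return (char)sum;
}

// Converts the characters '0' and '1' to the integers 0 and 1
private static int[] binStrToIntArray(String binStr){
    char[] temp = binStr.toCharArray();
    int[] result = new int[temp.length];
    for(int i = 0; i < temp.length; i++){
        result[i] = temp[i] - 48; // '0' is 48 in ASCII
    }
    return result;
}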

Then create a job as shown in the following images.
If you want to extract fields, you have to use tExtractRegexFields.
Regards,
Pedro
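
For readers without a Talend job at hand, here is a minimal plain-Java sketch of the two steps described in this reply: decode the sample binary data, then pull fields out of the decoded text with a regular expression. It uses Integer.parseInt with radix 2 in place of the hand-written bit shifting, and java.util.regex to stand in for what tExtractRegexFields would do inside a job. The class name RegexFieldsSketch, the method names, and the four-word pattern are all made up for illustration; a real file would need its own pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldsSketch {
    // Hypothetical layout: four words separated by single spaces, ending in "!".
    private static final Pattern FIELDS =
            Pattern.compile("(\\w+) (\\w+) (\\w+) (\\w+)!");

    /** Decodes whitespace-separated binary codes (most significant bit first) into text. */
    static String decode(String binStr) {
        StringBuilder sb = new StringBuilder();
        for (String code : binStr.trim().split("\\s+")) {
            sb.append((char) Integer.parseInt(code, 2));
        }
        return sb.toString();
    }

    /** Returns the capture groups as fields, or an empty array if the text does not match. */
    static String[] extractFields(String text) {
        Matcher m = FIELDS.matcher(text);
        if (!m.matches()) {
            return new String[0];
        }
        String[] fields = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            fields[g - 1] = m.group(g);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The sample data from the reply above, joined into one string.
        String text = decode("1010111 1101000 1100001 1110100 100000 1100001 "
                + "100000 1101110 1101001 1100011 1100101 100000 1100100 "
                + "1100001 1111001 100001");
        System.out.println(text);                                 // What a nice day!
        System.out.println(String.join("|", extractFields(text))); // What|a|nice|day
    }
}
```

Each capture group becomes one field, so changing the field layout only means changing the pattern.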

Re: Binary File as Input

Thank you so much Pedro! I'll give this a try and let you know if I have any questions.

Re: Binary File as Input

I've been working on this and am having a little bit of trouble. My file is a little bit different. It seems to be encrypted to an extent, so the Java code reads each individual byte in the file and converts it. The program I currently have uses a FileInputStream to load the file and then wraps it in a DataInputStream. Unfortunately, no line break is visible in the file until the bytes are read and converted. I haven't been able to figure out how to convert the program to a routine, mainly because I don't know what type of input to use so that I can call the routine in tMap. Here's a snippet of the code:
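import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;

public class Test_ {
    public static void main (String[] args) {
        // Assumed: args[0] is the binary input file and args[1] is the text output file
        // (the array indexes are missing from the snippet as posted)
        File file = new File(args[0]);
        int i = 0;
        int i_data = 0;
        byte b_data = 0;
        try {
            FileInputStream file_input = new FileInputStream (file);
            DataInputStream data_in = new DataInputStream (file_input);
            PrintStream out = new PrintStream(args[1]);
            try {
                while(true)
                {
                    // Three 4-byte integers, byte-swapped before printing
                    i_data = data_in.readInt ();
                    i_data = swapInt(i_data);
                    out.printf("%d;",i_data);
                    i_data = data_in.readInt ();
                    i_data = swapInt(i_data);
                    out.printf("%04d;",i_data);
                    i_data = data_in.readInt ();
                    i_data = swapInt(i_data);
                    out.printf("%04d;",i_data);
                    // One flag byte: only the high bit is kept
                    b_data = data_in.readByte ();
                    out.printf("%d;",b_data & 128);
                    // Ten characters, then one more character that ends the output row
                    out.printf("%c",data_in.readByte());
                    for (i=0;i<8;i++)
                    {
                        out.printf("%c",data_in.readByte());
                    }
                    out.printf("%c;",data_in.readByte());
                    out.printf("%c%n",data_in.readByte());
                    // Read one more byte and discard it
                    data_in.readByte();
                } // While
            }
            catch (EOFException eof) {
                // End of file reached: normal exit from the read loop
            }
            data_in.close ();
            out.close ();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Not part of the snippet as posted; assumed to reverse the byte order of the value
    private static int swapInt(int value) {
        return Integer.reverseBytes(value);
    }
}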

I did create a couple of jobs that convert the original file to a text file and then load the text file into the database, but I assume this is slower than it would be if I could do it in one step using a routine and tMap.
I apologize if this is straightforward. I appreciate any assistance.
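
One possible way to skip the intermediate text file is to have the routine return the parsed records in memory instead of printing them. The sketch below is only an illustration under stated assumptions: the record layout is copied from the snippet above (three byte-swapped integers, a flag byte, ten characters, one character, one discarded byte), swapInt is assumed to reverse byte order like Integer.reverseBytes, and the class and method names (BinaryRecordSketch, readRecords) are hypothetical. How the returned rows are fed into a job, for example from a Java component that loops over the list, is not shown and would need to be checked against your Talend version.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class BinaryRecordSketch {
    /** Reads records until end of stream; each row holds the six fields the posted program prints. */
    static List<String[]> readRecords(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        try {
            while (true) {
                String[] row = new String[6];
                // Assumption: swapInt in the posted program reverses the byte order.
                row[0] = String.format("%d", Integer.reverseBytes(data.readInt()));
                row[1] = String.format("%04d", Integer.reverseBytes(data.readInt()));
                row[2] = String.format("%04d", Integer.reverseBytes(data.readInt()));
                row[3] = String.valueOf(data.readByte() & 128); // high bit of the flag byte
                row[4] = readChars(data, 10);
                row[5] = readChars(data, 1);
                data.readByte(); // trailing byte that the posted program discards
                rows.add(row);
            }
        } catch (EOFException eof) {
            // End of input. A record that was only partially read is dropped.
        }
        return rows;
    }

    /** Reads count bytes and treats each one as a single-byte (Latin-1) character. */
    private static String readChars(DataInputStream data, int count) throws IOException {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append((char) (data.readByte() & 0xFF));
        }
        return sb.toString();
    }
}
```

A call such as readRecords(new java.io.FileInputStream(path)) would then give one String[] per record, which can be mapped to output columns without writing a temporary file.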

Re: Binary File as Input

I just wanted to give another update on this. I did some additional work and was able to make a little more progress. First of all, I changed my original working process to use a routine instead of a program called from a tSystem component, so it is all within TOS. I was also able to work with another routine that does something similar and call it within tMap. I also figured out that I could potentially use a tFileInputPositional with the specified length of the file. Unfortunately, the first 8 bytes are a header record, and I'm not sure how to ignore them when the other rows are 500 bytes each. I also tried to use a tFileInputFullRow to put the entire file into a single row. I'm not sure what the maximum allowed length is, but if that worked, I could use the routine to parse the file. However, I'm not sure how to split the results into different rows, since everything would be in the same field. I am starting to think that the only solution is to parse the file into a text file and then load that file into the database. I am open to any suggestions. Thank you.
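
On the header question, one option in plain Java is to read and discard the header bytes first and then cut the rest of the stream into fixed-length records. The following is a minimal sketch under the numbers given above (an 8-byte header, 500-byte rows); the class and method names (FixedLengthSplitSketch, splitRecords) are hypothetical, and it says nothing about how tFileInputPositional itself could be configured to skip a header.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class FixedLengthSplitSketch {
    /**
     * Skips headerLength bytes, then returns every complete record of recordLength bytes.
     * A trailing partial record is ignored.
     */
    static List<byte[]> splitRecords(InputStream in, int headerLength, int recordLength)
            throws IOException {
        if (headerLength < 0 || recordLength <= 0) {
            throw new IllegalArgumentException("headerLength must be >= 0 and recordLength > 0");
        }
        DataInputStream data = new DataInputStream(in);
        // Read and discard the header; readFully throws EOFException if the file is too short.
        data.readFully(new byte[headerLength]);

        List<byte[]> records = new ArrayList<>();
        while (true) {
            byte[] record = new byte[recordLength];
            try {
                data.readFully(record);
            } catch (EOFException eof) {
                break; // no more complete records
            }
            records.add(record);
        }
        return records;
    }
}
```

For the file described here the call would look like splitRecords(new java.io.FileInputStream(path), 8, 500), and each returned byte[] can then be handed to the parsing routine as one row.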